Comparison of Confidence and Prediction Intervals for Different Mixed-Poisson Regression Models
Submitted by

John E. Ash
Research Assistant
Department of Civil and Environmental Engineering, University of Washington
Box 00, Seattle, WA -00
jeash@uw.edu

Yajie Zou, Ph.D.
Key Laboratory of Road and Traffic Engineering, Ministry of Education
Tongji University, Shanghai 00, China
yajiezou@hotmail.com

Dominique Lord, Ph.D.
Professor
Department of Civil Engineering
Texas A&M University, TAMU
College Station, TX -
d-lord@tamu.edu

Yinhai Wang, Ph.D. (Corresponding Author)
Professor
Department of Civil and Environmental Engineering, University of Washington
Box 00, Seattle, WA -00
yinhai@uw.edu

Word Count:, + ( figure * 0) + ( tables * 0) =, words

Submitted for Presentation at the th Annual Meeting of the Transportation Research Board, Washington, D.C., January -, 0
Submitted on August, st, 0
Revised November 0
ABSTRACT
A major focus for transportation safety analysts is the development of crash prediction models, a task for which an extremely wide selection of model types is available. Perhaps the most common crash prediction model is the negative binomial (NB) regression model, which gained popularity due to its relative ease of implementation and its ability to handle overdispersion in crash data. Recently, several newer models, including the Poisson-inverse-Gaussian, Sichel, Poisson-lognormal, and Poisson-Weibull models, have been introduced; they can also accommodate overdispersion and could potentially replace the NB model, since many of them have been found to perform better. All five of the aforementioned models, including the NB model, can be classified as mixed-Poisson models. A mixed-Poisson model arises when an error term, following a chosen mixture distribution, enters the functional form of the Poisson parameter. For the NB model, the mixture distribution is gamma, hence the alternate name Poisson-gamma model. In this paper, confidence intervals (CIs) for the Poisson mean (μ) and the Poisson parameter (m, alternately referred to as the safety), as well as prediction intervals (PIs) for the predicted number of crashes at a new site, are derived for each of the aforementioned mixed-Poisson models. The theory is then put into practice by estimating CIs and PIs for mixed-Poisson models developed from a Texas crash dataset. Ultimately, this study provides safety analysts with tools to express the uncertainty associated with estimates from safety-modeling efforts instead of simply providing point estimates.
INTRODUCTION
Transportation safety analysts often develop statistical models to predict crash frequencies that take into account a variety of factors, including the geometric characteristics of facilities and traffic volumes, among many others (). With constant advances in statistical methodologies, a variety of model types are available to analyze crash frequency data (). Early efforts in crash frequency modeling typically focused on the use of Poisson regression models (-). Although the Poisson model is very straightforward to use, it is unable to handle the overdispersion (or, for that matter, underdispersion) that is commonly observed in crash data (). Overdispersion occurs when the variance of the crash counts is greater than the mean, which is quite common in crash data (). Lord et al. () noted that overdispersion is a consequence of considering crash data as resulting from Poisson trials (i.e., Bernoulli trials in which the probability of a crash is not constant across trials). Common features of overdispersed crash datasets include a high frequency of zero-valued and/or large-valued crash counts that cannot be modeled properly by a simple Poisson distribution (). To accommodate the overdispersion in crash data, a variety of models have been introduced. Perhaps the most popular is the negative binomial (NB) model, which has been used by many researchers to model overdispersed crash data (; -). A key feature of the NB model is the assumption that the mean crash frequency (i.e., the Poisson parameter) for any site $i$, $\lambda_i$, follows a gamma distribution (). Thus, a formulation for the marginal mean and variance of the crash count $y_i$ is obtained in which the variance can exceed the mean (; ; ).
The NB model is alternately referred to as the Poisson-gamma model because the crash count for site $i$, $y_i$, conditioned on the Poisson parameter $\lambda_i$ (which follows the gamma distribution), is itself Poisson distributed. That being said, there is no reason to assume that $\lambda_i$ must be gamma distributed (; ). In fact, researchers have investigated a variety of other distributions for $\lambda$, which result in several other types of mixed-Poisson regression models (i.e., models in which the crash count, conditioned on the Poisson parameter, follows a Poisson distribution, and the distribution of the Poisson parameter is known as the mixing or mixture distribution) (; ). For example, one alternate choice of mixture distribution is the generalized inverse Gaussian (GIG) distribution, which gives rise to the Sichel (SI) model for overdispersed count data (). Zou et al. () applied a Sichel model to a highly dispersed crash dataset from Texas and compared the results to those obtained from a traditional NB model. They found that the SI model yielded lower values of both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) (i.e., better statistical fit) than the NB model. Yet another choice of mixture distribution is the inverse Gaussian (IG) distribution, which gives rise to the Poisson-inverse-Gaussian (PIG) model (). Zha et al. (0) analyzed the aforementioned Texas crash dataset as well as a crash dataset from Washington and found that PIG regression models provided a better fit (in terms of AIC and BIC) than traditional NB models. Other possible choices of mixture distribution include, but are not limited to, the Weibull and lognormal distributions, leading to the Poisson-Weibull (PW) and Poisson-lognormal (PLN) models, respectively. Both types of models can accommodate overdispersion; the PW model has been applied to crash data by Cheng et al.
() and the PLN model has been investigated by Lord and Miranda-Moreno () and Aguero-Valverde and Jovanis (), among others. All of the aforementioned mixed-Poisson regression models provide only a point estimate of the expected crash frequency at a given site. Although point estimates are useful for prediction, in many cases an interval estimate is preferable (). Notably, confidence intervals are important in safety decision-making because they express the uncertainty associated with a given point estimate (; ). Wood () derived formulae for the prediction intervals (PIs)
for the predicted response (i.e., the crash frequency at a new site, $y_i$) and confidence intervals (CIs) for the gamma mean ($m_i$) and the true mean crash frequency (alternately referred to as the mean response or Poisson mean, $\mu_i$) from the NB (Poisson-gamma) regression model. It is important to note the distinction here between the Poisson parameter and the Poisson mean. In standard Poisson regression, the two values are equal. In a mixed-Poisson model, however, introducing an error term into the Poisson parameter makes the two differ. Further detail on this issue is provided later in the paper. Wood noted that PIs and CIs may be especially useful for predicting the number of crashes expected to occur at other sites with features similar to those of the sites used in model development. Lord () developed a methodology for calculating the confidence intervals of predictions obtained by multiplying the output of NB regression models by crash modification factors (CMFs). Geedipally and Lord () compared PIs for the predicted response ($y$) and CIs for the gamma mean ($m$) and the Poisson mean ($\mu$) as computed from NB models with fixed and varying dispersion parameters, as well as from univariate and bivariate NB models (). Lord et al. () estimated PIs for the number of crashes ($y$) obtained from an NB model with several covariates and compared them to those estimated from a baseline (i.e., flow-only) model adjusted with accident modification factors (AMFs). Connors et al. () plotted CIs for predicted values of $\mu$ and $m$ for varying flows and segment lengths for the NB and PLN cases; they did not, however, provide explicit formulae for the CIs and PIs associated with the PLN model. The goal of this study is to extend the work of Wood () and develop the associated CIs and PIs for a broader range of mixed-Poisson regression models commonly used by safety analysts today.
To align this work with Wood (), the term PI is used henceforth in reference to $y$, and the term CI in reference to $m$ and $\mu$. Besides reviewing the derivation of the PI for $y$ and the CIs for $m$ and $\mu$ for the NB model (Model ) per Wood (), derivations of the CIs and PIs for these three quantities (where $m$ is now generalized to be the Poisson parameter following the given mixture distribution, alternately referred to as the safety) are provided for the Poisson-inverse-Gaussian (Model ), Sichel (Model ), Poisson-Weibull (Model ), and Poisson-lognormal (Model ) regression models. Then, a case study using the aforementioned Texas crash dataset is conducted, in which the five mixed-Poisson models under consideration are estimated. Once the models have been estimated, the CIs for $\mu$ and $m$ and the PIs for $y$ are computed and plotted for each model. These CIs and PIs are then compared and discussed, both with regard to the case study and in general terms. Ultimately, this study provides safety analysts with tools to express the uncertainty associated with estimates from safety-modeling efforts instead of simply providing point estimates.

DERIVATION OF CONFIDENCE AND PREDICTION INTERVALS
This section provides the derivations of the confidence and prediction intervals for each type of mixed-Poisson model considered in this study. First, however, brief background information on mixed-Poisson models is provided. In general, there are three levels in the hierarchy of a mixed-Poisson model. At the lowest level is the mean response ($\mu_i$), also known as the Poisson mean, which itself is taken to follow (approximately) a normal distribution, $N(\mu_0, \sigma_0^2)$. One level up is the Poisson parameter ($m_i$), alternately known as the safety, which, when conditioned on the mean response, follows the mixture distribution under consideration.
Finally, there is the predicted response ($y_i$), i.e., the crash frequency at site $i$, which, when conditioned on the Poisson parameter ($m_i$), follows a Poisson distribution.
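As a rough illustration of this hierarchy, the Python sketch below simulates crash counts at a single site. It assumes a fixed Poisson mean μ, so the estimation uncertainty at the lowest level is ignored, and it borrows the gamma mixing term with E[ν] = 1 and Var[ν] = 1/φ used by the NB parameterization described below; μ = 4 and φ = 2 are arbitrary illustrative values, and the helper names are hypothetical. If the hierarchy is set up as described, the sample mean of y should be close to μ and the sample variance close to μ + μ²/φ, which exceeds the mean (overdispersion).

```python
import math
import random


def sample_poisson(m, rng):
    """Draw one Poisson(m) variate by sequential inversion (adequate for moderate m)."""
    u = rng.random()
    k, p = 0, math.exp(-m)
    cdf = p
    while u > cdf and p > 0.0:  # p > 0 stops the loop if float round-off keeps cdf below u
        k += 1
        p *= m / k
        cdf += p
    return k


def simulate_site(mu, phi, n_draws, seed=2024):
    """Simulate counts: nu ~ gamma(shape phi, scale 1/phi), m = mu * nu, y | m ~ Poisson(m)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        nu = rng.gammavariate(phi, 1.0 / phi)  # mixing error term: E[nu] = 1, Var[nu] = 1/phi
        m = mu * nu                             # safety (Poisson parameter)
        counts.append(sample_poisson(m, rng))   # crash count given the safety
    return counts


mu, phi = 4.0, 2.0  # illustrative values only
ys = simulate_site(mu, phi, 200_000)
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)
print(f"mean {mean_y:.3f} vs mu = {mu}")
print(f"var  {var_y:.3f} vs mu + mu^2/phi = {mu + mu ** 2 / phi}")
```

The simulation only checks the first two moments; it does not attempt to reproduce any model fitted in this study.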
Mixed-Poisson Models and Formulation
A mixed-Poisson model is defined by two primary criteria. First, the count under consideration (i.e., the number of crashes $y_i$) follows a Poisson distribution conditional on the Poisson parameter $\lambda_i$ (). Following the terminology and notation of Wood (), the Poisson parameter $\lambda_i$ will be referred to as the safety and denoted $m_i$:

$$f(y_i \mid m_i) = \frac{\exp(-m_i)\, m_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

Second, the Poisson parameter $m_i$ contains a multiplicative error term, following a chosen mixture distribution (e.g., gamma or inverse Gaussian), that enters the conditional mean (i.e., the safety conditioned on the mean response $\mu_i$) (). Without the error term, the expression reduces to that of the mean response ($\mu_i$), alternately referred to as the Poisson mean:

$$m_i = \exp\Big(\beta_0 + \sum_{j=1}^{K} x_{ij}\beta_j + \varepsilon_i\Big) = \exp(\beta_0 + \varepsilon_i)\exp\Big(\sum_{j=1}^{K} x_{ij}\beta_j\Big) = \exp\Big(\beta_0 + \sum_{j=1}^{K} x_{ij}\beta_j\Big)\exp(\varepsilon_i) = \mu_i\nu_i$$

where $i$ = site index; $\beta_j$ = $j$th regression coefficient; $x_{ij}$ = $j$th covariate for site $i$; and $\varepsilon_i$ = error term such that $\exp(\varepsilon_i)$, itself referred to as $\nu_i$, follows the chosen mixture distribution. As stated above, $Y \mid m_i \sim \text{Poisson}(m_i)$. The marginal distribution of $Y$ is derived by integrating out the error term $\nu_i$, where $g(y_i \mid \mu_i, \nu_i)$ is the Poisson probability mass function with parameter $m_i = \mu_i\nu_i$ and $h(\nu_i)$ is the mixture distribution ():

$$f(y_i \mid \mu_i) = \int_0^{\infty} g(y_i \mid \mu_i, \nu_i)\, h(\nu_i)\, d\nu_i = E_{\nu}\big[g(y_i \mid \mu_i, \nu_i)\big]$$

It can then be shown, via the equality $m_i = \mu_i\nu_i$, that the Poisson parameter $m_i$ also follows the mixture distribution, just like $\nu_i$ ().

Parameterizations of Mixture Distributions
A total of five mixed-Poisson models are considered in this study. The models, the corresponding mixture distributions for $m_i$ and $\nu_i$, and the parameterizations of the mixture distributions are presented below; each subsection header names the model type, with the mixture distribution in parentheses. The subscript $i$ is omitted without loss of generality.
Negative Binomial [NB] Model (Gamma)
The NB model arises when the mixture distribution of the mixed-Poisson model is chosen to be the gamma distribution. Specifically, $\nu \sim \text{gamma}(\delta, \phi)$; however, in order to properly
identify the intercept in the regression equation, we require $E[\nu] = 1$. This is obtained by setting $\delta = \phi$, which leads to a one-parameter gamma distribution. It then follows that $\text{Var}[\nu] = 1/\phi$, alternately stated $\text{Var}[\nu] = \alpha$, and further that $m \mid \phi, \mu \sim \text{gamma}(\phi, \phi/\mu)$ ().

Poisson-Inverse-Gaussian [PIG] Model (Inverse Gaussian)
The PIG model arises when the inverse Gaussian (IG) distribution is selected as the mixture distribution. Specifically, $\nu \sim \text{IG}(\mu_{IG}, \lambda)$, where the subscript IG distinguishes $\mu_{IG}$ (the mean of the IG distribution) from the Poisson mean ($\mu$). As with the NB model, the intercept identification condition requires $E[\nu] = 1$. If $\mu_{IG} = 1$, then $E[\nu] = 1$, and further $\text{Var}[\nu] = 1/\lambda$ (0).

Sichel [SI] (Generalized Inverse Gaussian)
If the mixture distribution is selected to be the generalized inverse Gaussian (GIG) distribution, the Sichel model is obtained. Here, $\nu \sim \text{GIG}(\mu_{GIG}, \sigma, \nu_{GIG})$, where the subscript GIG distinguishes the mean ($\mu_{GIG}$) and shape parameter ($\nu_{GIG}$) of the GIG distribution from the Poisson mean ($\mu$) and the error term ($\nu$), respectively. If $E[\nu] = 1$ (intercept identification condition), the variance of $\nu$ is expressed as follows (; 0):

$$\text{Var}[\nu] = \frac{2\sigma(\nu_{GIG} + 1)}{c} + \frac{1}{c^2} - 1$$

where $c = R_{\nu_{GIG}}(1/\sigma)$; $R_\lambda(t) = K_{\lambda+1}(t)/K_\lambda(t)$; and

$$K_\lambda(t) = \frac{1}{2}\int_0^{\infty} x^{\lambda-1}\exp\Big[-\frac{1}{2}t\big(x + x^{-1}\big)\Big]\,dx$$

is the modified Bessel function of the third kind.

Poisson-lognormal [PLN] (Lognormal)
When the mixture distribution is selected as the lognormal distribution, the Poisson-lognormal model is obtained. Here, $\nu \sim \text{LN}(d, \sigma_{LN}^2)$, where the LN subscript indicates that $\sigma_{LN}^2$ is the variance parameter of the lognormal distribution for $\nu$ (i.e., the variance of $\log\nu$), not the variance $\sigma_0^2$ of the normally distributed Poisson mean. The mean of the lognormal distribution is expressed as follows ():

$$E[\nu] = \exp\Big(d + \frac{\sigma_{LN}^2}{2}\Big)$$

The intercept identification condition $E[\nu] = 1$ is then obtained by requiring $d = -\sigma_{LN}^2/2$.
The variance of $\nu$ is then obtained by substituting this expression for $d$ into the equation for $\text{Var}[\nu]$:

$$\text{Var}[\nu] = \big(e^{\sigma_{LN}^2} - 1\big)\,e^{2d + \sigma_{LN}^2} = e^{\sigma_{LN}^2} - 1$$
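The mixing-variance expressions above can be evaluated numerically. The following Python sketch is a minimal illustration using only the standard library, and its function names are hypothetical. It evaluates $K_\lambda(t)$ from the integral representation given for the Sichel model after the substitution $x = e^u$, which turns it into $\int_0^\infty \cosh(\lambda u)\exp(-t\cosh u)\,du$; the truncation point and step count of the trapezoid rule are our own choices, not values from this paper. Two checks are built in: the closed form $K_{1/2}(t) = \sqrt{\pi/(2t)}\,e^{-t}$, and the special case $\nu_{GIG} = -1/2$, where $c = 1$ and the Sichel variance collapses to $\sigma$ (the inverse Gaussian case).

```python
import math


def _log_cosh(x):
    """Numerically stable log(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def bessel_k(order, t, upper=30.0, steps=4000):
    """Modified Bessel function of the third kind K_order(t) for t > 0.

    Trapezoid rule on integral_0^inf cosh(order*u) * exp(-t*cosh(u)) du, truncated
    at u = upper (an illustrative cutoff that is adequate unless t is very small).
    """
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(_log_cosh(order * u) - t * math.cosh(u))
    return total * h


def var_nu_gamma(alpha):
    """NB with delta = phi: Var[nu] = 1/phi = alpha."""
    return alpha


def var_nu_inverse_gaussian(lam):
    """PIG with mu_IG = 1: Var[nu] = 1/lambda."""
    return 1.0 / lam


def var_nu_sichel(sigma, nu_gig):
    """SI with E[nu] = 1: Var[nu] = 2*sigma*(nu_GIG + 1)/c + 1/c^2 - 1, c = R_{nu_GIG}(1/sigma)."""
    t = 1.0 / sigma
    c = bessel_k(nu_gig + 1.0, t) / bessel_k(nu_gig, t)
    return 2.0 * sigma * (nu_gig + 1.0) / c + 1.0 / c ** 2 - 1.0


def var_nu_lognormal(sigma2_ln):
    """PLN with d = -sigma_LN^2 / 2: Var[nu] = exp(sigma_LN^2) - 1."""
    return math.exp(sigma2_ln) - 1.0


print(bessel_k(0.5, 2.0), math.sqrt(math.pi / 4.0) * math.exp(-2.0))
print(var_nu_sichel(0.7, -0.5))  # inverse Gaussian special case: close to 0.7
```

For production use, a library Bessel routine would normally replace the hand-rolled quadrature; it is written out here only to keep the sketch self-contained and tied to the integral in the text.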
Poisson-Weibull [PW] (Weibull)
When the error term $\nu$ follows the Weibull distribution, the Poisson-Weibull model is obtained. Specifically, $\nu \sim \text{Weibull}(\mu_{WEI}, \sigma)$, where the subscript WEI distinguishes the Weibull parameter $\mu_{WEI}$ from the Poisson mean ($\mu$). The mean of the Weibull distribution is expressed as follows (; 0):

$$E[\nu] = \Big(\frac{1}{\mu_{WEI}}\Big)^{1/\sigma}\Gamma\Big(\frac{1}{\sigma} + 1\Big)$$

Thus, to meet the intercept identification condition $E[\nu] = 1$, the following must hold:

$$\mu_{WEI} = \Big(\Gamma\Big(\frac{1}{\sigma} + 1\Big)\Big)^{\sigma}$$

With $E[\nu] = 1$, the variance of $\nu$ is obtained by substituting this expression for $\mu_{WEI}$ into the equation for $\text{Var}[\nu]$:

$$\text{Var}[\nu] = \Big(\frac{1}{\mu_{WEI}}\Big)^{2/\sigma}\Big[\Gamma\Big(\frac{2}{\sigma} + 1\Big) - \Big(\Gamma\Big(\frac{1}{\sigma} + 1\Big)\Big)^2\Big] = \frac{\Gamma\big(\frac{2}{\sigma} + 1\big)}{\big(\Gamma\big(\frac{1}{\sigma} + 1\big)\big)^2} - 1$$

Derivation of Confidence Intervals for Poisson Mean (True Mean Crash Frequency) ($\mu$)
For this study, we consider a generalized linear model (GLM) for crash prediction in which each site of interest is a road segment, of the form:

$$\log\Big(\frac{\mu}{L\,t}\Big) = \log(\beta_0) + \sum_{j=1}^{n} x_{ij}\beta_j$$

where $\eta$ = linear predictor; $\beta_j$ = $j$th regression coefficient; $x_{ij}$ = $j$th predictor for the segment (site); $L$ = segment length (mi); and $t$ = time period over which the crash data were collected. The product of segment length and data-collection duration for each site is treated as an offset, leading to the following reformulation of the regression:

$$\eta = \log(\mu) = \log(\beta_0) + \sum_{j=1}^{n} x_{ij}\beta_j + \log(L\,t)$$
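A minimal Python sketch of the offset formulation follows, for a flow-only model whose single covariate is $\log(\text{ADT})$ (as in the case study later in the paper). The coefficient estimates and their covariance matrix are hypothetical numbers chosen only for illustration. The variance of the linear predictor is computed with the standard GLM result $\text{Var}(\hat\eta) = x^\top\Sigma x$ with $x = (1, \log \text{ADT})$, treating the offset $\log(L\,t)$ as known; this formula is our stated assumption, since the paper refers the reader elsewhere for these steps.

```python
import math


def linear_predictor(log_beta0, beta1, cov, adt, length_mi, years):
    """Return (eta_hat, var_eta_hat, mu_hat) for eta = log(beta0) + beta1*log(ADT) + log(L*t).

    cov is the 2x2 covariance matrix of (log_beta0, beta1); the offset is fixed,
    so Var(eta_hat) = x' cov x with x = (1, log ADT).
    """
    x = (1.0, math.log(adt))
    eta = log_beta0 + beta1 * x[1] + math.log(length_mi * years)
    var_eta = sum(x[r] * cov[r][s] * x[s] for r in range(2) for s in range(2))
    return eta, var_eta, math.exp(eta)


# Hypothetical estimates (not fitted to any data in this study).
cov = [[0.25, -0.026], [-0.026, 0.0028]]
eta, var_eta, mu_hat = linear_predictor(-7.5, 0.8, cov, adt=10_000, length_mi=1.0, years=5.0)

# The multiplicative form mu = beta0 * F**beta1 * (L*t) gives the same value.
mu_mult = math.exp(-7.5) * 10_000 ** 0.8 * (1.0 * 5.0)
print(f"eta = {eta:.4f}, Var(eta) = {var_eta:.5f}, mu = {mu_hat:.3f} ({mu_mult:.3f})")
```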
Under Equation (), if the first covariate is the logarithm of traffic volume, $x_{i1} = \log(F)$, we have:

$$\mu = \beta_0 F^{\beta_1}(L\,t)\exp\Big(\sum_{j=2}^{n} x_{ij}\beta_j\Big)$$

In the GLM, the estimators of the regression coefficients, $\hat\beta_j$, follow (asymptotically) a multivariate normal distribution, $[\hat\beta_0, \ldots, \hat\beta_n]^\top \sim N([\beta_0, \ldots, \beta_n]^\top, \Sigma)$ (). From Equation (), it is clear that $\mu = \exp(\eta)$. As before, the subscript $i$ on $\eta$ and $\mu$ is omitted without loss of generality. This fact can be used to derive an approximate $100(1-\alpha)\%$ confidence interval for the Poisson mean (alternately, the true mean crash count), $\mu$, as follows, where $Z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution ():

$$\exp\Big(\hat\eta \pm Z_{1-\alpha/2}\sqrt{\text{Var}(\hat\eta)}\Big) = \exp(\hat\eta)\exp\Big(\pm Z_{1-\alpha/2}\sqrt{\text{Var}(\hat\eta)}\Big) = \hat\mu\exp\Big(\pm Z_{1-\alpha/2}\sqrt{\text{Var}(\hat\eta)}\Big)$$

$$\mu \in \left[\frac{\hat\mu}{\exp\big(Z_{1-\alpha/2}\sqrt{\text{Var}(\hat\eta)}\big)},\; \hat\mu\exp\Big(Z_{1-\alpha/2}\sqrt{\text{Var}(\hat\eta)}\Big)\right]$$

Thus, regardless of the choice of mixture distribution, the approximate $100(1-\alpha)\%$ CI for the Poisson mean (alternately, the true mean crash count), $\mu$, is as formulated in Equation (). The reader is referred to () for the steps to calculate the variance of the linear predictor.

Derivation of Confidence Intervals for Poisson Parameter ($m$)
Here, the derivation of an approximate $100(1-\alpha)\%$ confidence interval for the Poisson parameter, alternately referred to as the safety, $m$, is presented following the procedure outlined in Wood (). Before beginning the derivation, it is important to note a result used in several subsequent calculations: the distribution of the estimator of the Poisson mean ($\hat\mu$), although technically lognormal, can be approximated as normal (). Hence, $\hat\mu \sim N\big(\mu_0 = \mu,\; \sigma_0^2 = \mu^2\,\text{Var}(\hat\eta)\big)$. Following Wood (), the basic formulation for an approximate $100(1-\alpha)\%$ CI for the safety, $m$, is:

$$\hat\mu \pm Z_{1-\alpha/2}\sqrt{\text{Var}(m)}$$

Mathematically, the lower bound of this CI can be negative, although physically, negative values of $m$ are not sensible.
Hence, the CI for $m$ in Equation () is reformulated as:

$$\Big[\max\Big\{0,\; \hat\mu - Z_{1-\alpha/2}\sqrt{\text{Var}(m)}\Big\},\; \hat\mu + Z_{1-\alpha/2}\sqrt{\text{Var}(m)}\Big]$$

The variance of $m$ is formulated as follows:

$$\begin{aligned}
\text{Var}(m) &= \text{Var}(\mu\nu) = E(\mu^2\nu^2) - [E(\mu\nu)]^2 \\
&= E(\mu^2)E(\nu^2) - E(\mu)^2E(\nu)^2 \qquad \text{[by independence of } \mu \text{ and } \nu\text{]} \\
&= \big[\text{Var}(\mu) + E(\mu)^2\big]\big[\text{Var}(\nu) + E(\nu)^2\big] - E(\mu)^2E(\nu)^2
\end{aligned}$$

With $E(\nu) = 1$, $E(\mu) = \mu_0$, and $\text{Var}(\mu) = \sigma_0^2$, this reduces to $\text{Var}(m) = (\sigma_0^2 + \mu_0^2)(\text{Var}(\nu) + 1) - \mu_0^2$. The resulting expressions for the variance of $m$ under each mixture distribution considered in this study are presented in Table.

TABLE Variance of m for Mixture Distributions
Gamma: $\alpha(\sigma_0^2 + \mu_0^2) + \sigma_0^2$, where $\alpha = 1/\phi$
Inverse Gaussian: $\frac{1}{\lambda}(\sigma_0^2 + \mu_0^2) + \sigma_0^2$
Generalized Inverse Gaussian: $[\sigma_0^2 + \mu_0^2]\Big(\frac{2\sigma(\nu_{GIG}+1)}{c} + \frac{1}{c^2}\Big) - \mu_0^2$
Lognormal: $e^{\sigma_{LN}^2}[\sigma_0^2 + \mu_0^2] - \mu_0^2$
Weibull: $[\sigma_0^2 + \mu_0^2]\dfrac{\Gamma(2/\sigma + 1)}{(\Gamma(1/\sigma + 1))^2} - \mu_0^2$

With the formulations for Var(m) in hand, and recalling that the distribution of $\hat\mu$ can be approximated as normal (which leads to the key substitutions $\mu_0 = \hat\mu$ and $\sigma_0^2 = \hat\mu^2\,\text{Var}(\hat\eta)$, so that $\sigma_0^2 + \mu_0^2 = \hat\mu^2[\text{Var}(\hat\eta) + 1]$), every entry can be written as $\text{Var}(m) = \hat\mu^2 Q$ with a model-specific factor $Q$. The derived CIs for $m$ for each type of mixed-Poisson model considered in this study are presented in Table. For simplicity, 95% CIs are shown ($Z_{0.975} \approx 1.96$).

TABLE 95% Confidence Intervals for m
Every model gives $\Big[\max\big(0,\; \hat\mu - 1.96\,\hat\mu\sqrt{Q}\big),\; \hat\mu + 1.96\,\hat\mu\sqrt{Q}\Big]$, with:
Negative Binomial (NB): $Q = \alpha[\text{Var}(\hat\eta) + 1] + \text{Var}(\hat\eta)$
Poisson-Inverse-Gaussian (PIG): $Q = \frac{1}{\lambda}[\text{Var}(\hat\eta) + 1] + \text{Var}(\hat\eta)$
Sichel (SI): $Q = [\text{Var}(\hat\eta) + 1]\Big(\frac{2\sigma(\nu_{GIG}+1)}{c} + \frac{1}{c^2}\Big) - 1$
Poisson-Lognormal (PLN): $Q = e^{\sigma_{LN}^2}[\text{Var}(\hat\eta) + 1] - 1$
Poisson-Weibull (PW): $Q = [\text{Var}(\hat\eta) + 1]\dfrac{\Gamma(2/\sigma + 1)}{(\Gamma(1/\sigma + 1))^2} - 1$

Derivation of Prediction Intervals for Predicted Crash Count ($Y$)
The final interval of interest for a mixed-Poisson model is that for the predicted crash count ($y$) at a new site. The PI is developed based on Chebyshev's inequality and further assumes that (1) the lower bound for $y$ is zero (which makes the PI more conservative and follows the convention of Wood ()); and (2) $y$ must be integer-valued (). In general, a $100(1-\alpha)\%$ PI for $y$ is as shown in Equation (), where the floor of the upper bound is taken to ensure that it is integer-valued. As an example, to obtain a 95% PI for $y$, the first radical evaluates to $\sqrt{1/0.05} = \sqrt{20} \approx 4.47$.

$$\Big[0,\; \Big\lfloor \hat\mu + \sqrt{\tfrac{1}{\alpha}}\,\sqrt{\text{Var}(y)}\Big\rfloor\Big]$$

The variance of $Y$ is evaluated as follows:

$$\text{Var}(Y) = E\{\text{Var}(Y \mid M)\} + \text{Var}\{E(Y \mid M)\} = E(M) + \text{Var}(M) = E(\mu\nu) + \text{Var}(M) = E(\mu)E(\nu) + \text{Var}(M) = \mu_0 + \text{Var}(M)$$

Hence, the $100(1-\alpha)\%$ PI for $Y$ can be re-expressed as:

$$\Big[0,\; \Big\lfloor \hat\mu + \sqrt{\tfrac{1}{\alpha}}\,\sqrt{\hat\mu + \text{Var}(m)}\Big\rfloor\Big]$$

Thus, using the formulations for Var(M) provided in Equation (), the 95% PIs for $Y$ for each of the five mixed-Poisson models are developed and shown in Table.

TABLE 95% Prediction Intervals for y
Every model gives $\Big[0,\; \Big\lfloor \hat\mu + \sqrt{20}\,\sqrt{\hat\mu + \hat\mu^2 Q}\Big\rfloor\Big]$, where $Q$ is the model-specific factor listed for the NB, PIG, SI, PLN, and PW models in the table of 95% confidence intervals for $m$.

Full derivations of any of the aforementioned expressions for Var(M), the CIs for $\mu$ and $m$, or the PIs for $y$ are available from the authors upon request.
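The interval formulas above can be collected into a short Python sketch. It is a minimal illustration under our own assumptions: the site-level inputs $\hat\mu$ and $\text{Var}(\hat\eta)$ and the mixing variances are hypothetical numbers rather than the Texas estimates, $Z_{1-\alpha/2}$ comes from the standard normal quantile function in Python's `statistics` module, and the function names are ours. Each mixture enters only through its $\text{Var}[\nu]$, so a Sichel model would be handled the same way once its variance has been computed from the $c$-based expression.

```python
import math
from statistics import NormalDist


def ci_poisson_mean(mu_hat, var_eta, level=0.95):
    """CI for mu: [mu_hat / exp(z*sqrt(Var(eta))), mu_hat * exp(z*sqrt(Var(eta)))]."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    factor = math.exp(z * math.sqrt(var_eta))
    return mu_hat / factor, mu_hat * factor


def var_m(mu_hat, var_eta, var_nu):
    """Var(m) = (sigma0^2 + mu0^2)(Var[nu] + 1) - mu0^2, mu0 = mu_hat, sigma0^2 = mu_hat^2 Var(eta)."""
    second_moment = mu_hat ** 2 * (var_eta + 1.0)  # sigma0^2 + mu0^2
    return second_moment * (var_nu + 1.0) - mu_hat ** 2


def ci_safety(mu_hat, var_eta, var_nu, level=0.95):
    """Normal-approximation CI for the safety m, lower bound truncated at zero."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half = z * math.sqrt(var_m(mu_hat, var_eta, var_nu))
    return max(0.0, mu_hat - half), mu_hat + half


def pi_count(mu_hat, var_eta, var_nu, level=0.95):
    """Chebyshev-based PI for y: [0, floor(mu_hat + sqrt(1/alpha) * sqrt(mu_hat + Var(m)))]."""
    alpha = 1.0 - level
    var_y = mu_hat + var_m(mu_hat, var_eta, var_nu)
    return 0, math.floor(mu_hat + math.sqrt(1.0 / alpha) * math.sqrt(var_y))


def var_nu_weibull(sigma):
    """PW with E[nu] = 1: Var[nu] = Gamma(2/sigma + 1) / Gamma(1/sigma + 1)^2 - 1."""
    return math.gamma(2.0 / sigma + 1.0) / math.gamma(1.0 / sigma + 1.0) ** 2 - 1.0


# Hypothetical mixing variances Var[nu] for four of the models.
mixing_variance = {
    "NB": 0.6,                    # alpha = 1/phi
    "PIG": 1.0 / 1.8,             # 1/lambda
    "PLN": math.exp(0.45) - 1.0,  # exp(sigma_LN^2) - 1
    "PW": var_nu_weibull(1.4),
}
mu_hat, var_eta = 4.38, 0.0086    # hypothetical site-level estimates
print("CI(mu) =", tuple(round(b, 3) for b in ci_poisson_mean(mu_hat, var_eta)))
for name, v in mixing_variance.items():
    lo, hi = ci_safety(mu_hat, var_eta, v)
    print(f"{name:4s} CI(m) = [{lo:.2f}, {hi:.2f}]  PI(y) = {pi_count(mu_hat, var_eta, v)}")
```

With these made-up inputs the NB lower bound for $m$ falls below zero before truncation, so `ci_safety` returns 0 for it; this is a property of the illustrative numbers, not a reproduction of any fitted model.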
CASE STUDY
This section presents a case study in which mixed-Poisson models are developed from a crash dataset collected in Texas. Following model development, the confidence and prediction intervals described above are estimated and displayed graphically. Although the dataset includes several explanatory variables, the models considered in this study were of the flow-only variety (i.e., traffic volume was the only covariate in each model) because of the visualization constraints discussed below. Flow-only models are not without drawbacks, most notably omitted-variable bias (). That being said, these models permit simple graphical representations of their associated CIs and PIs. The confidence and prediction intervals developed in the previous section are general in the sense that they can be estimated for models with any number of covariates, as specified in the linear predictor $\eta$; however, they cannot be plotted unless the number of independent variables is two or fewer, or unless the values of the remaining covariates are fixed (conditioned on) such that at most two predictors are allowed to vary.

Data Description
The dataset used in this study was collected as part of the NCHRP - (Methodology to Predict the Safety Performance of Rural Multilane Highways) project (). Overall,, segments with an average length of 0. miles were considered. In total,, crashes were reported during the five-year study period, though these crashes occurred on only of the segments; thus, segments (%) did not experience any crashes. The mean number of crashes was. and the variance was., yielding a variance-to-mean ratio of.. Average daily traffic (ADT) values over the five-year study period ranged from to,00, with a mean of,. and a standard deviation of,0.0. More detailed information and summary statistics of the dataset are available in Zou et al.
().

Model Development
A total of five mixed-Poisson models were estimated from the Texas dataset. Each model had one covariate (ADT, entered in logarithmic form) and an offset term, as previously described. The negative binomial, Poisson-inverse-Gaussian, and Sichel models were estimated by maximum likelihood (ML) using the GAMLSS package for the R statistical software (). The Poisson-lognormal and Poisson-Weibull likelihood functions do not have a closed form; hence, these models were estimated through a Bayesian approach in the WinBUGS software package () (using 0,000 iterations following 000 iterations for burn-in). Model parameters and goodness-of-fit statistics are presented in Table (a) and (b).
TABLE (a) Model Results Estimated with ML Approach
Model NB PIG SI Value SE Value SE Value SE Intercept log(β0) log(adt) β 0.*** *** *** 0.0 Dispersion parameter α (= 1/φ) 0.*** Shape parameter λ - -.*** - - Scale parameter σ Scale parameter νgig *** Global Deviance... AIC.. 0. BIC
Note: *** denotes variable significant at 0.00 level

TABLE (b) Model Results Estimated with Bayesian Approach
Poisson-Lognormal (PLN) Model Mean SD MC Error.0% Median.0% Intercept log(β0) log(adt) β E Scale Parameter σln
Poisson-Weibull (PW) Model Mean SD MC Error.0% Median.0% Intercept log(β0) log(adt) β E Scale Parameter σ..e

Confidence and Prediction Intervals
To compare the prediction intervals for $y$ and the confidence intervals for $m$ and $\mu$ associated with each of the five mixed-Poisson models, plots were constructed showing the intervals for each model (Figure (a)-(e)). In each plot, ADT values range from 0 to,000 (approximately the range in the Texas dataset), and the segment length was fixed at one mile for the offset term. The notations LB and UB refer to the lower and upper bounds, respectively, of the interval of interest. Because the likelihood functions for the Poisson-lognormal and Poisson-Weibull models do not have a closed form, the estimation method developed by Connors et al. () was adopted to calculate their confidence and prediction intervals using WinBUGS. We first consider the 95% CI for the Poisson mean ($\mu$). From Table (a) and (b), it can be seen that, regardless of the type of model considered, the estimates of the model coefficients ($\log(\beta_0)$ and $\beta_1$) are quite similar. Hence, the estimates of the Poisson mean were quite similar between models, with maximum values ranging from. for the Poisson-Weibull model to
as high as. for the Sichel model, when ADT =,000. From Figure (a) through (e), it can be seen that the lower and upper bounds of the 95% CI for the Poisson mean are nearly identical to each other, and to the estimate of the mean, for ADT values of approximately,000 or less. As ADT increases beyond,000, the distance between the lower and upper bounds of each CI begins to increase. Ultimately, at ADT =,000, the tightest interval around the estimate of $\mu$ resulted from the Poisson-Weibull model ([.,.], width =.) and the widest interval resulted from the Sichel model ([.,0.], width =.). As the true value of the Poisson mean is not known, no conclusion can be drawn on whether the narrowest interval is best. When examining the 95% CIs for the safety, $m$, in Figure (a) through (e), one may first notice that for the NB, PIG, and SI models, the lower bound always has a value of zero, regardless of the ADT value. These models produced negative lower bounds when calculated; however, as noted in Equation (), negative values of the safety ($m$) are not sensible, and hence the lowest reasonable value is zero. The lower bounds for the PLN and PW models (the models estimated through a Bayesian approach) were found to be non-negative in all cases. For the PLN model, the width of the interval at an ADT value of,000 was., and the interval ranged from. to., the maximum value of $m$ estimated from any of the 95% CIs. The lowest upper bound of the 95% CIs for $m$ at,000 ADT, across all models, was. as obtained from the Poisson-inverse-Gaussian model. As was the case with the Poisson mean, the true value of $m$ is unknown; thus, no comment can be made on which interval is narrowest while still capturing the true parameter value. Regardless of the model considered, the lower bound of the 95% prediction interval for the predicted response at a new site ($y$) is always zero.
Additionally, the upper bounds of the PIs for $y$ are much greater (specifically,. to. times greater at,000 ADT) than the respective upper bounds of the 95% CIs for $m$. Besides yielding the largest values, the curves for the PIs for $y$ are notably less smooth than those representing the CIs for $\mu$ and $m$. This step-function appearance results from the use of the floor function in calculating the upper bound of the PI, as shown in Equation (), since the number of crashes predicted to occur at a new site should be integer-valued. The upper bounds of the PIs for $y$ ranged from as high as 0 for the Poisson-lognormal model to as low as for the Poisson-inverse-Gaussian model. The upper bound of the PI for $y$ from the Sichel model,, was found to be quite close to that from the PLN model.
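The step-function behavior can be reproduced with a small Python sketch. For a hypothetical flow-only NB model (the coefficients, covariance matrix, and dispersion below are made-up illustrative values, not the estimates in Table (a)), it evaluates the 95% upper bound for $m$ and the Chebyshev-based 95% upper bound for $y$ over a grid of ADT values, with the segment length fixed at one mile and a five-year period as in the case study. The bound for $m$ varies continuously with ADT, whereas the floor function restricts the bound for $y$ to integers, which is what gives the PI curves their staircase shape; the function name is hypothetical.

```python
import math

# Hypothetical flow-only NB fit, for illustration only.
LOG_BETA0, BETA1 = -7.5, 0.8
COV = [[0.25, -0.026], [-0.026, 0.0028]]  # covariance of (log beta0, beta1)
ALPHA = 0.6                               # NB dispersion, alpha = 1/phi
YEARS, LENGTH_MI = 5.0, 1.0               # five-year period, one-mile segment


def nb_upper_bounds(adt):
    """Return (95% CI upper bound for m, 95% PI upper bound for y) for the NB model at one ADT."""
    x = (1.0, math.log(adt))
    mu = math.exp(LOG_BETA0 + BETA1 * x[1]) * LENGTH_MI * YEARS
    var_eta = sum(x[r] * COV[r][s] * x[s] for r in range(2) for s in range(2))
    var_m = mu ** 2 * (ALPHA * (var_eta + 1.0) + var_eta)
    m_ub = mu + 1.96 * math.sqrt(var_m)                               # continuous in ADT
    y_ub = math.floor(mu + math.sqrt(20.0) * math.sqrt(mu + var_m))  # integer-valued: steps
    return m_ub, y_ub


for adt in range(2_000, 20_001, 2_000):
    m_ub, y_ub = nb_upper_bounds(adt)
    print(f"ADT {adt:6d}: m UB = {m_ub:6.2f}, y UB = {y_ub}")
```

On a finer ADT grid, consecutive values of the $y$ bound repeat before jumping to the next integer, which is the pattern visible in the plotted PI curves.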
FIGURE (a) 95% CIs and PI for Negative Binomial Model [plot of number of crashes versus ADT showing mu, mu LB, mu UB, m LB, m UB, y LB, and y UB]
FIGURE (b) 95% CIs and PI for Poisson-Inverse-Gaussian Model [same layout]
FIGURE (c) 95% CIs and PI for Sichel Model [same layout]
FIGURE (d) 95% CIs and PI for Poisson-Lognormal Model [same layout]
FIGURE (e) 95% CIs and PI for Poisson-Weibull Model [plot of number of crashes versus ADT showing mu, mu LB, mu UB, m LB, m UB, y LB, and y UB]

SUMMARY AND CONCLUSIONS
Building on the work of Wood (), confidence intervals for the Poisson mean ($\mu$) and the safety or Poisson parameter ($m$), and prediction intervals for the predicted response (i.e., the number of crashes at a new site, $y$), were developed for four types of mixed-Poisson models beyond the negative binomial (Poisson-gamma) model treated by Wood (). These formulae allow researchers and practitioners to obtain a window of uncertainty around predictions rather than a sole point estimate. Specifically, the mixed-Poisson models considered in this study were the Poisson-inverse-Gaussian, Sichel, Poisson-lognormal, and Poisson-Weibull models, all of which arise by allowing a multiplicative error term, following a corresponding mixture distribution, to enter the functional form of the Poisson parameter $m$. After motivating mixed-Poisson models, derivations of the confidence and prediction intervals were provided. Once the formulae for the CIs and PIs had been established, the theory was put into practice by investigating the intervals associated with each of the five mixed-Poisson models. Flow-only models were estimated for each model type to obtain the parameters needed to calculate the intervals. Since observed crash data were used, the true values of the Poisson mean ($\mu$) and the safety or Poisson parameter ($m$) are unknown; hence, no comment can be made on which intervals perform best. Nonetheless, several important conclusions can be drawn from the Texas case study: (1) For small ADT values, the lower and upper bounds of the 95% CI for the Poisson mean ($\mu$) were quite similar in value and also close to the estimate of the Poisson mean predicted by the models;
(2) Of the models developed in this study, the Sichel model yielded the widest intervals for the Poisson mean ($\mu$) and safety ($m$). It further yielded the second-greatest upper bound among all 95% PIs for the predicted response $y$. That being said, there is no way to confirm that narrower intervals for $\mu$, $m$, and $y$ are necessarily better, as the true values of these quantities are unknown; (3) For the Poisson mean ($\mu$), the Poisson-Weibull model yielded the narrowest 95% CI. The Poisson-inverse-Gaussian model yielded the narrowest 95% CI for $m$ and the narrowest PI for $y$ of all models considered; (4) All three models estimated via maximum likelihood (NB, PIG, and SI) yielded negative values for the lower bound on $m$ (before these were truncated to zero), a behavior that was not observed for the models estimated via a Bayesian approach (PLN and PW); and (5) At the largest ADT values considered, the upper bounds of the PIs for $y$ ranged from. to. times the upper bounds for $m$ at the same ADT. In terms of future work, this study suggests a few possibilities currently under investigation by the authors. First, it would be beneficial to compare the behavior of the CIs and PIs for functional forms of the mixed-Poisson models that include more covariates than the traffic flow alone. This will be especially important to help overcome the likely omitted-variable bias associated with the models developed in this study. Additionally, a simulation study involving simulated values of the Poisson mean ($\mu$), safety ($m$), and response (i.e., crash count, $y$) at a new site could help determine which CIs and PIs best represent the true intervals.
REFERENCES
[] Mannering, F. L., and C. R. Bhat. Analytic methods in accident research: Methodological frontier and future directions. Analytic Methods in Accident Research.
[] Lord, D., and F. Mannering. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice.
[] Gustavsson, J. On the use of regression models in the study of road accidents. Accident Analysis & Prevention.
[] Gustavsson, J., and Å. Svensson. A Poisson regression model applied to classes of road accidents with small frequencies. Scandinavian Journal of Statistics.
[] Jovanis, P. P., and H.-L. Chang. Modeling the relationship of accidents to miles traveled. Transportation Research Record.
[] Lord, D., S. P. Washington, and J. N. Ivan. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis & Prevention.
[] Maycock, G., and R. Hall. Accidents at -arm roundabouts.
[] Hauer, E., J. C. Ng, and J. Lovell. Estimation of safety at signalized intersections (with discussion and closure).
[] Connors, R. D., M. Maher, A. Wood, L. Mountain, and K. Ropkins. Methodology for fitting and updating predictive accident models with trend. Accident Analysis & Prevention.
[] Park, E. S., P. J. Carlson, R. J. Porter, and C. K. Andersen. Safety effects of wider edge lines on rural, two-lane highways. Accident Analysis & Prevention.
[] El-Basyouny, K., and T. Sayed. Comparison of two negative binomial regression techniques in developing accident prediction models. Transportation Research Record: Journal of the Transportation Research Board.
[] Ye, X., R. M. Pendyala, V. Shankar, and K. C. Konduri. A simultaneous equations model of crash frequency by severity level for freeway sections. Accident Analysis & Prevention.
[] Hauer, E. Empirical Bayes approach to the estimation of unsafety: the multivariate regression method. Accident Analysis & Prevention.
[] Lawless, J. F. Negative binomial and mixed Poisson regression. Canadian Journal of Statistics.
[] Hauer, E. Observational Before/After Studies in Road Safety: Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety.
[] Cameron, A. C., and P. K. Trivedi. Regression Analysis of Count Data. Cambridge University Press.
[] Rigby, R., D. Stasinopoulos, and C. Akantziliotou. A framework for modelling overdispersed count data, including the Poisson-shifted generalized inverse Gaussian distribution. Computational Statistics & Data Analysis.
[] Zou, Y., L. Wu, and D. Lord. Modeling over-dispersed crash data with a long tail: Examining the accuracy of the dispersion parameter in Negative Binomial models. Analytic Methods in Accident Research.
[] Dean, C., J. Lawless, and G. Willmot. A mixed Poisson-inverse-Gaussian regression model. The Canadian Journal of Statistics/La Revue Canadienne de Statistique.
[0] Zha, L., D. Lord, and Y. Zou. The Poisson Inverse Gaussian (PIG) Generalized Linear Regression Model for Analyzing Motor Vehicle Crash Data. Journal of Transportation Safety & Security (accepted for publication).
[] Cheng, L., S. R. Geedipally, and D. Lord. The Poisson-Weibull generalized linear model for analyzing motor vehicle crash data. Safety Science.
[] Lord, D., and L. F. Miranda-Moreno. Effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter of Poisson-gamma models for modeling motor vehicle crashes: a Bayesian perspective. Safety Science.
[] Aguero-Valverde, J., and P. P. Jovanis. Analysis of road crash frequency with spatial models. Transportation Research Record: Journal of the Transportation Research Board.
[] Casella, G., and R. L. Berger. Statistical Inference. Duxbury, Pacific Grove, CA.
[] Lord, D. Methodology for estimating the variance and confidence intervals for the estimate of the product of baseline models and AMFs. Accident Analysis & Prevention.
[] Lord, D., P.-F. Kuo, and S. Geedipally. Comparison of Application of Product of Baseline Models and Accident-Modification Factors and Models with Covariates: Predicted Mean Values and Variance. Transportation Research Record: Journal of the Transportation Research Board.
[] Wood, G. Confidence and prediction intervals for generalised linear accident models. Accident Analysis & Prevention.
[] Geedipally, S., and D. Lord. Effects of Varying Dispersion Parameter of Poisson-Gamma Models on Estimation of Confidence Intervals of Crash Prediction Models. Transportation Research Record: Journal of the Transportation Research Board.
[] Geedipally, S. R., and D. Lord. Investigating the effect of modeling single-vehicle and multi-vehicle crashes separately on confidence intervals of Poisson-gamma models. Accident Analysis & Prevention.
[0] Rigby, R., and D. Stasinopoulos. A flexible regression approach using GAMLSS in R. London Metropolitan University, London.
[] Rodríguez, G. Lecture notes about generalized linear models.
[] Lord, D., S. R. Geedipally, B. N. Persaud, S. P. Washington, I. van Schalkwyk, J. N. Ivan, C. Lyon, and T. Jonsson. Methodology to predict the safety performance of rural multilane highways.
[] Rigby, R., and D. Stasinopoulos. Generalized additive models for location, scale and shape. Applied Statistics.
[] Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing.
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -
More informationComparative Analysis of Zonal Systems for Macro-level Crash Modeling: Census Tracts, Traffic Analysis Zones, and Traffic Analysis Districts
Comparative Analysis of Zonal Systems for Macro-level Crash Modeling: Census Tracts, Traffic Analysis Zones, and Traffic Analysis Districts Qing Cai* Mohamed Abdel-Aty Jaeyoung Lee Naveen Eluru Department
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed
More informationMixed models in R using the lme4 package Part 7: Generalized linear mixed models
Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of
More information