A Flexible Modeling Approach Using Dirichlet Process Mixtures: Application to Municipality-Level Railway Grade Crossing Crash Data


S. Heydari, L. Fu, D. Lord, And B. K. Mallick

Shahram Heydari (Corresponding Author)
PhD Candidate
Department of Civil and Environmental Engineering, University of Waterloo
00 University Avenue W., Ontario NL G, Canada
shahram.heydari@uwaterloo.ca, --
&
Zachary Department of Civil Engineering, Texas A&M University
College Station, Texas, USA

Liping Fu
Professor
Department of Civil and Environmental Engineering, University of Waterloo
00 University Avenue W., Ontario NL G, Canada
lfu@uwaterloo.ca, --

Dominique Lord
Associate Professor
Zachary Department of Civil Engineering, Texas A&M University
College Station, Texas, USA
d-lord@tamu.edu, --

Bani K. Mallick
Distinguished Professor
Department of Statistics, Texas A&M University
College Station, Texas, USA
bmallick@stat.tamu.edu, --

Total # of words = (Text) ( tables) = + references
Submission July 0 for TRB 0

Abstract

This paper introduces a new approach to addressing two of the most challenging issues in road safety research, namely, how to account for unobserved heterogeneity and how to identify latent subpopulations in data. In contrast to approaches based on random effects/parameters models and finite mixtures, the proposed approach employs a Bayesian semi-parametric methodology based on Dirichlet process mixtures. Our method has four noteworthy advantages: (i) it allows the robustness of distributional assumptions in random effects/parameters models to be examined; (ii) it allows latent clusters in data to be identified; (iii) it enables the identification of outliers (extreme observations) while accommodating them in analyses without compromising the quality of estimates; and (iv) it is capable of estimating the number of latent clusters in data using an elegant mathematical structure. In this paper, we evaluate the proposed method on a railway grade crossing crash dataset with a hierarchical (multilevel) structure, at the municipality level, from Canada for the years 00 to 0. We use cross-validation predictive densities and the pseudo Bayes factor for Bayesian model selection. While confirming the need for the multilevel modeling approach, the results pointed out the inadequacy of the parametric assumption. In fact, our proposed method improved model fit significantly for the municipality-level data. In a fully probabilistic framework, we also identified the expected number of latent clusters with similar unknown/unmeasured features among Canadian municipalities. It is thus possible to further investigate the reasons behind such similarities and dissimilarities, which could have important policy implications for the safety management process.

INTRODUCTION

Addressing issues relating to unobserved heterogeneity has recently been receiving increasing attention in the transportation safety literature. Employing random effects/parameters models (RPM), including random intercept and/or slope models, is a viable approach to account for such issues (-). However, parametric assumptions (e.g., normally distributed random parameters), an inherent component of RPMs, impose a strong restriction on any statistical model (, ). As a result, distributional assumptions might compromise the quality of the analysis or even produce misleading results. When using RPMs, the main goal is to allow one or more model parameters to vary across observations (in single-level model settings) (,,) or groups of observations (in multilevel model settings) (-). For example, Anastasopoulos and Mannering () used an RPM to analyze roadway segments in the state of Indiana, allowing model parameters to vary across observations. The authors concluded that the employed RPM provided a superior fit and therefore better represented the observed data. In another study, Yannis et al. () employed RPMs for multilevel data in which observations were nested in various regions of Greece. The authors analyzed the data using both random intercept and random slope models. They allowed model parameters to vary between different regions (in contrast to the former study, in which parameters were allowed to vary between observations). The latter paper concluded that regional variations were significant when considering the effect of enforcement on crash frequencies. Given the examples above, regardless of the model setting (single-level or multilevel), distributional assumptions (such as normally distributed random parameters) might not hold, or might not properly represent the observed data or capture unobserved heterogeneity.

To clarify the problem, consider a scenario in which the modeller is only interested in potential variations in the intercept (random intercept model) and the effects of other covariates are assumed to be fixed across observations. In this scenario, one basic approach is to assume that all observations have exactly the same intercept and that there is no extra variability in the data. Obviously, this assumption is rather simplistic and does not take into consideration the fact that there might be some unknown and/or unmeasured attributes that change between observations. In the aforementioned scenario, two major approaches for tackling unobserved heterogeneity through the intercept are as follows. The first possibility is estimating one intercept for each single observation, based on the belief that each observation in the data completely differs from the others; that is, the assumption of complete independence (). This assumption is not realistic since observations (e.g., intersections) are not totally dissimilar and they certainly have some similar features. The second and most common approach is to assume that intercepts for various observations are generated from a single distribution; e.g., a normal distribution. Depending on the extent to which standard distributional assumptions are capable of capturing heterogeneity in given data, say, in the form of random parameters, the results will be biased to various degrees. It should be noted that standard assumptions usually do not accommodate skewness, kurtosis, and multimodality (0).

As an example of a problem that may arise under parametric assumptions, let us consider a multilevel dataset in which observations are nested in different groups such as geographical regions (see () for a review of multilevel models in the road safety literature). When there are outlier regions (extreme cases) in the data, large outlier regions affect other regions excessively.
Consequently, estimates relating to smaller outlier regions erroneously tend to approach the overall mean. In these circumstances, a more flexible modeling approach is necessary. The flexible model must satisfy two requirements: (i) it should be able to avoid the

complete independence assumption described above; and (ii) it should be able to relax the distributional assumption while adapting itself to the complexity of the observed data.

It should be mentioned that while the use of RPMs has been relatively limited in single-level (observational level) crash data, their application in multilevel settings, which have gained popularity in recent years, has been inevitable and frequent. In fact, allowing varying parameters (e.g., varying intercepts) across higher levels or groups (e.g., counties, municipalities, regions, etc.) is a common practice in analyzing datasets characterized by hierarchical structures (,, ). This is due to the fact that observations nested in the same groups are likely to share similar unknown and/or unmeasured traits and are thus correlated (). Therefore, RPMs could account for such correlation and reduce unobserved heterogeneity in data.

With this background, our paper takes advantage of an innovative class of flexible statistical models that has been developed recently in the Bayesian nonparametric literature (and the computer science literature) based on Dirichlet process mixtures (). The Flexible Dirichlet Process Model (FDPM) not only allows examining the robustness of parametric assumptions, but also enables identifying outliers and latent subpopulations in data. This model is highly flexible and adjusts its complexity to any observed data. That is, for example, the number of mass points or clusters (components of the mixture) increases as the complexity of the observed data increases. Note that when using finite mixture models (), the number of clusters in the data should be anticipated (this number is usually decided without any sound justification). Then, different models should be estimated using various pre-specified numbers of clusters, and eventually the number of clusters that provides the best fit is selected as the optimal number.

In contrast, FDPMs directly estimate the required number of latent clusters in a mathematically elegant framework. We evaluate our FDPM on a Canadian railway grade crossing crash dataset that contains observations nested in different geographical regions: municipalities. For a review of railway grade crossing crash analysis see (), (), and (). In this research, as a first step, the aforementioned dataset is analyzed by accounting for its hierarchical structure. To the best of our knowledge, no attempts have been made so far, especially in Canada, to accommodate the hierarchical form of grade crossing crash data. We examine the presence and the extent of dependencies between crossings nested in different regions. We explore the robustness of our standard parametric assumption in a multilevel setting using FDPMs. Lastly, we investigate the presence of outlier municipalities across Canada and identify latent clusters (subpopulations) among different Canadian regions. To our knowledge, this is the first instance in the multilevel crash data literature of clustering groups (sites, regions, municipalities, etc.) that belong to higher levels of the hierarchy using such a flexible and efficient statistical model. Our flexible model can be implemented in the freely available software WinBUGS (), making its use more convenient. It can also be easily adapted to cluster observations; for example, intersections or road segments in single-level model settings.

This paper contributes to the transportation safety literature methodologically by presenting a flexible modeling framework that has several theoretical and practical capabilities (not only in road safety analysis, but also in other areas of transportation such as travel demand research).
This paper also contributes empirically to grade crossing safety literature by establishing reliable safety performance functions, providing evidence on the presence of hierarchical levels in grade crossing crash data, and identifying latent similarities and dissimilarities between different Canadian regions.

DATA DESCRIPTION

In this research, the data used to illustrate the proposed FDPM are railway grade crossing crash data from Canada. This nationwide crash data set, provided by the Transportation Safety Board of Canada, consists of, public crossings equipped with flashing lights and bells (FLB) for a six-year period. We obtained the crash data by combining two databases: IRIS (Integrated Railway Information System) and RODS (Railway Occurrence Database System). The IRIS database contains a set of inventory data on railway crossings across Canada. The main pieces of information that can be extracted from IRIS are geometric/operational characteristics, train and vehicle flows, and protection type. The RODS database, in contrast to IRIS, is mainly designed to record information related to accident occurrences such as crash occurrence date and time, number of fatalities, number of seriously injured, etc. Since the presence of regional dependencies in the data is suspected, we prepared the data to account for its hierarchical structure as explained below.

Municipality-Level Data

To prepare this dataset, municipalities with at least 0 FLB crossings in their boundary were considered. The final municipality-level data included, crossings sitting in municipalities, which come from major Canadian provinces: British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, Quebec, New Brunswick, and Nova Scotia. A total of crashes were observed in the latter data. The municipality-level dataset includes all major Canadian cities such as Toronto, Montreal, Winnipeg, Edmonton, Vancouver, etc. Several factors (e.g., driver behavior, climate, regulations, etc.) might differ between municipalities. One aim of this research was to verify the existence of dependencies among FLB crossings nested in the same municipalities.

More importantly, we aimed at examining the standard parametric assumption for the data while accounting for its multilevel form. We were also interested in identifying outlier municipalities (those that are different from the rest of the data) and municipalities that manifest similar patterns (latent subpopulations) in terms of crash frequency at FLB crossings. Among, FLB crossings in this dataset,.% were located in urban areas and whistle prohibition was applied to.% of them. A host of explanatory variables were available, but many of them were not statistically significant in describing crash frequencies. Table provides summary statistics of the data for the most important variables.

As discussed in the introduction, the most widespread approach in multilevel modeling assumes a common normal distribution for parameters that vary between municipalities. Considering municipality-level data, for example, it may not be reasonable to assume that all municipalities are generated from the same distribution. In other words, we suspect the presence of latent subpopulations among these municipalities. The methodology section describes our flexible modelling framework, which can efficiently deal with such circumstances.

Table Summary Statistics of the Municipality-Level Data
Variable | Mean | Std. Dev. | Min | Max
Train flow (average annual daily)
Vehicle flow (average annual daily)
Log of exposure (product of train and vehicle flows)
Number of tracks
Number of lanes
Track angle (deviation from 0 )
Road speed (km/h)
Train speed (km/h)
Whistle prohibition (1 if present, 0 otherwise)
Urban area (1 if urban area, 0 otherwise)
Crash frequency ( years)
Number of FLB crossings: 0 0

METHODOLOGY

This section first provides a brief methodological background on the proposed modeling approach and its origins. It then describes the main component of our approach, i.e., the Dirichlet process, followed by details relating to the proposed FDPM adopted for the study in context. The section concludes by discussing model selection criteria based on cross-validation predictive densities.

Methodological Background

This paper illustrates a class of flexible RPMs that have been developed in the Bayesian nonparametric literature (0, ) based on Dirichlet process mixtures (, ). In this regard, Escobar and West () state that Bayesian models involving Dirichlet process mixtures are at the heart of the modern nonparametric Bayesian movement. The Bayesian models used in this paper are, however, semi-parametric, since parametric distributional assumptions are not relaxed for all model parameters in our flexible model. The original ideas of nonparametric Bayesian inference were initially developed and discussed by Ferguson () and Antoniak (); however, their application was very limited due to computational complexities. It was mainly in the 0s that Bayesian nonparametric models attracted the attention of more researchers, owing to improvements in Markov Chain Monte Carlo (MCMC) schemes and substantial computational advances during those years. At that stage, several developments were made in various aspects of Bayesian nonparametric modeling (, ).

Consequently, Bayesian nonparametric concepts have been used in scientific articles mainly in biostatistics and computer science research (, ), whereas their use in transportation research, especially transportation safety, has been extremely rare, if not nonexistent. One of the main motivations behind nonparametric Bayesian inference is to remove constraints associated with specific parametric assumptions. These constraints may affect inferences made by parametric models. Therefore, employing a nonparametric Bayesian approach enables us to circumvent restrictive distributional assumptions and make parametric models more robust in terms of statistical inference. It is important to mention that the term Bayesian nonparametric does not mean that the model is parameter-free. On the contrary, it may have an infinite number

of parameters (0). In Bayesian nonparametrics, in effect, the number of parameters increases as the complexity of the data escalates. This characteristic leads to an important difference from the finite mixture modeling approach (), in which the number of latent clusters is decided in advance. In Bayesian nonparametric modeling, however, the number of latent clusters is estimated as part of the estimation algorithm, which is more realistic, convenient, and flexible.

Dirichlet Process (DP) and Truncated Dirichlet Process

In the parametric modelling approach, we assume a specific density function $G(\cdot)$ with a limited number of unknown parameters. In contrast, the nonparametric Bayesian approach takes $G(\cdot)$ as unknown, with the possibility of an infinite number of parameters (the number of parameters depends on the complexity of the observed data), and assumes a continuous baseline distribution (prior) for $G(\cdot)$. In other words, the unknown density $G(\cdot)$ is centered around the baseline distribution $G_0$, and its variation around $G_0$ is determined by a real positive precision parameter, $\kappa$ (, ). As $\kappa$ approaches infinity, $G(\cdot)$ becomes more similar to $G_0$. That is, in a random intercept model, for example, larger values of $\kappa$ imply that each unit effect, say $\eta_i$ ($i = 1, 2, \ldots, l$, where $l$ denotes the number of observations), tends to be in a distinct subpopulation. This condition is similar to the standard random effects/parameters assumption of a common normal distribution for all random effects (here, intercepts) (); i.e., $\eta_i \sim \text{Normal}(m, \nu)$, where $m$ and $\nu$ are the mean and the variance. A $\kappa$ approaching zero, however, indicates a major difference between $G(\cdot)$ and $G_0$. In this condition, all unit effects (in our example, intercepts) tend to be in the same cluster. This is similar to the common intercept assumption: $\eta_1 = \eta_2 = \cdots = \eta_l$. The latter condition does not occur in most real applications (e.g., due to unobserved heterogeneity).

Moreover, the former condition, which is more likely to occur, might be too strict since it hypothesizes that all random units are generated from a unique distribution. Dirichlet process mixing helps build a flexible model that relies on the aforementioned conditions. Given $G_0$ and $\kappa$ explained above, a generic Dirichlet process $G_{DP}$ can be defined as

$$G_{DP} \sim \text{Dirichlet}(\kappa G_0)$$

This is a random density measure on the space of all probability measures $G_{DP}$ (). That is, for any partition $p_1, \ldots, p_n$ of the parameter space, the vector of random densities $(G(p_1), \ldots, G(p_n))$ follows a Dirichlet distribution (0):

$$(G(p_1), \ldots, G(p_n)) \sim \text{Dirichlet}(\kappa G_0(p_1), \ldots, \kappa G_0(p_n))$$

To clarify this procedure, and for the sake of simplicity, let us assume that a real line represents the entire sample space of a given parameter, as in (). This line can be partitioned into several intervals: $(-\infty, p_1), (p_1, p_2), \ldots, (p_{n-2}, p_{n-1}), (p_{n-1}, \infty)$, where $p$ stands for partitions or intervals. Then, the probability ($PR$) of falling into each interval will be as follows:

$$PR_1 = G(p_1), \quad PR_2 = G(p_2) - G(p_1), \quad \ldots, \quad PR_{n-1} = G(p_{n-1}) - G(p_{n-2}), \quad PR_n = 1 - G(p_{n-1})$$

Similarly, the probability ($PR_0$) for the baseline distribution $G_0$ can be obtained from

$$PR_{0,n-1} = G_0(p_{n-1}) - G_0(p_{n-2})$$

Considering the partitioned line discussed above, the probabilities $PR_1$ to $PR_n$ follow a Dirichlet distribution:

$$(PR_1, PR_2, \ldots, PR_n) \sim \text{Dirichlet}(\kappa PR_{0,1}, \kappa PR_{0,2}, \ldots, \kappa PR_{0,n})$$

To generate random density functions from a Dirichlet process, the stick-breaking procedure () can be employed. Here, we adopt the description provided by (), according to which the steps of stick-breaking are as follows:

(i) From the baseline distribution $G_0$, generate a vector of random variables $\theta_1, \theta_2, \ldots$;

(ii) From the density function $\text{Beta}(1, \kappa)$, generate a vector of random variables $\xi_1, \xi_2, \ldots$; thus, we can write

$$PR(\xi_n) = \kappa (1 - \xi_n)^{\kappa - 1}$$

$$E(\xi_n) = \frac{1}{1 + \kappa}$$

(iii) Assign probabilities $PR_1, PR_2, \ldots, PR_n$ to random variables $\theta_1, \theta_2, \ldots, \theta_n$, respectively, where

$$PR_1 = \xi_1$$
$$PR_2 = (1 - \xi_1)\xi_2$$
$$PR_3 = (1 - \xi_1)(1 - \xi_2)\xi_3$$
$$PR_n = (1 - \xi_1)(1 - \xi_2) \cdots (1 - \xi_{n-2})(1 - \xi_{n-1})\xi_n$$

Let $I_\theta$ be an indicator function and $f(\cdot)$ be the probability function (an infinite mixture of point masses) that corresponds to $G_{DP}$; we can then write

$$f(\cdot) = \sum_{n=1}^{\infty} PR_n I_{\theta_n}; \quad \theta_n \sim G_0$$

It should be noted that the above density indicates that random draws from a Dirichlet process are discrete probability distributions (). This can be problematic when the underlying distribution is continuous. Therefore, a modification is required to substitute the indicator function $I_\theta$ with a continuous density function, denoted here by $\gamma(\cdot \mid \theta_n)$. This results in Dirichlet process mixing:

$$f(\cdot) = \sum_{n=1}^{\infty} PR_n \gamma(\cdot \mid \theta_n); \quad \theta_n \sim G_0$$

Because of the computational complexities associated with a full Dirichlet process, a truncation approach can be employed to obtain an approximation (). This results in a truncated Dirichlet process, in which the main idea is to limit the maximum number of possible partitions (say, on the partitioned real line discussed above). Instead of allowing $n$ to go to infinity, it can grow only up to a certain integer value $C$; i.e., the maximum number of clusters. Doing so, $G_{DP}$ thus depends on $\kappa$, $G_0$, and also $C$.
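The stick-breaking steps above, truncated at C clusters, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the standard normal baseline, and the values of κ and C are hypothetical choices, not the settings used in this study.

```python
import random

def truncated_stick_breaking(kappa, C, rng):
    """Return weights PR_1..PR_C of a stick-breaking construction truncated at C."""
    weights = []
    remaining = 1.0                        # part of the unit stick not yet assigned
    for _ in range(C - 1):
        xi = rng.betavariate(1.0, kappa)   # xi_n ~ Beta(1, kappa)
        weights.append(remaining * xi)     # PR_n = xi_n * prod_{k<n} (1 - xi_k)
        remaining *= 1.0 - xi
    weights.append(remaining)              # last partition takes what is left
    return weights

def draw_truncated_dp(kappa, C, base_draw, rng):
    """Draw atoms theta_n ~ G0 and their weights: one discrete random distribution."""
    atoms = [base_draw(rng) for _ in range(C)]
    return atoms, truncated_stick_breaking(kappa, C, rng)

# Hypothetical settings: kappa = 2, C = 20, baseline G0 = Normal(0, 1).
rng = random.Random(0)
atoms, weights = draw_truncated_dp(2.0, 20, lambda r: r.gauss(0.0, 1.0), rng)
```

Each call returns one random discrete distribution: the atoms are candidate cluster locations and the weights are their probabilities. Smaller values of κ tend to concentrate most of the mass on the first few atoms.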
One condition that should be satisfied here is related to the probability of the final partition. It is expected that the probability for the last partition be a very small value, ε, such as 0.0 (). When C is a relatively large value like 0, while κ is limited

to 0, the probability of the last partition approaches zero (). Under the truncated Dirichlet process, the following condition should be satisfied for the last partition:

$$PR_C = 1 - \sum_{n=1}^{C-1} PR_n$$

In a truncated Dirichlet process, the Dirichlet process mixture can be written as

$$f(\cdot) = \sum_{n=1}^{C} PR_n \gamma(\cdot \mid \theta_n); \quad \theta_n \sim G_0$$

It can be shown that the maximum number of clusters, $C$, depends on $\varepsilon$ and $\kappa$ ():

$$C \approx 1 + \frac{\log(\varepsilon)}{\log\left[\dfrac{\kappa}{1 + \kappa}\right]}$$

We can then approximate a full Dirichlet process by choosing $C$, instead of $\infty$, as the maximum possible number of clusters. One should take into account that $C$ cannot be greater than the total number of observations (or groups) in the data. For instance, if the data include 00 observations, $C$ cannot be greater than 00. Note that a prior distribution can be assumed on $\kappa$ to estimate its value as part of the analysis. This prior should be in accordance with $C$ and the data in general. Further discussion in this regard is provided in the results section.

Proposed model framework applied to the study in context

As a starting point, in this research, the Simple Poisson-Lognormal Model (SPLM) was used as the base model to analyze the crash frequency data:

$$y_j \sim \text{Poisson}(\lambda_j)$$
$$\lambda_j = \mu_j e^{\varepsilon_j}$$
$$\log(\mu_j) = \eta + \beta X_j$$
$$\log(\lambda_j) = \eta + \beta X_j + \varepsilon_j$$
$$\varepsilon_j \sim \text{normal}(0, \nu_\varepsilon)$$

where $y_j$ and $\lambda_j$ denote the observed and expected crash frequencies for site $j$, respectively; $\eta$ is the intercept; $\beta$ is the vector of coefficients; $X$ is the vector of covariates; $e^{\varepsilon_j}$ is a lognormally distributed error term that accounts for overdispersion; and $\nu_\varepsilon$ is the variance of the error term. In this study, we adopted a multilevel modeling approach, as the hierarchical structure of the data necessitates. Such a hierarchical structure, which occurs often in transportation safety studies (), requires allowing one or more model parameters to vary across groups of observations (here, regions).

In many instances, a normal distribution is used for any random parameter of interest, resulting in a fully parametric model. In the subsequent sections, we first describe a generic parametric random intercept multilevel model and then a semi-parametric random intercept multilevel model.

Parametric Random Intercept Multilevel Model

Let us consider the data in context (Section ) with grade crossings nested in different municipalities. Assume also a Random Intercept Multilevel Poisson-Lognormal Model
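To see why the lognormal error term in the SPLM accounts for overdispersion, counts can be simulated from the model and their variance compared with their mean. The following is a minimal sketch that assumes a single, constant, hypothetical covariate and made-up values for η, β, and ν_ε. None of these are estimates from the crossing data, and the Poisson sampler is a generic textbook method rather than anything specific to this study.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_splm(x, eta, beta, nu_eps, rng):
    """Simulate y_j ~ Poisson(lambda_j) with log(lambda_j) = eta + beta * x_j + eps_j."""
    counts = []
    for x_j in x:
        eps_j = rng.gauss(0.0, math.sqrt(nu_eps))   # nu_eps is a variance
        lam_j = math.exp(eta + beta * x_j + eps_j)
        counts.append(poisson_draw(lam_j, rng))
    return counts

# Hypothetical inputs: 5000 sites that all share the covariate value 1.0.
rng = random.Random(1)
x = [1.0] * 5000
y = simulate_splm(x, eta=-1.0, beta=0.5, nu_eps=0.5, rng=rng)

# A pure Poisson sample would give a variance-to-mean ratio close to 1.
dispersion = statistics.pvariance(y) / statistics.mean(y)
```

Because the covariate is held constant, any excess of the variance over the mean in this sketch comes from the error term alone.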

(RIMPLM) in which the intercept varies between regions, based on the belief that each region is likely to have its own characteristics and unobserved/unmeasured attributes. Let $r$ denote regions; a typical parametric multilevel model with a varying intercept across regions can be obtained by extending the previously discussed SPLM as follows:

$$y_{rj} \mid X_{rj}, \varepsilon_{rj}, \eta_r \sim \text{Poisson}(\lambda_{rj})$$
$$\log(\lambda_{rj}) = \eta_r + \beta X_{rj} + \varepsilon_{rj}$$
$$\eta_r \sim \text{normal}(m_\eta, \nu_\eta)$$
$$\varepsilon_{rj} \sim \text{normal}(0, \nu_\varepsilon)$$

where $m_\eta$ and $\nu_\eta$ are, respectively, the mean and the variance of the varying intercept $\eta_r$. It can be seen that the RIMPLM assumes a common normally distributed random intercept at the region level. As described earlier, in the case of the municipality-level dataset, for example, the RIMPLM assumes a common normal distribution for all the municipalities. This imposes a strong assumption that implies all municipalities come from the same population. The RIMPLM neglects the possibility that some municipalities might behave very differently (outliers) from the rest of the municipalities in the data. In the next section, we relax this assumption with a flexible model that adapts itself to the complexity of the observed data.

Semi-parametric random intercept multilevel model

Standard parametric assumptions on random parameters might compromise the quality of analyses. Our Flexible Dirichlet Process Multilevel Model (FDPMM) examines the quality of such parametric models while providing further insights into the data; e.g., identifying latent clusters and outliers. It is important to mention that many outlier detection methods are designed only to identify outliers; the modeller must then exclude them from the data and conduct the analysis without them to avoid biased or less reliable estimates. Our flexible modeling approach, however, allows us to accommodate outliers in analyses without undermining the quality of analyses.

Note that the flexible model presented here is used in a multilevel framework with the Poisson-lognormal model for crash frequency. However, it can be similarly adopted in single-level settings (non-multilevel analysis) and/or with different statistical models such as the Poisson, Poisson-gamma, etc. Besides its application to count models for crash frequency datasets, the proposed flexible model can also be employed in different contexts such as injury-severity analysis, travel demand research, etc. For the purpose of this specific research, the FDPMM can be defined as follows:

$$y_{rj} \mid X_{rj}, \varepsilon_{rj}, \eta_r \sim \text{Poisson}(\lambda_{rj})$$
$$\log(\lambda_{rj}) = \eta_r + \beta X_{rj} + \varepsilon_{rj}$$
$$\varepsilon_{rj} \sim \text{normal}(0, \nu_\varepsilon)$$
$$\eta_r = \eta_{DP} \sim \text{Dirichlet}(\kappa \eta_0); \quad \theta_r \sim \eta_0, \quad r = 1, 2, \ldots, C$$
$$\theta_r \sim \text{normal}(m_0, \nu_0), \quad \kappa \sim g(\cdot)$$

where $\theta_r$ (with unknown parameters, the mean $m_0$ and the variance $\nu_0$) is the realization of the baseline distribution $\eta_0$ for $\eta_r$; and $\kappa$ is the precision parameter, as explained earlier in the section on the Dirichlet process. Recall that $r$ denotes latent clusters and $C$ stands for the maximum possible number of latent clusters (see the truncation formula for $C$ in the same section). In the previous model (i.e., RIMPLM), the varying intercept $\eta_r$

was normally distributed, whereas under the FDPMM it is defined non-parametrically using Dirichlet process mixing. Doing so, we remove the restriction of the standard distributional assumption and allow the observed dataset to determine the proper form of the varying intercept. If the FDPMM provides a significantly better fit to the data than the RIMPLM, one can doubt the appropriateness of the parametric assumption. One should also take into account that the parameters of the baseline distribution, $\eta_0$, are estimated here as part of the modeling process, allowing us to account for uncertainties associated with the baseline distribution for the varying intercept. It is important to mention that, to maintain the interpretability of the model, as can be seen in the representation of the FDPMM above, the vector of coefficients $\beta$ associated with the known covariates vector $X$ (site characteristics) does not follow a Dirichlet process and is fixed. Other extensions are obviously possible; for example, one might allow the effect of one or more covariates to vary across different regions. Note that Dirichlet process mixing over the intercept (as in our FDPMM) allows us to deal with heterogeneity in the data with respect to the mean (); that is, mean crash frequency in our paper. In the study in context, such mixing thus enables the identification of latent clusters among different regions, here municipalities.

Elicitation of Priors

Bayesian analysis requires the elicitation of priors for parameters of interest. In this research, we used non-informative normal priors with mean zero for $\beta$, $m_\eta$, and $m_0$. For the inverses of the variances $\nu_\varepsilon$, $\nu_\eta$, and $\nu_0$, we used a diffuse gamma prior with shape and scale parameters equal to 0.0. It is also necessary to define a prior distribution on the precision parameter $\kappa$, for which different priors are possible, such as gamma, exponential, and uniform.

This prior should agree with the maximum number of allowed clusters $C$ (see the truncation formula for $C$). Here, we used two different uniform priors for each dataset, as it is important to choose this prior based on data characteristics; for example, the number of observations over which we want to cluster the data. For the municipality-level dataset, we set the maximum number of clusters $C$ to be 0 given the number of municipalities; i.e.,. Doing so, a better approximation of a full Dirichlet process can be obtained (). This also allows larger values of $\kappa$, which means that we do not force $\kappa$ to be a small value. Therefore, we chose a uniform prior with an upper bound of 0 that corresponds to approximately 0 clusters based on the truncation formula for $C$. A lower bound of 0. was selected here to allow smaller values of $\kappa$ and also to circumvent problems associated with the estimation of $PR_n$. Therefore, we assume $\kappa \sim \text{uniform}(0., 0)$.

Model Selection: Conditional Predictive Ordinate and Pseudo Bayes Factor

In this research, we used the Conditional Predictive Ordinate (CPO) to estimate the Log Pseudo Marginal Likelihood (LPML) and the Pseudo Bayes Factor (PBF) (-0) to compare the three models described previously: SPLM, RIMPLM, and FDPMM. The use of CPOs for model selection in the road safety literature has been extremely rare (, ). With regard to LPML and PBF, this is probably the first instance of using such model selection criteria in transportation safety studies. One should take into account that CPO and PBF are in general more robust than the commonly used Deviance Information Criterion (DIC) (). It is important to mention that DIC is known to be problematic; for example, in terms of significant sensitivity to parameterization () or in situations in which the posterior density is not unimodal. In fact, WinBUGS cannot estimate the DIC value when estimating the FDPMM, which involves multimodal posteriors. For a further
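The link between the precision parameter κ and the truncation level C, given in the truncation formula of the methodology section, can be checked numerically when choosing a prior range for κ. The sketch below assumes hypothetical values of κ and ε for illustration only and rounds the formula up to the next integer. The helper names are not from the paper.

```python
import math

def truncation_level(kappa, eps):
    """Smallest integer C whose expected last-partition mass is at most eps.

    Implements C = 1 + log(eps) / log(kappa / (1 + kappa)), rounded up.
    """
    return 1 + math.ceil(math.log(eps) / math.log(kappa / (1.0 + kappa)))

def expected_last_mass(kappa, C):
    """Expected mass left for partition C: each break keeps kappa/(1+kappa) on average."""
    return (kappa / (1.0 + kappa)) ** (C - 1)

# Hypothetical values: eps = 0.01 and three candidate values of kappa.
levels = {kappa: truncation_level(kappa, 0.01) for kappa in (0.5, 1.0, 5.0)}
```

Larger values of κ require a larger C for the same ε, which is why the upper bound of a prior on κ has to be chosen together with the maximum number of clusters.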

12 S. Heydari, L. Fu, D. Lord, And B. K. Mallick discussion on the DIC drawbacks readers are referred to (). In this section, we therefore focus on the estimation of LPML and PBF. The main idea behind cross-validation methods constitutes the base for the estimation of CPOs. In cross-validation, a given data set is divided into two groups. One is used to make the posterior inference, whereas the second group is used to validate the previously estimated model. The problem here is the sensitivity of the results to how these groups are selected. CPO circumvents this problem by leaving out only one observation each time (0). Consider a full set of observed data Y including i=,,,l observations. For a given observation Yi, the leaving-out cross-validation predictive density, as in (0), is CPO i = f(y i y i ) = f(y i ψ)f(ψ y i )dψ () Where yi consists of Y when Yi (the i th observation) is excluded from the data; and ψ denotes model parameters. Therefore, CPO of an observation in a given data is the likelihood of that observation given the rest of observations in that data (). CPO can be therefore used to identify datapoints that are in conflict with the rest of observations in a given dataset (0). The estimation of CPOs can be readily obtained from the adopted MCMC algorithm as CPO i = ( T T t= ) () f(y i ψ (t) ) Where T stands for the total number of iterations (t =,,,T) in MCMC runs. CPO is thus the mean of the probability distribution function estimated at observation Yi for each ψ (t). The product of CPOs is referred to as pseudo marginal likelihood (PML) (): l PML = i= CPO i () Similar to the log likelihood, LPML is usually computed: l l LPML = log { i= CPO i } = i= log (CPO i ) () PML or LPML can be used as a measure of Bayesian model fit and selection. The model with the largest LPML indicates the best fit to the data. As another model selection criteria, pseudo Bayes factor (PBF) can be easily estimated by dividing the PML of two models (). 
For example, to verify whether model fits the data better than model, the PBF is given by PBF = PML model PML model (0) Table shows how model selection can be carried out based on Bayes factor values as reported in (0). Note that the interpretation of PBF is similar to Bayes factor.
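As an illustration of the CPO, LPML, and PBF definitions in this section, the following is a minimal sketch in Python rather than the WinBUGS code used in the paper. It assumes that the pointwise log-likelihoods log f(y_i | ψ(t)) have been saved for every MCMC draw in an array of shape (T, l); the function names and the toy input are hypothetical. The computation is carried out on the log scale because the PML is a product of many small numbers.

```python
import numpy as np

def log_cpo(loglik):
    """Return log CPO_i for each observation.

    loglik: array of shape (T, l) holding log f(y_i | psi^(t)) for each
    MCMC draw t and observation i (an assumed storage layout).
    """
    loglik = np.asarray(loglik, dtype=float)
    T = loglik.shape[0]
    neg = -loglik                      # log of 1 / f(y_i | psi^(t))
    m = neg.max(axis=0)                # shift for a stable log-sum-exp
    log_mean_inv = m + np.log(np.exp(neg - m).sum(axis=0)) - np.log(T)
    return -log_mean_inv               # log of the harmonic mean

def lpml(loglik):
    """Log pseudo marginal likelihood: sum of log CPO_i."""
    return float(log_cpo(loglik).sum())

def log_pbf(loglik_1, loglik_2):
    """Log pseudo Bayes factor of model 1 against model 2."""
    return lpml(loglik_1) - lpml(loglik_2)

# Toy input (hypothetical): one observation, two draws with
# likelihood values 0.5 and 0.25.
toy = np.log(np.array([[0.5], [0.25]]))
toy_cpo = np.exp(log_cpo(toy))
```

For the toy input, the harmonic mean of 0.5 and 0.25 is 1/3, which is the value held in `toy_cpo`. Exponentiating `log_pbf` gives the PBF itself, although comparing models through the difference in LPML avoids numerical underflow.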

Table Bayesian Model Selection via Bayes Factor

Bayes Factor    Degree of support for the model of interest
-               No evidence of support
-0              Support
0-0             Strong support
>0              Very strong support

RESULTS AND DISCUSSION

The models described above were implemented in the statistical software WinBUGS. A total of 0000 MCMC iterations, in addition to 000 burn-in iterations, with chains were used to obtain posterior inferences. All three models ran smoothly and converged relatively quickly; for example, the FDPMM converged at around 000 iterations. This indicates well-defined models and priors. MCMC convergence was verified through history plots, trace plots, and Gelman-Rubin diagnostic plots, all of which are available in WinBUGS.

Table presents the results (at the % level of confidence) for the municipality-level data. The standard model (the SPLM) provided a poor fit compared to the other two models, which account for the hierarchy in the data. The results show that traffic exposure, urban area, whistle prohibition, and train speed are positively associated with crash frequencies at FLB crossings. The significant variance of the varying intercept in the multilevel framework indicates that crossings nested in the same municipality are dependent. Therefore, the SPLM is not a proper choice. Interestingly, whistle prohibition is significant at a significance level of 0.0 in the SPLM, but only at a significance level of 0.0 in the RIMPLM and the FDPMM. This is in accordance with previous research (, ). As discussed in (), single-level models such as the SPLM employed here assume that all observations are generated from a single homogeneous population. This in turn implies that the residuals are independent, which results in underestimated standard errors and, consequently, erroneous confidence intervals.

Our flexible model provided the best fit to the municipality-level data. The log pseudo marginal likelihood of the FDPMM is the highest (see Table ). When comparing the FDPMM with the RIMPLM, a pseudo Bayes factor of. indicates strong support (see also Table ) for the proposed flexible model. This calls into question the adequacy of the standard parametric assumption on the varying intercept for the municipality-level data. In other words, assuming a common distribution for all municipalities is not appropriate. Table shows that the expected number of non-empty clusters is. in the FDPMM. Since the number of municipalities is large, we do not report all clusters here. For illustration, a small part of the clustering results is reported in Table. As an example, we found that the following municipalities share the same cluster with a probability greater than 0.0: Calgary, Edmonton, Regina, Saskatoon, Winnipeg, Grand Prairie, and Nanaimo.
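To make the link between the precision parameter κ and the number of clusters concrete, the sketch below uses the standard result for a full (untruncated) Dirichlet process, in which the prior expected number of clusters among n units is the sum of κ / (κ + i − 1) for i = 1, ..., n. This may differ from the equation used in the paper for the truncated process, so it should be read as an assumption. The second function shows one way a posterior expected number of non-empty clusters could be computed from saved MCMC cluster labels. All names and numerical values are hypothetical.

```python
import numpy as np

def prior_expected_clusters(kappa, n):
    """Prior mean number of clusters among n units under a Dirichlet
    process with precision kappa: sum_{i=1}^{n} kappa / (kappa + i - 1)."""
    i = np.arange(n)                   # 0, 1, ..., n - 1
    return float(np.sum(kappa / (kappa + i)))

def mean_nonempty_clusters(labels):
    """Posterior mean number of non-empty clusters.

    labels[t, j] is the cluster index of unit j at MCMC iteration t
    (an assumed storage layout).
    """
    labels = np.asarray(labels)
    return float(np.mean([np.unique(row).size for row in labels]))

# Hypothetical illustration values, not taken from the paper.
n_units = 100
for kappa in (0.5, 1.0, 5.0, 10.0):
    print(kappa, round(prior_expected_clusters(kappa, n_units), 2))
```

Larger values of κ give a larger prior expected number of clusters, which is why the upper bound of a uniform prior on κ should be chosen with the number of units in mind.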

Table Estimation Results for FLB Crossings, Municipality-Level Data
Columns: Variable; Posterior mean; Std. dev.; Credible intervals (.%, .%)
SPLM: Log of exposure; Urban area; Whistle prohibition; Train speed; Intercept; Variance (νε); LPML (log pseudo marginal likelihood)
RIMPLM: Log of exposure; Urban area; Whistle prohibition; Train speed; Intercept mean; Intercept variance; Variance (νε); LPML (log pseudo marginal likelihood)
FDPMM: Log of exposure; Urban area; Whistle prohibition; Train speed; Intercept mean; Intercept variance; Intercept's baseline mean (m0); Intercept's baseline variance (ν0); Variance (νε); Dirichlet precision parameter (κ); Expected number of non-empty clusters; LPML (log pseudo marginal likelihood)
Model comparison based on PBF (pseudo Bayes factor): PBF (FDPMM vs. RIMPLM) =.
Whistle prohibition is significant at a significance level of 0.0 in multilevel models.
Note: SPLM is the Standard Poisson-Lognormal Model; RIMPLM is the Random Intercept Multilevel Poisson-Lognormal Model; and FDPMM is the Flexible Dirichlet Process Multilevel Model.

Table Cluster and Outlier Identification Results
Columns: Municipality; Average size of cluster (% interval); Similar municipalities
Row groups by threshold: Probability > 0.; Probability > 0.; Probability > 0.
Note: size of cluster is the median of the number of municipalities in the same cluster
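The cluster summaries of the kind reported in the table above can be derived from the cluster labels saved during MCMC sampling. The following is a minimal sketch under the assumption that the labels are stored as an integer array of shape (T, n), with one row per iteration and one column per municipality; the function names and the toy labels are hypothetical and do not reproduce the paper's results.

```python
import numpy as np

def coclustering_matrix(labels):
    """Posterior probability that units j and k share a cluster.

    labels[t, j] is the cluster index of unit j at iteration t.
    Builds a (T, n, n) boolean array, so it suits moderate n and T.
    """
    labels = np.asarray(labels)
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def similar_units(prob, j, threshold):
    """Units whose co-clustering probability with unit j exceeds threshold."""
    idx = np.flatnonzero(prob[j] > threshold)
    return [int(k) for k in idx if k != j]

def median_cluster_size(labels, j):
    """Median over iterations of the size of the cluster containing unit j."""
    labels = np.asarray(labels)
    sizes = (labels == labels[:, [j]]).sum(axis=1)
    return float(np.median(sizes))

# Toy labels (hypothetical): 3 iterations, 3 units.
toy_labels = np.array([[0, 0, 1],
                       [0, 0, 1],
                       [0, 1, 1]])
toy_prob = coclustering_matrix(toy_labels)
```

In the toy example, units 0 and 1 share a cluster in two of three iterations, so `toy_prob[0, 1]` is 2/3. A unit whose median cluster size equals one would be flagged as a potential outlier, and raising the threshold passed to `similar_units` shortens the list of similar units.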

It should be mentioned that no outlier municipality was identified, and the smallest cluster included 0 municipalities. An outlier municipality is detected when no other municipality shares its cluster; that is, the outlier is the only member of its cluster. Note that Table uses different threshold probabilities, for illustration, to define clusters among municipalities. Alternative threshold values naturally result in different cluster memberships. Larger threshold probabilities result in a higher number of clusters. In other words, as the threshold probability approaches one, for a given observation i, the number of remaining observations that share the same cluster with observation i approaches zero.

SUMMARY AND CONCLUSIONS

To address unobserved heterogeneity in data, random effects/parameters models and mixture models are often used in the transportation safety literature. Standard distributional assumptions are an intrinsic part of random effects/parameters models. Because sensitivity to such assumptions may be a major concern in some datasets or applications, this paper proposes a class of flexible statistical models to investigate the adequacy of these parametric assumptions. The adopted approach has several additional advantages, such as the ability to identify outliers and latent subpopulations in data. The method can also accommodate outliers in analyses while preventing them from affecting the quality of the estimates.

The mixture modeling approach is an alternative method that can deal with some of the concerns associated with random effects/parameters models. In mixture models, however, the number of latent components in the data must be specified in advance, and in most applications there is no sound justification for selecting this number. Our proposed technique treats the number of latent components as an unknown parameter and estimates its expectation as part of the estimation algorithm.

We used a multilevel dataset containing crash frequencies for FLB grade crossings in Canada to show the feasibility of the flexible model. Log pseudo marginal likelihoods and pseudo Bayes factors computed from conditional predictive ordinates were used for model selection. The results confirmed the need for a multilevel modeling approach. We found that the single-level model underestimated the standard error of the coefficient associated with whistle prohibition in the municipality-level data. Traffic exposure, location of crossing (urban vs. non-urban), train speed, and whistle prohibition were positively associated with crash frequencies. The results called into question the adequacy of the standard parametric assumption for the municipality-level data. We identified latent subpopulations among Canadian municipalities. Finally, in terms of outliers, the results indicated that there is no outlier municipality among those analyzed in this paper.

The identification of clusters among regions has significant interpretative value: it indicates common unmeasured/unknown covariates among regions in the same subgroup. Based on the identified clusters, further investigations can be conducted to detect the presence (or extent) of such unmeasured/unknown covariates and attributes. Latent similarities and dissimilarities are expected among regions because of variations in regional policies, population demography, driver behaviour, climate, traffic regulations, etc.

Acknowledgments

The authors would like to acknowledge the Natural Sciences and Engineering Research Council of Canada for their financial support. We would also like to thank Transport Canada (Rail Safety Directorate) for providing the data and financial support.

References

Anastasopoulos, P., and F. Mannering. A note on modeling vehicle accident frequencies with random-parameters count models. Accident Analysis and Prevention, Vol., 00, pp..
Chen, E., and A. Tarko. Modeling safety of highway work zones with random parameters and random effects models. Analytic Methods in Accident Research, Vol., 0, pp. -.
Mannering, F., and C. R. Bhat. Analytic Methods in Accident Research: Methodological Frontier and Future Directions. Analytic Methods in Accident Research, Vol., 0, pp. -.
Ohlssen, D. I., L. D. Sharples, and D. J. Spiegelhalter. Flexible Random-effects Models Using Bayesian Semi-Parametric models: Application to institutional Comparisons. Statistics in Medicine, Vol., 00, pp
Wu, Z., A. Sharma, F. Mannering, and S. Wang. Safety impacts of signal-warning flashers and speed control at high-speed signalized intersections. Accident Analysis and Prevention, Vol., 0, pp. 0.
Jones, A., and S. Jørgensen. The use of multilevel models for the prediction of road accident outcomes. Accident Analysis and Prevention, Vol., 00, pp..
Yannis, G., E. Papadimitriou, C. Antoniou. Multilevel modelling for the Regional Effect of Enforcement on road Accidents. Accident Analysis and Prevention, Vol., 00, pp. -.
Huang, H., H. C. Chin, M. M. Haque. Severity of driver injury and vehicle damage in traffic crashes at intersections: a Bayesian hierarchical analysis. Accident Analysis and Prevention, Vol. 0, 00, pp..
Papadimitriou, E., A. Theofilatos, G. Yannis, J. Cestac, and S. Kraïem. Motorcycle Riding Under the Influence of Alcohol: Results from the SARTRE- Survey. Accident Analysis and Prevention, Vol. 0, 0, pp
Xiong, Y., and F. Mannering. The Heteroscedastic Effects of Guardian Supervision on Adolescent Driver-Injury Severities: A Finite Mixture-Random Parameters Approach. Transportation Research Part B, Vol., 0, pp. -.
Dupont, E., E. Papadimitriou, H. Martensen, and G. Yannis. Multilevel Analysis in Road Safety Research. Accident Analysis and Prevention, Vol. 0, 0, pp
H. Huang, M. Abdel-Aty. Multilevel data and Bayesian analysis in traffic safety. Accident Analysis and Prevention, Vol., 00, pp..
Heydari, S., Miranda-Moreno, L.F., Liping, F. Speed limit reduction in urban areas: A before-after study using Bayesian generalized mixed linear models. Accident Analysis and Prevention, Vol., pp. -.
Escobar, M., and M. West. Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, Vol. 0,, pp. -.
Park, B. J., and D. Lord. Application of Finite Mixture Models for Vehicle Crash Data Analysis. Accident Analysis and Prevention, Vol., 00, pp. -.
Saccomanno F. F., and X. Lai. A Model for Evaluating Countermeasures at Highway-Railway Grade Crossings. Transportation Research Record: Journal of the Transportation Research Board, No., 00, pp. -.
Oh, J., S. P. Washington, and N. Doohee. Accident Prediction Models for Railway-Highway interfaces. Accident Analysis and Prevention, Vol., 00, pp
Yan, X., S. Richards, and X. Su. Using Hierarchical Tree-Based Regression Model to Predict Train-Vehicle Crashes at Passive Highway-Rail Grade Crossings. Accident Analysis and Prevention. Vol.. 00, pp. -.
Spiegelhalter, D. J., A. Thomas, N. G. Best. WinBUGS. User Manual. MRC Biostatistics unit and Imperial College, 00. Available from
Muller P., and F. A. Quintana. Nonparametric Bayesian data analysis. Statistical Science, Vol., 00; pp. 0.
Hjort, N., C. Holmes, P. Müller, and S. G. Walker. Bayesian Nonparametrics: Principles and Practice. Cambridge University Press, 00.
Ferguson, T. S. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, Vol.,, pp
Antoniak, C. E. Mixtures of Dirichlet Processes with Applications to nonparametric Problems. The Annals of Statistics, Vol.,, pp. -.
Bush, C. A., and S. N. MacEachern. A Semi-Parametric Bayesian Model for Randomized Block Designs. Biometrika, Vol.,, pp. -.
Mukhopadhyay, S., and A. E. Gelfand. Dirichlet Process Mixed Generalized Linear Models. Journal of the American Statistical Association, Vol.,, pp. -.
Dhavala, S. S., S. Datta, B. K. Mallick, R. J. Carroll, S. Khare, S. D. Lawhon, and L. G. Adams. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection. Journal of the American Statistical Association, Vol. 0, 00, pp. -.
Ishwaran, H., L. F. James. Gibbs Sampling Methods for Stick-Breaking Priors. Journal of the American Statistical Association, Vol., 00, pp..
Gelfand, A. Model determination using sampling-based methods, in W. Gilks, S. Richardson, and D. Spiegelhalter, eds., Markov Chain Monte Carlo in Practice, Chapman & Hall, Suffolk,.
Carlin, B. P. and T. A. Louis. Bayesian Methods for Data Analysis, third edition. Boca Raton: Chapman & Hall/CRC,
Ntzoufras, I. Bayesian Modeling using WinBUGS. John Wiley & Sons, 00.
Yang, H., K. Ozbay, O. Ozturk, M. Yildirimoglu. Modeling Work Zone Crash Frequency by Quantifying Measurement Errors in Work Zone Length. Accident Analysis and Prevention, Vol., 0, pp
Kun, X., X. Wang, K. Ozbay, H. Yang. Crash Frequency Modeling for Signalized Intersections in a High-Density Urban Road Network. Analytic Methods in Accident Research, Vol., 0, pp. -.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. Bayesian Measures of Complexity and Fit (with Discussion). Journal of the Royal Statistical Society, Series B, Vol., 00,.
Geedipally, S. R., D. Lord, and S. S. Dhavala. A Caution about Using Deviance Information Criterion While Modelling Traffic Crashes. Safety Science, Vol., 0, pp. -.
Kim, D. G., Y. Lee, S. Washington, K. Choi. Modeling Crash Outcome Probabilities at Rural Intersections: Application of Hierarchical Binomial Logistic Models. Accident Analysis and Prevention, Vol., 00, pp. -.


More information

An Alternative Infinite Mixture Of Gaussian Process Experts

An Alternative Infinite Mixture Of Gaussian Process Experts An Alternative Infinite Mixture Of Gaussian Process Experts Edward Meeds and Simon Osindero Department of Computer Science University of Toronto Toronto, M5S 3G4 {ewm,osindero}@cs.toronto.edu Abstract

More information

Bayesian multiple testing procedures for hotspot identification

Bayesian multiple testing procedures for hotspot identification Accident Analysis and Prevention 39 (2007) 1192 1201 Bayesian multiple testing procedures for hotspot identification Luis F. Miranda-Moreno a,b,, Aurélie Labbe c,1, Liping Fu d,2 a Centre for Data and

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

Accident Prediction Models for Freeways

Accident Prediction Models for Freeways TRANSPORTATION RESEARCH RECORD 1401 55 Accident Prediction Models for Freeways BHAGWANT PERSAUD AND LESZEK DZBIK The modeling of freeway accidents continues to be of interest because of the frequency and

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Gentle Introduction to Infinite Gaussian Mixture Modeling

Gentle Introduction to Infinite Gaussian Mixture Modeling Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for

More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

A Latent Class Modeling Approach for Identifying Injury Severity Factors and Individuals at High Risk of Death at Highway-Railway Crossings

A Latent Class Modeling Approach for Identifying Injury Severity Factors and Individuals at High Risk of Death at Highway-Railway Crossings 1584 A Latent Class Modeling Approach for Identifying Injury Severity Factors and Individuals at High Risk of Death at Highway-Railway Crossings Naveen ELURU 1, Morteza BAGHERI 2, Luis F. MIRANDA-MORENO

More information

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675.

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675. McGill University Department of Epidemiology and Biostatistics Bayesian Analysis for the Health Sciences Course EPIB-675 Lawrence Joseph Bayesian Analysis for the Health Sciences EPIB-675 3 credits Instructor:

More information

Bayesian Model Diagnostics and Checking

Bayesian Model Diagnostics and Checking Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in

More information

Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models

Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models By Srinivas Reddy Geedipally Research Assistant Zachry Department

More information

FULL BAYESIAN POISSON-HIERARCHICAL MODELS FOR CRASH DATA ANALYSIS: INVESTIGATING THE IMPACT OF MODEL CHOICE ON SITE-SPECIFIC PREDICTIONS

FULL BAYESIAN POISSON-HIERARCHICAL MODELS FOR CRASH DATA ANALYSIS: INVESTIGATING THE IMPACT OF MODEL CHOICE ON SITE-SPECIFIC PREDICTIONS FULL BAYESIAN POISSON-HIERARCHICAL MODELS FOR CRASH DATA ANALYSIS: INVESTIGATING THE IMPACT OF MODEL CHOICE ON SITE-SPECIFIC PREDICTIONS A Dissertation by SEYED HADI KHAZRAEE KHOSHROOZI Submitted to the

More information

Freeway rear-end collision risk for Italian freeways. An extreme value theory approach

Freeway rear-end collision risk for Italian freeways. An extreme value theory approach XXII SIDT National Scientific Seminar Politecnico di Bari 14 15 SETTEMBRE 2017 Freeway rear-end collision risk for Italian freeways. An extreme value theory approach Gregorio Gecchele Federico Orsini University

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December

More information

Bayesian Nonparametric Rasch Modeling: Methods and Software

Bayesian Nonparametric Rasch Modeling: Methods and Software Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar

More information

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η

More information

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation H. Zhang, E. Cutright & T. Giras Center of Rail Safety-Critical Excellence, University of Virginia,

More information

Slice Sampling Mixture Models

Slice Sampling Mixture Models Slice Sampling Mixture Models Maria Kalli, Jim E. Griffin & Stephen G. Walker Centre for Health Services Studies, University of Kent Institute of Mathematics, Statistics & Actuarial Science, University

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

A Nonparametric Model for Stationary Time Series

A Nonparametric Model for Stationary Time Series A Nonparametric Model for Stationary Time Series Isadora Antoniano-Villalobos Bocconi University, Milan, Italy. isadora.antoniano@unibocconi.it Stephen G. Walker University of Texas at Austin, USA. s.g.walker@math.utexas.edu

More information

Neutral Bayesian reference models for incidence rates of (rare) clinical events

Neutral Bayesian reference models for incidence rates of (rare) clinical events Neutral Bayesian reference models for incidence rates of (rare) clinical events Jouni Kerman Statistical Methodology, Novartis Pharma AG, Basel BAYES2012, May 10, Aachen Outline Motivation why reference

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Wrapped Gaussian processes: a short review and some new results

Wrapped Gaussian processes: a short review and some new results Wrapped Gaussian processes: a short review and some new results Giovanna Jona Lasinio 1, Gianluca Mastrantonio 2 and Alan Gelfand 3 1-Università Sapienza di Roma 2- Università RomaTRE 3- Duke University

More information

Spatial discrete hazards using Hierarchical Bayesian Modeling

Spatial discrete hazards using Hierarchical Bayesian Modeling Spatial discrete hazards using Hierarchical Bayesian Modeling Mathias Graf ETH Zurich, Institute for Structural Engineering, Group Risk & Safety 1 Papers -Maes, M.A., Dann M., Sarkar S., and Midtgaard,

More information

Bayesian Sparse Correlated Factor Analysis

Bayesian Sparse Correlated Factor Analysis Bayesian Sparse Correlated Factor Analysis 1 Abstract In this paper, we propose a new sparse correlated factor model under a Bayesian framework that intended to model transcription factor regulation in

More information

Lecture 3a: Dirichlet processes

Lecture 3a: Dirichlet processes Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 PG Student, 2 Assistant Professor, Department of Civil Engineering, Maulana Azad National

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Indirect Clinical Evidence of Driver Inattention as a Cause of Crashes

Indirect Clinical Evidence of Driver Inattention as a Cause of Crashes University of Iowa Iowa Research Online Driving Assessment Conference 2007 Driving Assessment Conference Jul 10th, 12:00 AM Indirect Clinical Evidence of Driver Inattention as a Cause of Crashes Gary A.

More information

Risk-Based Model for Identifying Highway Rail Grade Crossing Blackspots

Risk-Based Model for Identifying Highway Rail Grade Crossing Blackspots RiskBased Model for Identifying Highway Rail Grade Crossing Blackspots Frank F. Saccomanno, Liping Fu, and Luis F. MirandaMoreno A riskbased model is presented for identifying highway rail grade crossing

More information

Quantile POD for Hit-Miss Data

Quantile POD for Hit-Miss Data Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

Downloaded from:

Downloaded from: Camacho, A; Kucharski, AJ; Funk, S; Breman, J; Piot, P; Edmunds, WJ (2014) Potential for large outbreaks of Ebola virus disease. Epidemics, 9. pp. 70-8. ISSN 1755-4365 DOI: https://doi.org/10.1016/j.epidem.2014.09.003

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Hot Spot Identification using frequency of distinct crash types rather than total crashes

Hot Spot Identification using frequency of distinct crash types rather than total crashes Australasian Transport Research Forum 010 Proceedings 9 September 1 October 010, Canberra, Australia Publication website: http://www.patrec.org/atrf.aspx Hot Spot Identification using frequency of distinct

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, ) Econometrica Supplementary Material SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, 653 710) BY SANGHAMITRA DAS, MARK ROBERTS, AND

More information

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering

More information

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico Evaluation of Road Safety in Portugal: A Case Study Analysis Ana Fernandes José Neves Instituto Superior Técnico OUTLINE Objectives Methodology Results Road environments Expected number of road accidents

More information