BAYESIAN TRAVEL TIME RELIABILITY MODELS


BAYESIAN TRAVEL TIME RELIABILITY MODELS

Morgan State University · The Pennsylvania State University · University of Maryland · University of Virginia · Virginia Polytechnic Institute & State University · West Virginia University

The Pennsylvania State University
The Thomas D. Larson Pennsylvania Transportation Institute
Transportation Research Building
University Park, PA

Bayesian Travel Time Reliability Models

By Feng Guo, Dengfeng Zhang, and Hesham Rakha

Mid-Atlantic Universities Transportation Center Final Report

Virginia Tech Transportation Institute, Department of Statistics, Virginia Polytechnic Institute and State University

DISCLAIMER
The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated under the sponsorship of the U.S. Department of Transportation's University Transportation Centers Program, in the interest of information exchange. The U.S. Government assumes no liability for the contents or use thereof.

June 30, 2015

1. Report No.  2. Government Accession No.  3. Recipient's Catalog No.
4. Title and Subtitle: Bayesian Travel Time Reliability Models
5. Report Date: June 30, 2015
6. Performing Organization Code: Virginia Tech
7. Author(s): Feng Guo, Dengfeng Zhang, and Hesham Rakha
8. Performing Organization Report No.
9. Performing Organization Name and Address: Virginia Tech Transportation Institute, 3500 Transportation Research Plaza, Blacksburg, VA
10. Work Unit No. (TRAIS)
11. Contract or Grant No.
12. Sponsoring Agency Name and Address: US Department of Transportation, Research & Innovative Technology Admin, UTC Program, RDT, New Jersey Ave., SE, Washington, DC
13. Type of Report and Period: Final Report, 6/2012-6/2015
14. Sponsoring Agency Code
15. Supplementary Notes
16. Abstract: Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built upon and advanced multi-state models by proposing regressions on the proportions and distribution parameters of the underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumptions. Two alternative approaches were proposed and evaluated. The first is a Bayesian multi-state travel time regression model, which regresses key model parameters on traffic volume; the second is a hidden Markov regression, which not only links key model parameters to traffic volume but also incorporates the dependency structure among adjacent time windows. Both approaches provide advanced methodology for modeling travel time reliability under complex stochastic scenarios.
17. Key Words: Traffic simulation, traffic modeling, driver behavior, car following
19. Security Classif. (of this report)  20. Security Classif. (of this page)  21. No. of Pages: 47  22. Price

ABSTRACT

Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built upon and advanced multi-state models by proposing regressions on the proportions and distribution parameters of the underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumptions. Two alternative approaches were proposed and evaluated. The first is a Bayesian multi-state travel time regression model, which regresses key model parameters on traffic volume; the second is a hidden Markov regression, which not only links key model parameters to traffic volume but also incorporates the dependency structure among adjacent time windows. Both approaches provide advanced methodology for modeling travel time reliability under complex stochastic scenarios.

Contents

1 Introduction
2 Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model
  2.1 Introduction and model specification
  2.2 Model Fitting using Markov Chain Monte Carlo Algorithm
    2.2.1 Model 1
    2.2.2 Model 2
  2.3 Simulation Study
    2.3.1 Model comparison
    2.3.2 Simulation Evaluation for Model 2
    2.3.3 Robustness of Misspecified θs
  2.4 Model Application to Field-collected Data
  2.5 Summary
3 Travel Time Reliability: Hidden Markov Model
  3.1 Introduction
  3.2 Autocorrelation
  3.3 Theoretical Background
    3.3.1 Model Specification
    3.3.2 Model Estimation
    3.3.3 Bootstrap and Confidence Interval
    3.3.4 Determine the Number of Components
    3.3.5 Goodness of Fit
    3.3.6 Prediction
  3.4 Simulation Study
    3.4.1 No Covariate
    3.4.2 With Covariate
  3.5 Application for Field-Collected Data
  3.6 Summary
4 Summary

List of Figures

2.1 Illustration of Data Collection
2.2 Average Traffic Volume by Hour of a Day
2.3 Probability of Congested State versus Traffic Volume in Simulation Studies
2.4 Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings
2.5 Model 2: Coverage Probabilities Comparison
2.6 Misspecified and True Model Comparison
2.7 Theoretical, Misspecified and True Model Comparison
2.8 Parameters Estimates under Different θs
2.9 Probability in Congested State and Traffic Volume: Real Data
3.1 Autocorrelation Comparison
3.2 Box-Cox Transformation
3.3 Hidden Markov Model: An Illustration
3.4 Illustration of Two States Markov Chain
3.5 Hidden Markov Model: Flow Chart
3.6 Confidence Interval by Profile Likelihood
3.7 Hidden Markov Model: Estimation
3.8 HMM vs. Traditional 1
3.9 HMM vs. Traditional 2
3.10 HMM vs. Traditional 3
3.11 HMM vs. Traditional 4
3.12 95% C.I. of HMM
3.13 Illustration of Low Sampling Rate
3.14 Illustration of Potential Improvement
3.15 Histogram of the Log Likelihood Ratio
3.16 χ² and Empirical Distributions
3.17 Illustration of Three States Markov Chain
3.18 Residual Check

List of Tables

2.1 Variance of Priors
2.2 Models 1 and 2: Average of Posterior Means Comparison
2.3 Models 1 and 2: Coverage Probabilities Comparison
2.4 Model 2 between Settings 2 and 3: Coverage Probabilities Comparison
2.5 More Results of Model 2: Coverage Probabilities
2.6 Misspecified Models: Average of Posterior Means Comparison
2.7 Misspecified Models: Coverage Probabilities Comparison
2.8 Results from Real Data with Different θs
3.1 HMM vs. Traditional: No Covariate 1
3.2 HMM vs. Traditional: No Covariate 2
3.3 Parameter Estimation of HMM
3.4 Kolmogorov-Smirnov Test Result
3.5 Parameter Estimation for Real Data

Chapter 1

Introduction

The objective of this study is to develop Bayesian multi-state travel time reliability models for evaluating travel time uncertainty under various traffic conditions. The reliability of travel time is a key performance index of a transportation system and has been a major transportation research area. Reliability is one of the four key focus areas of the second Strategic Highway Research Program (SHRP 2). The Federal Highway Administration (FHWA) defines travel time reliability as "consistency or dependability in travel times, as measured from day-to-day or across different times of day." Understanding the nature of travel time reliability helps individual travelers with trip planning and decision making, and helps transportation management agencies improve the efficiency of the transportation system.

Travel time is affected by multiple factors such as traffic condition, weather, and incidents. Many of these factors are random in nature, and stochastic models should be used to quantify the uncertainty associated with travel time. Traditionally, single-mode distributions have been adopted for travel time reliability modeling, with the log-normal distribution being the most popular (Emam and Al-Deek 2006, Tu et al. 2008). A number of candidate distributions have been discussed and compared: lognormal, gamma, Weibull, and exponential. However, these approaches cannot accommodate the high level of heterogeneity commonly present in travel time data, so single-mode distributions usually yield poor model fits under complex travel conditions, especially during peak hours of a day (Guo et al. 2010).

Compared to single-mode distributions, mixture distributions can accommodate data with multiple modes and are flexible in modeling data generated from complex systems (Fowlkes 1979). The multi-state travel time reliability model has been demonstrated to provide superior data fitting, scientifically sound interpretation, and a close relationship with the underlying traffic flow characteristics (Guo et al. 2012, Park et al. 2010). The advantages of mixture normal and mixture lognormal distributions have been demonstrated in applications to field-collected data (Guo et al. 2012). This study is based on the multi-state travel time model

framework. One of the most attractive features of the multi-state model is its capability to associate the travel time distribution with underlying traffic conditions. Park et al. (2010) showed that travel time states are related to the fundamental diagram, i.e., traffic flow, speed, and density. Two levels of uncertainty can be quantitatively assessed: the probability of a given traffic condition, for example, congested or free-flow, and the variation of travel time within each traffic condition. Besides the free-flow and congested states, the model can also accommodate delay caused by traffic incidents (Park et al. 2011). However, one of the most important factors affecting travel time, the traffic volume, has yet to be incorporated into the multi-state model. This study advanced the previous methods using two alternative approaches to incorporate the influence of traffic volume: a Bayesian multi-state travel time regression model and a hidden Markov model.

The traffic volume, defined as the number of vehicles traveling through a specific segment of the road within a specific time period, plays an essential role in the present research. The study extended the multi-state travel time model by incorporating the effects of traffic volume. The proposed models were applied to field data collected along a section of the I-35 freeway in San Antonio, Texas (hereinafter the I-35 data). The study covers a sixteen-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged with a radio frequency device passed the identification stations on New Braunfels Ave. (Station no. 42) and O'Connor Rd. (Station no. 49). We set the time period for summarizing the traffic volume to one hour, and collected the traffic volume from [0:00, 0:59] to [23:00, 23:59] for more than 20 weekdays.

We proposed several Bayesian multi-state regression models to incorporate traffic volume into the estimation of the probability of encountering the congested traffic state. The models were fitted using Markov Chain Monte Carlo (MCMC) algorithms, which enable us to obtain the posterior distribution of model parameters as well as the uncertainty of estimation (Lenk and DeSarbo 2000). We adopted the probit link function, which is more convenient in the Bayesian context than the logit function because the corresponding Gibbs sampler is easier to implement (Geweke and Keane 1997).

The Bayesian multi-state regression models discussed above are based on the assumption that all of the observations are conditionally independent. The independence assumption is typically not satisfied for travel time in periods close to each other. We proposed a hidden Markov model to incorporate the dependency structure among travel time data collected in adjacent time units (Baum and Petrie 1966). The hidden Markov model can be seen as a mixture model that relaxes the independence assumption (Qi et al. 2007). It is able to incorporate the dependency structure of observations, and it includes the traditional mixture model as a special case (Scott 2002). Hidden Markov models have been applied in a wide variety of applications, including speech

recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), and computational biology (Krogh et al. 1994). We developed hidden Markov models for travel time reliability evaluation. The proposed model incorporates the impact of traffic volume in the transition matrix of the Markov process. The results show that the hidden Markov model outperforms traditional mixture models.

Chapter 2

Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model

2.1 Introduction and model specification

The travel time of vehicles contains substantial variability. The Federal Highway Administration has formally defined travel time reliability as "consistency or dependability in travel times, as measured from day-to-day or across different times of day." Understanding the nature of travel time reliability helps individual travelers with trip planning and decision making, and helps transportation management agencies improve the efficiency of the transportation system.

The multi-state model has been developed for modeling travel time reliability, and one of its most attractive features is its capability to associate travel time with underlying traffic conditions. In the Gaussian mixture model, the travel time variable Y is assumed to follow a two-component mixture distribution with density function

f(y | λ, µ1, µ2, σ1², σ2²) = λ f_N(y | µ1, σ1²) + (1 − λ) f_N(y | µ2, σ2²),

where f_N represents the density function of a normal distribution with mean µi and variance σi². Without loss of generality, we assume that µ1 < µ2. Under this condition, µ1 and µ2 indicate the mean travel time under the free-flow state and the congested state. Subsequently, λ and 1 − λ are the probabilities of the free-flow state and the congested state, and σ1² and σ2² represent the variances of travel time under the free-flow state and the congested state. The probability of the free-flow state, denoted by λ, has support in (0, 1).

To link λ with traffic volume x, a common approach is to use the logit link function,

log(λ / (1 − λ)) = β0 + β1 x,

or, more generally,

log(λ / (1 − λ)) = Xβ,

where the covariate matrix X contains 1's as its first column and β is a vector of regression coefficients. The traffic volume is defined as the number of vehicles traveling through a specific segment of road within a given time period. An alternative to the logit link function is the probit link function, the inverse of the standard normal cumulative distribution function:

Φ⁻¹(1 − λ) = Xβ

For Bayesian models, the probit function is preferred due to its ease in Markov Chain Monte Carlo simulation for generating the posterior distribution. In the probit model, a latent variable wi ∈ R is introduced for each observation to indicate the group to which that observation belongs:

yi belongs to Group 1 if wi < 0, and to Group 2 otherwise.

Assume the latent variable wi ~ N(Xiβ, 1), where Xi is the i-th row of the matrix X. It can be shown that

λ = 1 − Φ(Xβ) = P(w < 0 | µ = Xβ, σ² = 1)

This setting establishes the relationship between the proportions of the two latent groups and the covariate(s). The likelihood function is correspondingly

f_N(y | µ1, σ1²)^I(w<0) f_N(y | µ2, σ2²)^I(w≥0)

As shown by Guo et al. (2012), the variability in the mean travel time of the congested state, µ2, can be substantial. From an engineering perspective, there exist certain relationships between µ2 and the traffic volume xi. Two alternative models were proposed to relate µ2 to traffic volume:

(1) µ2i = θ0 + θ1 xi = Xiθ
(2) µ2i = θs µ1 + θ xi

The first model assumes that µ1 and µ2 are estimated independently. The second model assumes that the intercept is proportional to µ1 with a predetermined scale parameter θs. With proper selection of θs, the second model can ensure that the estimated mean travel times for the free-flow and congested conditions are sufficiently separated. Following the convention of the Bayesian approach, we use the precision parameter ψj to denote the inverse of the variance of each component (i.e., ψj = 1/σj², j = 1, 2). Two levels of uncertainty are quantitatively assessed in the proposed model. The first level of uncertainty is the probability of a given traffic condition, for example, congested or free-flow; the second level of uncertainty is the variation of travel time within each traffic condition.
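To make the probit link concrete, the following minimal sketch evaluates the probability of the congested state, 1 − λ = Φ(Xβ), over a range of volumes. The coefficients β0 = −3 and β1 = 0.004 are illustrative values chosen here, not estimates from this report.

```python
from scipy.stats import norm

# Hypothetical probit coefficients (for illustration only).
beta0, beta1 = -3.0, 0.004

for volume in (250, 500, 750, 1000, 1250):
    p_congested = norm.cdf(beta0 + beta1 * volume)  # 1 - lambda = Phi(X beta)
    print(volume, round(p_congested, 3))
```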

To complete the Bayesian model setup, the following non-informative priors are adopted according to Yang and Berger (1996):

π(µ1) ∝ 1, π(β0) ∝ 1, π(β1) ∝ 1, π(θ0) ∝ 1, π(θ1) ∝ 1, π(ψ1) ∝ 1/ψ1, π(ψ2) ∝ 1/ψ2

It is desirable that a Bayesian model not be sensitive to the choice of prior distributions. Several alternative priors, such as normal distributions with different variances σβ and σθ, were tested (Table 2.1). As can be seen, the results are not significantly influenced and are quite similar to those from the non-informative priors, most likely due to the large sample size. Therefore, non-informative priors are used in the model.

Table 2.1: Variance of Priors

2.2 Model Fitting using Markov Chain Monte Carlo Algorithm

The conclusions of the study are based on the posterior distribution of the parameters, as shown below:

f(µ1, ψ, β, θ, w | X, y) ∝ f(y | µ1, ψ, β, θ, w, X) f(µ1, ψ, β, θ, w | X)
                        ∝ f(y | µ1, ψ, w, θ) f(w | X, β) f(µ1, ψ, β, θ | X)
                        ∝ f(y | µ1, ψ, w, θ) f(w | X, β) π(µ1) π(ψ1) π(ψ2) π(β) π(θ),

where f(y | µ1, ψ, w, θ) is the density function of the multi-state normal distribution,

f_N(y | µ1, 1/ψ1)^I(w<0) f_N(y | Xθ, 1/ψ2)^I(w≥0),

and f(w | X, β) is the multivariate normal with mean Xβ and covariance matrix I. Since there is no closed-form solution for the above posterior distribution, a simulation-based Markov Chain Monte Carlo algorithm is used to estimate it. The MCMC algorithm samples from the full conditional distribution of each parameter. The conditional distributions are developed in the following subsections.

2.2.1 Model 1

The full conditional distribution for each parameter in Model 1 is shown below.

1. The full conditional for w:

f(w | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | Xiθ, 1/ψ2) I(wi ≥ 0)] f_N(wi | Xiβ, 1)

This is a multi-state truncated normal. Define a = f_N(yi | µ1, 1/ψ1) and b = f_N(yi | Xiθ, 1/ψ2); then with probability a/(a+b), wi is sampled from f_N(wi | Xiβ, 1) truncated at wi < 0, and with probability b/(a+b), wi is sampled from f_N(wi | Xiβ, 1) truncated at wi ≥ 0.

2. The full conditional for µ1:

f(µ1 | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | Xiθ, 1/ψ2) I(wi ≥ 0)]
           ∝ ∏_{i:wi<0} f_N(yi | µ1, 1/ψ1)
           → N( (Σ_{i:wi<0} yi) / n1, 1/(n1 ψ1) )

This is a univariate normal distribution, where n1 is the number of wi's that are smaller than 0. Corresponding to the model assumption µ1 < µ2, we right-truncate this distribution at min(Xiθ).

3. The full conditional for ψ1:

f(ψ1 | ...) ∝ ψ1⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0)
           ∝ ψ1^(n1/2 − 1) exp( −(1/2) ψ1 Σ_{i:wi<0} (yi − µ1)² )

This is the Gamma distribution with shape parameter n1/2 and rate parameter (1/2) Σ_{i:wi<0} (yi − µ1)².

4. The full conditional for ψ2:

f(ψ2 | ...) ∝ ψ2⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0)
           ∝ ψ2^(n2/2 − 1) exp( −(1/2) ψ2 Σ_{i:wi≥0} (yi − Xiθ)² )

where n2 is the number of wi's that are greater than or equal to 0. This is the Gamma distribution with shape parameter n2/2 and rate parameter (1/2) Σ_{i:wi≥0} (yi − Xiθ)².

5. The full conditional for β:

f(β | ...) ∝ ∏_{i=1}^n f(wi | Xi, β) ∝ ∏_{i=1}^n exp( −(wi − Xiβ)² / 2 )

This is the bivariate normal distribution with mean (XᵀX)⁻¹Xᵀw and covariance matrix (XᵀX)⁻¹.

6. The full conditional for θ:

f(θ | ...) ∝ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0) ∝ ∏_{i:wi≥0} f_N(yi | Xiθ, 1/ψ2)

Define Σ+ as the n2 × n2 diagonal matrix whose diagonal elements are 1/ψ2, X+ as the submatrix of X consisting of the rows i with wi ≥ 0, and y+ as the subvector of y with elements i such that wi ≥ 0. Then

f(θ | ...) ∝ |Σ+|^(−1/2) exp( −(1/2) (y+ − X+θ)ᵀ Σ+⁻¹ (y+ − X+θ) )
          → N( (X+ᵀ Σ+⁻¹ X+)⁻¹ X+ᵀ Σ+⁻¹ y+, (X+ᵀ Σ+⁻¹ X+)⁻¹ )

This is the bivariate normal with mean (X+ᵀΣ+⁻¹X+)⁻¹X+ᵀΣ+⁻¹y+ and covariance matrix (X+ᵀΣ+⁻¹X+)⁻¹.

2.2.2 Model 2

Compared to Model 1, this model has one fewer parameter, and the full conditional distributions change accordingly.

1. The full conditional for w:

f(w | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)] f_N(wi | Xiβ, 1)

2. The full conditional for µ1:

f(µ1 | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)]
           → N( (ψ1 Σ_{i:wi<0} yi + θs ψ2 Σ_{i:wi≥0} (yi − θxi)) / (n1ψ1 + θs²n2ψ2), 1/(n1ψ1 + θs²n2ψ2) )

This is still a univariate normal distribution, but the parameters differ from those in Model 1.

3. The full conditional for ψ1:

f(ψ1 | ...) ∝ ψ1⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | θsµ1 + θxi, 1/ψ2)^I(wi≥0)
           ∝ ψ1^(n1/2 − 1) exp( −(1/2) ψ1 Σ_{i:wi<0} (yi − µ1)² )

4. The full conditional for ψ2:

f(ψ2 | ...) ∝ ψ2⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | θsµ1 + θxi, 1/ψ2)^I(wi≥0)
           ∝ ψ2^(n2/2 − 1) exp( −(1/2) ψ2 Σ_{i:wi≥0} (yi − θsµ1 − θxi)² )

5. The full conditional for β:

f(β | ...) ∝ ∏_{i=1}^n f(wi | Xi, β) ∝ ∏_{i=1}^n exp( −(wi − Xiβ)² / 2 )

This is the bivariate normal distribution with mean (XᵀX)⁻¹Xᵀw and covariance matrix (XᵀX)⁻¹.

6. The full conditional for θ:

f(θ | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)] ∝ ∏_{i:wi≥0} f_N(yi | θsµ1 + θxi, 1/ψ2)
          → N( Σ_{i:wi≥0} (yi − θsµ1) xi / Σ_{i:wi≥0} xi², 1/(ψ2 Σ_{i:wi≥0} xi²) )
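Putting the full conditionals together, the following is a minimal Gibbs sampler sketch for Model 2. It is not the authors' code: the starting values are crude guesses, and the update for w weights each truncation region by its prior mass under N(Xiβ, 1).

```python
import numpy as np
from scipy.stats import norm, truncnorm

def gibbs_model2(y, x, theta_s=1.2, n_iter=5000, seed=0):
    """Gibbs sampler for Model 2, following the full conditionals above.
    y: travel times; x: traffic volumes; theta_s is predetermined."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])            # probit design matrix
    mu1, theta, beta = y.min(), 0.5, np.zeros(2)    # crude starting values
    psi1 = psi2 = 1.0 / y.var()
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = np.empty((n_iter, 6))
    for it in range(n_iter):
        # latent w: normals truncated to (-inf, 0) and [0, inf), with branch
        # probabilities weighted by the prior mass of each region
        m = X @ beta
        a = norm.pdf(y, mu1, psi1 ** -0.5) * norm.cdf(-m)
        b = norm.pdf(y, theta_s * mu1 + theta * x, psi2 ** -0.5) * norm.sf(-m)
        free = rng.random(n) < a / (a + b)
        w = np.where(free,
                     truncnorm.rvs(-np.inf, -m, loc=m, scale=1, random_state=rng),
                     truncnorm.rvs(-m, np.inf, loc=m, scale=1, random_state=rng))
        s1, s2 = w < 0, w >= 0
        n1, n2 = s1.sum(), s2.sum()
        # mu1: normal, with precision collecting both components via theta_s
        prec = n1 * psi1 + theta_s ** 2 * n2 * psi2
        mean = (psi1 * y[s1].sum() + theta_s * psi2 * (y[s2] - theta * x[s2]).sum()) / prec
        mu1 = rng.normal(mean, prec ** -0.5)
        # precisions: Gamma full conditionals (numpy uses scale = 1/rate)
        psi1 = rng.gamma(n1 / 2, 2.0 / ((y[s1] - mu1) ** 2).sum())
        psi2 = rng.gamma(n2 / 2, 2.0 / ((y[s2] - theta_s * mu1 - theta * x[s2]) ** 2).sum())
        # beta: bivariate normal from regressing w on X
        beta = rng.multivariate_normal(XtX_inv @ X.T @ w, XtX_inv)
        # theta: univariate normal on the congested observations
        sxx = (x[s2] ** 2).sum()
        theta = rng.normal(((y[s2] - theta_s * mu1) * x[s2]).sum() / sxx,
                           (psi2 * sxx) ** -0.5)
        draws[it] = mu1, psi1, psi2, beta[0], beta[1], theta
    return draws
```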

2.3 Simulation Study

We conducted a simulation study to examine the proposed models based on the data set collected on Interstate I-35 near San Antonio, Texas (Guo et al. 2010). The study corridor covered a 16-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged with a radio frequency device passed the automatic vehicle identification stations on New Braunfels Avenue (Station no. 42) and O'Connor Road (Station no. 49). Figure 2.1 illustrates the data collection site setup.

Figure 2.1: Illustration of Data Collection

The traffic volume is the number of vehicles traveling through the road segment during a specific time period. The analysis unit is set to one hour, and the hourly traffic volume from [0:00, 0:59] to [23:00, 23:59] is calculated. Although hourly traffic volume is adopted in this analysis, in theory any time unit can be used if there are sufficient data within the time unit. The data set contains 237 distinct hours of observations. We average the traffic volume by the hours of a day. Figure 2.2 illustrates the range of average traffic volume by hour.

Figure 2.2: Average Traffic Volume by Hour of a Day

In the original data, only vehicles equipped with electronic tags were counted. These vehicles account for a proportion of the total traffic. In order to estimate the real traffic volume, we simulated new data sets according to the shape of the original data and extended it by a

scale:

Yij: simulated traffic volume of hour i in day j,

Yij = [c µi + ɛij]+,  ɛij ~ N(0, d²),

Xij: original traffic volume of hour i in day j, i = 0, ..., 23, j = 1, ..., 10 or 11,

µi: average traffic volume of hour i, µi = (Σk Xik) / (number of days).

Based on historical data and engineering judgment, we selected d = 100 and c = 50. The traffic volume and the travel time data sets were generated according to the following procedure. For a given model and a set of predetermined parameters, the simulation study is conducted as follows:

1. Set n = the number of simulation runs.
2. For each run i = 1, ..., n:
   (a) generate a data set;
   (b) run the Markov Chain Monte Carlo algorithm until convergence;
   (c) record whether the 95% credible intervals cover the true values.

For each simulation run, we ran more than 5,000 MCMC iterations and ensured convergence of the MCMC chains. The inference statistics, such as posterior means, 95% credible intervals, and coverage probabilities, are calculated. The analysis focuses on the comparison between Model 1 and Model 2, model performance, and robustness under misspecified parameters.

2.3.1 Model comparison

The main difference between Model 1 and Model 2 is that Model 2 provides a mechanism to control the difference between the free-flow mean travel time µ1 and the baseline of the congested mean travel time µ2 via the parameter θs. The value of θs could be determined via engineering expertise and preliminary data analysis. It is tempting to set θs = 1, which corresponds to the scenario in which the intercept for the congested state equals the free-flow travel time. However, initial analyses indicate that a model identifiability issue arises when θs is too close to 1. The identifiability issue is caused by the fact that when θs is small, the mean travel times of the two component distributions, i.e., free flow and congested flow, are too close to each other. To provide sufficient separation between the two component distributions, we set the minimum θs value at θs = 1.2 in the simulation study. The values of µ1, ψ1, ψ2, β0 and β1 are set according to historical data.
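To make the procedure concrete, the sketch below runs a scaled-down version of one coverage experiment, reusing the gibbs_model2 sketch from Section 2.2. The true values follow the Section 2.3.2 settings where stated (µ1 = 500, ψ1 = 0.01, β0 = −3, β1 = 0.004, θs = 1.2, θ = 0.6); ψ2 and the hourly volume profile are placeholders.

```python
import numpy as np

TRUE = dict(mu1=500.0, psi1=0.01, psi2=1e-4,        # psi2 is a placeholder
            beta=np.array([-3.0, 0.004]), theta_s=1.2, theta=0.6)

def simulate_dataset(n_days=20, c=50, d=100, seed=None):
    rng = np.random.default_rng(seed)
    mu_hour = np.linspace(5, 25, 24)                 # stylized hourly profile
    # scaled-up volumes: Y_ij = [c*mu_i + eps_ij]_+, eps ~ N(0, d^2)
    x = np.maximum(c * np.tile(mu_hour, n_days) + rng.normal(0, d, 24 * n_days), 0)
    w = rng.normal(TRUE["beta"][0] + TRUE["beta"][1] * x, 1.0)   # probit latent
    y = np.where(w >= 0,
                 rng.normal(TRUE["theta_s"] * TRUE["mu1"] + TRUE["theta"] * x,
                            TRUE["psi2"] ** -0.5),
                 rng.normal(TRUE["mu1"], TRUE["psi1"] ** -0.5))
    return x, y

covered, n_rep = 0, 50                               # the report uses 1,000 runs
for rep in range(n_rep):
    x, y = simulate_dataset(seed=rep)
    draws = gibbs_model2(y, x, theta_s=TRUE["theta_s"])[2500:]   # drop burn-in
    lo, hi = np.percentile(draws[:, 0], [2.5, 97.5])  # 95% interval for mu1
    covered += lo <= TRUE["mu1"] <= hi
print("coverage of mu1:", covered / n_rep)
```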

Figure 2.3 shows the relationship between the probability of the congested state and the traffic volume.

Figure 2.3: Probability of Congested State versus Traffic Volume in Simulation Studies

Five different settings of θ0 (θs) and θ1 were evaluated. The results are summarized in Table 2.2. As can be seen, the point estimates of the parameters are generally very close to the true values. Model 2 seems to be slightly better than Model 1, but the difference is minimal. One key criterion for evaluating model performance is the coverage probability. Ideally, the posterior credible intervals should cover the true model parameter at the nominal significance level; i.e., for 1,000 simulations, 95% of the resulting 95% posterior credible intervals (950 out of 1,000) should cover the true model parameter. As can be seen in Table 2.3 and Figure 2.4, the coverage probability for parameter µ1 is close to 0.95 in most cases. However, the coverage probabilities for β0, β1 and θ1 can be quite far off, especially when the two component distributions are similar to each other. The overall performance of Model 2 is better than that of Model 1, especially when the two component distributions are similar to each other.

Table 2.2: Models 1 and 2: Average of Posterior Means Comparison (columns: µ1, ψ1, ψ2, β0, β1, θ0 (θs), θ1, for Models 1 and 2 under five settings)

Table 2.3: Models 1 and 2: Coverage Probabilities Comparison (columns: µ1, ψ1, ψ2, β0, β1, θ0 (θs), θ1, for Models 1 and 2 under five settings)

Figure 2.4: Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings

Table 2.4 shows an additional investigation of Model 2 under Settings 2 and 3, in which the value of θs is set to 1.3. Since the value of θ1 plays an important role in the coverage probability, we selected two more points between 0.3 and 0.6. In general, when θ1 increases, the coverage probabilities are closer to the target 95% rate.

Table 2.4: Model 2 between Settings 2 and 3: Coverage Probabilities Comparison (columns: value of θ1, µ1, ψ1, ψ2, β0, β1, θ)

2.3.2 Simulation Evaluation for Model 2

To evaluate the robustness of Model 2, we conducted a more complicated simulation study based on combinations of various parameters. The following parameters are fixed across all simulation setups: µ1 = 500, ψ1 = 0.01, β0 = −3, and ψ2.

For the parameters of interest, we tested multiple levels of each:

β1 ∈ {0.004, …, 0.005}
θ ∈ {0.3, 0.6}
θs ∈ {1.2, 1.3, 1.4, 1.5}

β1 represents the relationship between the proportion of the congested state and the traffic volume; θ indicates the relationship between travel time under the congested state and the traffic volume. There are 24 different settings in the simulation. The outputs are summarized in Table 2.5. The results show that the coverage probabilities are generally good when the two components are well separated. The model performance can be seen more clearly in Figure 2.5, in which the dashed line denotes the 95% nominal level for reference. As can be seen, the larger θ and θs are, the higher the coverage probabilities will be.

Table 2.5: More Results of Model 2: Coverage Probabilities (columns: ID, β1, θ, θs, and coverage of µ1, ψ1, ψ2, β0, β1, θ)

Figure 2.5: Model 2: Coverage Probabilities Comparison

2.3.3 Robustness of Misspecified θs

The parameter θs is one of the key parameters of Model 2 and is predetermined rather than estimated from data. The true value of θs is typically unknown; therefore, the robustness of the model with respect to misspecification is critical for applications. For example, will the results change substantially if the true value of θs is 1.2 but the model was fitted with θs = 1.3? We evaluated this issue with four different settings. The results of the simulation are shown in Table 2.6. As can be seen, the point estimates of the parameters are generally stable. When θs is misspecified, the model generally overestimates the mean of component 2. From Table 2.7, it can be concluded that when the two components are not well separated (i.e., θs and θ are small), the misspecified models can

sometimes have better coverage of the regression coefficients for the mixture proportion (ψ1, β0 and β1).

Table 2.6: Misspecified Models: Average of Posterior Means Comparison (columns: µ1, ψ1, ψ2, β0, β1, θs, θ1, µ2, for the true and misspecified models under four settings; µ2 is estimated by θsµ1 + θ1x)

Table 2.7: Misspecified Models: Coverage Probabilities Comparison (columns: µ1, ψ1, ψ2, β0, β1, θs, θ1, for the true and misspecified models under four settings)

Figure 2.6 shows the coverage probability comparison between the true and misspecified models in the four settings. The dashed line denotes 95% for reference. It can be

observed that the true models are superior in estimating θ1 and ψ2, while the misspecified models perform better for ψ1, β0 and β1.

Figure 2.6: Misspecified and True Model Comparison

Although misspecified models are generally not good at estimating θ1, it is the mean value of µ2,

µ2 = θsµ1 + θ1x,

that is of ultimate interest. In order to evaluate the influence of a misspecified θs on µ2, we evaluated the relationship between traffic volume and the corresponding µ2 under the theoretical result, the true model estimate, and the misspecified model estimate. Figure 2.7 shows that the misspecified model estimates are close to the theoretical results when the traffic volume is high, which directly links to the congested state. Therefore, the application of these models is still robust when θs is misspecified.

Figure 2.7: Theoretical, Misspecified and True Model Comparison

2.4 Model Application to Field-collected Data

We applied Model 2 to the field data collected on I-35 near San Antonio, Texas, as introduced earlier. Models with different values of θs were fitted. The results are shown in Table 2.8.

Table 2.8: Results from Real Data with Different θs (rows: µ1, ψ1, ψ2, β0, β1, θ, µ2; columns: θs = 1, 1.1, 1.2, 1.3, 1.4; µ2 is estimated by θsµ1 + θx)

Figure 2.8 shows the relationship between θs and other critical model parameters. Both the means and the standard deviations of the two components are quite stable with respect to changes in θs.

Figure 2.8: Parameters Estimates under Different θs

Figure 2.9 shows the relationship between the probability of the congested state and traffic volume under different settings of θs. As can be seen, the difference is pronounced near certain traffic volumes. It should be noted that the traffic volume here is a subset of the actual traffic volume and thus should be interpreted as an indicator of the traffic condition.

Figure 2.9: Probability in Congested State and Traffic Volume: Real Data

2.5 Summary

The multi-state model provides a flexible and efficient framework for modeling travel time reliability, especially under complex traffic conditions. Guo et al. (2012) illustrated that the multi-state model outperforms single-state models in congested or near-congested traffic

conditions and that the advantage is substantial in high traffic volume conditions. The objective of this study is to quantitatively evaluate the influence of traffic volume on the mixture of the two components. The study advances the multi-state models by proposing regressions on the proportions and distribution parameters for the underlying traffic states. The Bayesian analysis also provides valid credible intervals for each parameter without asymptotic assumptions. Previous studies usually modeled the travel time independently, without establishing the relationship between travel time and important transportation statistics such as traffic volume. The models developed here can also be easily extended to include more covariates in either linear or nonlinear forms.

The application results indicate that there is a negative relationship between the proportion of the free-flow state and the traffic volume, which confirms the statement by Guo et al. (2012) that for low traffic volume conditions there might exist only one travel time state, so that single-state models will be sufficient. The estimation for the congested state indicates that the travel time under this condition exhibits substantial variability and is positively related to traffic volume, which also verifies the phenomenon found by Guo et al. (2012).

There are several potential extensions to the current research. The current research only includes lognormal and normal distributions; a number of other distributions, e.g., Gamma and extreme value distributions, could also be investigated. One of the assumptions of the existing Bayesian mixture model is that all the observations are independent, which can be relaxed in the hidden Markov model.

Chapter 3

Travel Time Reliability: Hidden Markov Model

3.1 Introduction

The Bayesian mixture regression model discussed in the previous chapter is based on the assumption that the observations are independent. In most cases, the travel times of vehicles were collected chronologically; thus, the travel times in adjacent time periods are likely to be correlated because of the continuity of traffic flow. Although it is possible to apply autocorrelated error terms to handle this problem (Cochrane and Orcutt 1949), the interpretation in this scenario is unclear. In order to accommodate the dependency structure of the data, we adopt a more general methodology: the hidden Markov model (HMM).

The basic concept of the hidden Markov model was introduced by Baum and Petrie (1966). It can be shown that the traditional mixture model is a special case of the HMM (Scott 2002). Hidden Markov models are popular in a wide variety of applications (Couvreur 1996), including speech recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), computational biology (Krogh et al. 1994), fault detection (Smyth 1994), and many other areas.

3.2 Autocorrelation

As an exploratory analysis, we treat the I-35 data as a time series and calculate the autocorrelation among time windows separated by different lags. Autocorrelation is a measure of similarity between observations with certain time lags (Wiener 1930).

For a sequence {Xt}, the autocorrelation is defined as

ACF(t, s) = E[(Xt − µt)(Xs − µs)] / (σt σs).

Assuming that {Xt} is second-order stationary (Wold 1938), the autocorrelation can be written as

ACF(s) = E[(Xt − µ)(Xt+s − µ)] / σ².

For an independent sequence, it is easy to see that ACF(s) should be small regardless of the value of s. For a sequence such as {Xt : Xt = 0.5 Xt−1 + ɛt}, ACF(s) will be quite large when s = 1 and decrease gradually as s increases. Figure 3.1 shows two plots, one with independent data and one with the field-collected data. The x-axis is the time lag, while the y-axis is the ACF. The plot on the left shows the ACF of an independent sequence, while the plot on the right is estimated from the I-35 data. It is clear that the observed travel time is not an independent data set.

Figure 3.1: Autocorrelation Comparison

A formal test of autocorrelation is the Durbin-Watson test (Durbin and Watson 1950). The Durbin-Watson statistic is defined as

d = Σ_{t=2}^T (Xt − Xt−1)² / Σ_{t=1}^T Xt².

The value of d is between 0 and 4. If the Durbin-Watson statistic is substantially less than 2, there might be positive correlation; if it is substantially greater than 2, there might be negative correlation. Under the normal assumption, the null distribution of the Durbin-Watson statistic is a linear combination of chi-squared variables. To satisfy the normal assumption, we use the Box-Cox transformation (Choongrak 1959), x → (x^λ − 1)/λ. Figure 3.2 indicates that λ = 4 is the optimal choice.

Figure 3.2: Box-Cox Transformation

The Durbin-Watson statistic computed from the transformed travel time yields a p-value close to zero. This confirms the high positive autocorrelation among the travel time data.
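For readers who want to reproduce this exploratory step, a minimal sketch follows; it computes the sample ACF and the Durbin-Watson statistic exactly as defined above, applied to whatever series is passed in (the I-35 data themselves are not reproduced here).

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[: len(x) - s] * x[s:]) / denom
                     for s in range(max_lag + 1)])

def durbin_watson(x):
    """Durbin-Watson statistic; values near 2 suggest no autocorrelation."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.diff(x) ** 2) / np.sum(x ** 2)
```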

3.3 Theoretical Background

3.3.1 Model Specification

A hidden Markov model consists of two sequences: the observed sequence {xt}, t = 1, 2, ..., n, and the latent state sequence {st}, t = 1, 2, ..., n. Given st, the distribution of the observed data xt is fully determined by the value of st. For example, if we denote the travel time in seconds as {xt}, t = 1, 2, ..., n, then the sequence st could be defined as

st = 1 if the road is under free flow, and st = 2 if the road is under congestion.

Figure 3.3: Hidden Markov Model: An Illustration

The two values of st represent the two travel time states, which correspond to the two components of a mixture distribution. Given the state, the observed data xt follow one of the distributions:

f(xt | st) = f(x | Θ1) if st = 1; f(x | Θ2) if st = 2.

The form of the distribution f(x | Θ) could be normal, Gamma, Poisson, multinomial, or another distribution; for example, xt | st = 1 ~ N(1000, ·) and xt | st = 2 ~ N(500, 30²). The term "hidden" indicates that {st} is a latent sequence that cannot be observed. The term "Markov" indicates an important property of {st}:

P(st | st−1, ..., s1) = P(st | st−1), t ≥ 2.

Thus, {st} is a Markov chain and has a transition probability matrix. For a two-state sequence, the transition matrix is

P = ( P11  P12
      P21  P22 ),

where Pij = P(st+1 = j | st = i). It can be shown that if {st} is a trivial Markov chain, i.e., i.i.d., then the hidden Markov model is equivalent to a traditional mixture model. Figure 3.4 illustrates the two-state Markov chain for traffic states.

Figure 3.4: Illustration of Two States Markov Chain

The basic properties of a Markov chain are listed below:

Irreducible: it is possible to get to any state from any state.

Aperiodic: a state i has period k (k ≥ 2) if {n : p_ii^(n) > 0} = {kd : d ≥ 1}. If none of the states is periodic, the chain is aperiodic.

Positive recurrent: a state i is positive recurrent if the expected time for the chain to return to state i is finite.

If every state in an irreducible chain is positive recurrent, there exists a unique stationary distribution π that satisfies

πP = π.

If an irreducible chain is positive recurrent and aperiodic, it is said to have a limiting distribution φ:

lim_{n→∞} p_ij^(n) = φj, where p_ij^(n) = P(st+n = j | st = i).

A limiting distribution, when it exists, is always a stationary distribution, but the converse is not true. The stationary or limiting distribution can be used to describe the long-term behavior of a Markov chain. For example, suppose a hidden Markov model has the transition matrix

P = ( 0.8  0.2
      0.1  0.9 ).

By solving

π1 = 0.8 π1 + 0.1 π2,  π1 + π2 = 1,

the solution is π1 = 1/3, π2 = 2/3. Roughly speaking, in the long run 1/3 of the observations will be in state 1 while 2/3 will be in state 2. This relates the hidden Markov model to the traditional mixture model.

One of the primary interests of travel time uncertainty research is to evaluate the influence of traffic volume. In the HMM framework, we build regression models on the transition probabilities using traffic volume data. Hereafter we use y to denote the observed data and x the covariate. When the HMM has only two states, the transition matrix can be modeled in the style of logistic regression. The transition probability matrix is a 2 × 2 matrix; due to the constraints P11 + P12 = P21 + P22 = 1, it has two free parameters. Chung et al. (2007) discussed a similar model. For each row of the transition matrix, a logistic regression with one covariate can be used:

log(P12/P11) = β0,1 + β1,1 x
log(P22/P21) = β0,2 + β1,2 x

When the Markov chain has more than two states, a multinomial logistic regression model can be applied, with the first column typically chosen as the baseline. For example, the three-state model is:

log(P12/P11) = β0,1 + β1,1 x    log(P13/P11) = β0,2 + β1,2 x
log(P22/P21) = β0,3 + β1,3 x    log(P23/P21) = β0,4 + β1,4 x
log(P32/P31) = β0,5 + β1,5 x    log(P33/P31) = β0,6 + β1,6 x
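A small sketch of both pieces of machinery above: solving πP = π for the stationary distribution (which reproduces the 1/3, 2/3 example) and assembling a volume-dependent two-state transition matrix from the two row-wise logistic regressions. The coefficients passed to transition_matrix are placeholders, not estimates from this report.

```python
import numpy as np

def stationary(P):
    """Solve pi P = pi subject to sum(pi) = 1 (least squares)."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones(M)])
    b = np.append(np.zeros(M), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def transition_matrix(x, b01, b11, b02, b12):
    """Two-state transition matrix at volume x from the row-wise logits."""
    e1, e2 = np.exp(b01 + b11 * x), np.exp(b02 + b12 * x)
    return np.array([[1 / (1 + e1), e1 / (1 + e1)],
                     [1 / (1 + e2), e2 / (1 + e2)]])

print(stationary(np.array([[0.8, 0.2], [0.1, 0.9]])))  # -> [0.333..., 0.666...]
```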

Figure 3.5 illustrates the basic infrastructure of the HMM. Both the historical data (traffic volume and travel time) and the observed real-time traffic volume can be used for prediction.

Figure 3.5: Hidden Markov Model: Flow Chart

3.3.2 Model Estimation

There are several methods for estimating the parameters of an HMM. One of the most popular is the EM algorithm (Baum et al. 1970, Bilmes 1998). Alternatives include the Viterbi and gradient algorithms (Rabiner 1989). A Bayesian method (Jean-Luc and Chin-Hui 1991) has also been proposed, and there are existing software packages specifically for fitting HMMs in R (Visser and Speekenbrink 2010). Although a Bayesian approach to HMM analysis does show some advantages in complex models, Rydén (2008) argued that the results are generally similar and that the EM algorithm is sufficient for most practical problems.

Define:

Lk(t) = P(st = k | X)
Hk,l(t) = P(st = k, st+1 = l | X)

Lk(t) is the conditional probability of being in state k at time t given the entire observed sequence X; Hk,l(t) is the conditional probability of being in state k at time t and in state l at time t + 1, given the entire observed sequence X.

The initial probability of state k (k = 1, ..., M) can be estimated by

P(s1 = k) ∝ Σ_{t=1}^T Lk(t),  with Σ_{k=1}^M P(s1 = k) = 1.

The EM algorithm can be described as follows (Li and Gray 2000):

E step: compute Lk(t) and Hk,l(t) under the current parameter values.

M step:

µk = Σ_{t=1}^T Lk(t) xt / Σ_{t=1}^T Lk(t)

Σk (covariance) = Σ_{t=1}^T Lk(t)(xt − µk)(xt − µk)ᵀ / Σ_{t=1}^T Lk(t)

P(st+1 = l | st = k) (transition probability) = Σ_{t=1}^{T−1} Hk,l(t) / Σ_{t=1}^{T−1} Lk(t)

The forward-backward algorithm can be used in the estimation (E) step. Define

ak(x1, ..., xt) = P(x1, ..., xt, st = k)
bk(xt+1, ..., xT) = P(xt+1, ..., xT | st = k)

The forward recursion is

ak(x1) = P(s1 = k) fk(x1)
ak(x1, ..., xt) = fk(xt) Σ_{i=1}^M ai(x1, ..., xt−1) pik

where fk is the probability density of component k and pik is the transition probability. The backward recursion is

bk(xT+1, ..., xT) = 1 (by convention)
bk(xt+1, ..., xT) = Σ_{i=1}^M pki fi(xt+1) bi(xt+2, ..., xT)

Then Lk(t) and Hk,l(t) can be estimated as

Lk(t) = ak(x1, ..., xt) bk(xt+1, ..., xT) / Σ_{i=1}^M ai(x1, ..., xt) bi(xt+1, ..., xT)

Hk,l(t) = ak(x1, ..., xt) pkl fl(xt+1) bl(xt+2, ..., xT) / Σ_{i=1}^M Σ_{j=1}^M ai(x1, ..., xt) pij fj(xt+1) bj(xt+2, ..., xT)

The function Lk(t) can be used to estimate the state to which an observation belongs. However, estimation based on individual observations can cause unwanted issues. For example, it is possible that the result shows st = 1 and st+1 = 2 even though p12 = 0, which makes the entire estimated sequence meaningless. The Viterbi algorithm (Viterbi 1967) can be applied to obtain the sequence with the largest posterior probability.
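The sketch below implements the E step above with the standard scaling trick so the recursions do not underflow; it is a generic implementation, not the authors' code. f[t, k] is assumed to hold the component density fk(xt), evaluated beforehand.

```python
import numpy as np

def e_step(f, P, init):
    """Scaled forward-backward recursions for the E step.
    f[t, k]: component density f_k(x_t), precomputed; P: transition matrix;
    init: initial state probabilities. Returns L[t, k] = P(s_t = k | x),
    H[t, k, l] = P(s_t = k, s_{t+1} = l | x), and the log-likelihood."""
    T, M = f.shape
    alpha = np.empty((T, M)); beta = np.ones((T, M)); c = np.empty(T)
    alpha[0] = init * f[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                           # forward pass, rescaled
        alpha[t] = (alpha[t - 1] @ P) * f[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):                  # backward pass, same scaling
        beta[t] = P @ (f[t + 1] * beta[t + 1]) / c[t + 1]
    L = alpha * beta                                # posterior state probabilities
    H = (alpha[:-1, :, None] * P[None, :, :] *
         (f[1:] * beta[1:])[:, None, :] / c[1:, None, None])
    return L, H, np.log(c).sum()
```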

3.3.3 Bootstrap and Confidence Interval

Confidence intervals are the focus of classical statistical inference. Visser et al. (2000) proposed several ways to obtain confidence intervals for hidden Markov models: a finite approximation of the Hessian, profile likelihood, and the bootstrap. They claimed that the results from the first method are usually too narrow; therefore, we evaluated the other two methods.

The profile likelihood method is based on the profile likelihood ratio and the χ² distribution. The basic idea is to evaluate the change in the log-likelihood caused by a single parameter while treating all the other parameters as nuisance parameters (Meeker and Escobar 1995). The profile likelihood for a parameter β is the likelihood maximized over all the other parameters δ:

PL(β) = max_δ L(β; δ)

If the MLE of β is β̂, it can be shown that

2(log PL(β̂) − log PL(β)) ~ χ²(1)

asymptotically. Based on the χ²(1) distribution, we may easily derive the lower and upper bounds of the confidence interval. Figure 3.6 is an intuitive illustration, where Bm is the MLE while Bu and Bl are the upper and lower bounds.

Figure 3.6: Confidence Interval by Profile Likelihood

The bootstrap is a popular technique for obtaining confidence intervals (Efron and Tibshirani 1994). However, naive resampling is not appropriate for the hidden Markov model because it would break the dependency structure of the original data. A parametric bootstrap can be applied to handle this issue. There are generally three ways to do a parametric bootstrap with a hidden Markov model:

1. Based on the parameter estimates.
2. Based on the original data.
3. A mixture of 1 and 2.

For the first approach, the parameters are estimated from the original data and a new data set is simulated solely from the parameter estimates. The basic assumption of this method is that the model is correctly specified. For the second approach, the residuals are collected after model fitting:

ri = yi − ŷi

Sampling with replacement is done within the set of residuals, and new observations are generated as follows:

yi_new = ŷi + ri_new,  ri_new ∈ {r1, r2, ..., rN}

The sample size of the original data should be sufficiently large.

For the third approach, we assume that the random errors are i.i.d. normal:

yi_new = ŷi + ri_new,  ri_new ~ N(0, σ²)

However, it is worth noting that the variance of the error terms may contain substantial heterogeneity. For example, suppose there are two states in a hidden Markov model; it is highly possible that the distributions of the two states have different variance structures. Therefore, adjustments must be applied for methods 2 and 3 (Bandeen-Roche et al. 1997, Wang et al. 2005):

1. Assign each observation to a group based on its posterior probability.
2. Within each group, resample the residuals.
3. Repeat steps 1 and 2.

Through bootstrapping, new data sets can be generated and the parameter estimates evaluated. A 95% confidence interval can then be generated in two ways: either as ±1.96 σ̂β or from the empirical 2.5% and 97.5% quantiles. The results should be close for sufficiently large data sets.
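A minimal sketch of the class-adjusted residual resampling in steps 1-2 above; post is assumed to hold posterior state probabilities from a fitted model, e.g. the L matrix from the E-step sketch.

```python
import numpy as np

def class_adjusted_resample(y, y_hat, post, rng):
    """One bootstrap data set with residuals resampled within latent classes.
    post[t, k]: posterior probability that observation t is in state k."""
    groups = post.argmax(axis=1)            # step 1: hard class assignment
    resid = y - y_hat
    y_new = np.empty_like(y, dtype=float)
    for k in np.unique(groups):
        idx = np.flatnonzero(groups == k)
        y_new[idx] = y_hat[idx] + rng.choice(resid[idx], size=idx.size)  # step 2
    return y_new

# Step 3: repeat over many replicates, refit the HMM to each y_new, and take
# 1.96 * sd or the empirical 2.5%/97.5% quantiles as the 95% interval.
```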

3.3.4 Determine the Number of Components

A challenging issue in the hidden Markov model is choosing the proper number of components. Several criteria and procedures have been proposed; in this section we present three general approaches: the likelihood ratio test, criteria-based model selection, and cross-validation.

The likelihood ratio test (Neyman and Pearson 1933) is well known as an efficient approach to model selection. Under certain regularity conditions, the log-likelihood ratio under the null hypothesis can be tested through the χ² test. However, it has been shown that two mixture distributions with different numbers of components cannot satisfy those regularity conditions (Wolfe 1971). Wolfe claimed that a modified version of the likelihood ratio test could possibly be applied:

H0: n = c0  vs.  H1: n = c1 (c1 > c0)

−(2/N)(N − 1 − d − c1/2)(log L(c0) − log L(c1)) ~ χ²(2d)

where N is the sample size, n is the number of components, and d = c1 − c0. The assumption that c1 > c0 is based on the statistical version of Occam's razor: we always prefer a simple model that might work, and unless there is strong evidence that a more complicated model is significantly better, we stick to the simple model. Wolfe's approach only provides a rough approximation, but it was easy to implement in the years when computing resources were limited. McLachlan (1987) proposed that the bootstrap can be applied to obtain the approximate distribution of the log-likelihood ratio test statistic under the null hypothesis. The premise is to generate random samples from a mixture distribution with c0 components, calculate the log-likelihood ratio test statistics, and then establish the empirical distribution based on the observed test statistics. The generated empirical distribution can be used to calculate the p-value for the original data.

The criteria-based model selection method has also gained popularity for assessing the number of components in mixture models. The likelihood function, interpreted as a measure of goodness of model fit, cannot be used as a criterion to select the number of components in a mixture model due to its tendency to choose more complicated models (Biernacki et al. 2000). A number of criteria have been discussed to handle this issue. Usually these criteria add a penalty term to the likelihood to represent the trade-off between model complexity and utility, for example the AIC criterion (Akaike 1974). Hurvich and Tsai (1989) suggested the use of AICc (corrected AIC) instead of AIC, since AIC tends to overfit the data. The AICc is defined as

AICc = −2 log L + 2k (1 + (k + 1)/(N − k − 1))

where k is the number of parameters. Another popular criterion is BIC (Schwarz 1978):

BIC = −2 log L + k log(N)

BIC generally puts a heavier penalty on model complexity than AIC, and it has been shown to be equivalent to the Minimum Description Length (MDL) criterion (Rissanen 1978). Another useful criterion is Minimum Message Length (MML). MML (Wallace and Boulton 1968) was derived from the perspective of information theory: the process of modeling is considered as encoding the data, and the model parameters can be considered the extra cost of the encoding. Therefore, the length of an encoded message can be described as

Length(θ, Y) = Length(θ) + Length(Y | θ)

If Length(θ) is short, the model is simple, but correspondingly Length(Y | θ) will be long.
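The two penalized criteria above translate directly into code. A small sketch, with the log-likelihood, parameter count, and sample size supplied by whatever fitting routine is used:

```python
import numpy as np

def aicc(loglik, k, n):
    """Corrected AIC: -2 logL + 2k (1 + (k + 1) / (n - k - 1))."""
    return -2 * loglik + 2 * k * (1 + (k + 1) / (n - k - 1))

def bic(loglik, k, n):
    """BIC: -2 logL + k log(n); heavier complexity penalty than AIC."""
    return -2 * loglik + k * np.log(n)
```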

The cross-validation approach was proposed by Celeux and Durand (2008) and is based on half-sampling. They showed that if we pick the odd-numbered or the even-numbered observations of a data set generated by a hidden Markov model, the result is still a hidden Markov chain. Therefore, we may simply use the odd subset of the original sample to fit the model and calculate the likelihood of the even subset. That likelihood can be used as the model selection criterion.

3.3.5 Goodness of Fit

Assessing the goodness of fit of a given hidden Markov model is an important topic. As in regular regression models, the residuals can be used. However, due to the inherent heterogeneity of the hidden Markov model, the residuals must be adjusted by class, as in the previous section. Wang et al. (2005) showed that the class-adjusted residuals are asymptotically equivalent to the distributions of residuals from the latent classes. Zucchini and MacDonald (2009) proposed a different approach using the pseudo-residual, defined as the probability of seeing a less extreme response than the one observed, given all observations except that at time t:

ut = P(Yt ≤ yt | yi, i ≠ t)

For well-fitted models, the pseudo-residuals should be approximately Uniform[0, 1] distributed. MacKay Altman (2004) provides an intuitive graphical approach similar to the Q-Q plot: by plotting the estimated distribution against the empirical distribution, a lack of fit can be detected with high probability for a large sample size. The estimated distribution is given by

F(y | θ) = Σ_{i=1}^K πi Fi(y | θi)

If the model is correctly specified, the plot of the empirical against the estimated distribution should be close to a straight line.
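A sketch of this graphical check for a Gaussian-component fit: the estimated mixture CDF is evaluated at the sorted observations and compared with the empirical CDF. The weights, means, and sds arguments are whatever a fitted model produced; none of them come from this report.

```python
import numpy as np
from scipy.stats import norm

def gof_points(y, weights, means, sds):
    """Empirical vs. estimated CDF points; near the 45-degree line if well fitted."""
    y_sorted = np.sort(y)
    f_hat = sum(p * norm.cdf(y_sorted, m, s)        # F(y) = sum_i pi_i F_i(y)
                for p, m, s in zip(weights, means, sds))
    f_emp = np.arange(1, y_sorted.size + 1) / y_sorted.size
    return f_emp, f_hat
```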

44 37 If the model contains regression in the mean parameter, { N(µ1, σ f(y t s t, x t ) = 1), 2 if s t = 1 N(θ 0 + θ 1 x t, σ2), 2 if s t = 2 we have: E(y t s t, x t ) = { µ1, if s t = 1 θ 0 + θ 1 x t, if s t = 2 Given the states, the expected value of y t can be used as the predicted value of travel time. However, the state is unobservable so our prediction is actually the expected future travel time, E(y t y 1,...y t 1, x 1,..., x t 1 ). The key of this problem is to predict s t using historical data. Assume the initial distribution for the Markov chain s t is: ( ) ( ) P (s0 = 1) p0 A = = P (s 0 = 2) 1 p 0 The transition matrix is: Then the distribution of s 1 is: The distribution of s 2 is: and so on. ( ) P11 P T = 12 P 21 P 22 A T A T 2 The marginal distribution of s t at any time t can be estimated through Markov property. The transition matrix is estimated by the data. The initial distribution could be either set manually or estimated from the last observed travel time in the previous time period. If the transition matrix is modeled through regression: It is straightforward to show that: log( P 12 P 11 ) = β 0,1 + β 1,1 x log( P 22 P 21 ) = β 0,2 + β 1,2 x P 11 = exp(β 0,1 + β 1,1 x) exp(β 0,1 + β 1,1 x) P 12 = exp(β 0,1 + β 1,1 x) + 1 P 21 = exp(β 0,2 + β 1,2 x) exp(β 0,2 + β 1,2 x) P 22 = exp(β 0,2 + β 1,2 x) + 1

Consider a simple example. Suppose that during the time interval [7:00-7:59] the traffic volume is 8, which yields the transition matrix T(8) through the regression above. Since the distribution of the first vehicle is unknown, we might use the non-informative prior

A = ( 0.5, 0.5 )ᵀ.

To predict the state of the first vehicle in the next time interval, i.e., s8, we compute AᵀT(8)⁷. In this example, y8 has a 99.4% probability of being in the free-flow state, and µ1 can be used as the predicted value. Figure 3.7 gives an overview of the prediction procedure in the hidden Markov model.

Figure 3.7: Hidden Markov Model: Estimation
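The worked example translates into a few lines of code. The regression coefficients below are placeholders (the report's estimates did not survive extraction), so the printed probabilities will differ from the 99.4% quoted above.

```python
import numpy as np

def transition(x, b01=-2.0, b11=0.2, b02=1.0, b12=0.05):   # placeholder betas
    e1, e2 = np.exp(b01 + b11 * x), np.exp(b02 + b12 * x)
    return np.array([[1 / (1 + e1), e1 / (1 + e1)],
                     [1 / (1 + e2), e2 / (1 + e2)]])

A = np.array([0.5, 0.5])                            # non-informative initial state
s8 = A @ np.linalg.matrix_power(transition(8), 7)   # distribution of s_8
print(s8)                                           # [P(free-flow), P(congested)]
```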

3.4 Simulation Study

3.4.1 No Covariate

First, consider a simple case in which no covariate is present in the model. We simulated 1,000 data sets, each with 5,000 observations, according to a transition matrix whose values are based on estimates from real data. Each data set was fitted with both the traditional mixture model and the hidden Markov model. As can be seen from Table 3.1, the confidence interval estimates from the HMM are slightly narrower.

Table 3.1: HMM vs. Traditional: No Covariate 1

Name   Traditional 95% C.I.   HMM 95% C.I.
µ1     (578.4, 581.2)         (578.4, 581.1)
µ2     (989.8, )              (1001.2, 1062.5)
σ1     (40.0, 42.1)           (40.1, 42.1)
σ2     (348.4, 389.0)         (350.3, 388.9)

Figure 3.8: HMM vs. Traditional 1

Figure 3.8 indicates that the log-likelihoods of the HMM models are larger than those of the traditional models. The mean difference in log-likelihood is around 951, which is substantial. We also tried another set of parameters, and Table 3.2 likewise implies that the HMM generates slightly narrower confidence intervals.

Table 3.2: HMM vs. Traditional: No Covariate 2

Name   Traditional 95% C.I.   HMM 95% C.I.
µ1     (578.7, 581.0)         (578.8, 581.2)
µ2     (708.1, 793.4)         (710.1, 788.0)
σ1     (40.1, 42.1)           ( )
σ2     (341.3, 396.3)         (340.6, 394.0)

3.4.2 With Covariate

When a covariate is included in the model, the simulation also indicates that the HMM is superior to the traditional mixture model. We simulated 500 data sets, each with 5,000 observations, according to the following parameter setting (values are based on estimates from real data):

log(yt) ~ N(log(500), σ1 = 0.07) if st = 1; N(log(1000), σ2 = 0.31) if st = 2

log(P12/P11) = β0,1 + β1,1 x,  log(P22/P21) = β0,2 + β1,2 x,

with the true coefficient values listed in Table 3.3. For computational reasons, we use the log transform of the original data, and the log-likelihood values change accordingly. Figure 3.9 indicates that the log-likelihoods of the HMM models are larger than those of the traditional models. The mean difference in log-likelihood is around 997, which is substantial.

Figure 3.9: HMM vs. Traditional 2

Figures 3.10 and 3.11 clearly demonstrate the advantage of the HMM: both its mean estimates and its variance estimates are superior.

Figure 3.10: HMM vs. Traditional 3

Figure 3.11: HMM vs. Traditional 4

Figure 3.12 illustrates the 95% confidence intervals for several parameter estimates from the hidden Markov model. The estimates across samples are fairly symmetric and centered at the true values.

Figure 3.12: 95% C.I. of HMM

Table 3.3 reports the numerical values plotted in Figure 3.12.

Table 3.3: Parameter Estimation of HMM

Name   True   95% C.I.
µ1     500    (499, 501)
µ2     1000   (977.7, 1026.7)
σ1     0.07   (0.070, 0.071)
σ2     0.31   (0.28, 0.32)
β0,1   -6     (-7.04, -5.07)
β1,1   0.1    (0.03, 0.16)
β0,2   0.6    (-0.56, 1.52)
β1,2   -      (0.09, 0.23)

3.5 Application to Field-Collected Data

The proposed model was applied to data collected from I-35 near San Antonio, Texas. The actual travel time of each vehicle was measured when vehicles equipped with radio-frequency tags passed automatic vehicle identification (AVI) stations. The AVI approach collects accurate travel times, but only for vehicles carrying the radio-frequency equipment, so the collected data represent the actual traffic flow scaled down by a sampling factor. Figure 3.13 illustrates the sampling scheme; the observations can be considered proportional to the actual traffic flow. To predict travel time with higher precision, the sampling rate could be scaled up (Figure 3.14), but the basic modeling steps are identical.

Figure 3.13: Illustration of Low Sampling Rate

Figure 3.14: Illustration of Potential Improvement

The number of hidden states for the field data can be determined through a likelihood ratio test. We first evaluate whether two states are sufficient to describe the hidden structure in the data; three or more states are considered only if the two-state model does not provide a sufficient fit:

H_0: n = 2 \quad \text{versus} \quad H_1: n = 3

Since the log likelihood ratio does not follow a χ^2 distribution in this setting, a bootstrap sampling method was adopted; a sketch of one possible implementation is given below. Figure 3.15 shows the histogram of the log likelihood ratio from 500 bootstrap samples.
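As one way to implement this parametric bootstrap (not the authors' code), the sketch below uses the third-party hmmlearn package as an off-the-shelf Gaussian HMM fitter; the data here are synthetic stand-ins for the log travel times.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # third-party HMM fitter, for illustration

    def fitted_loglik(y, k):
        """Fit a k-state Gaussian HMM and return (model, maximized log-likelihood)."""
        m = GaussianHMM(n_components=k, n_iter=200, random_state=0).fit(y)
        return m, m.score(y)

    # Synthetic stand-in for the log travel times (the real data are AVI measurements).
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(np.log(500), 0.07, 4000),
                        rng.normal(np.log(1000), 0.31, 1000)]).reshape(-1, 1)

    m2, ll2 = fitted_loglik(y, 2)
    _, ll3 = fitted_loglik(y, 3)
    lrt_obs = 2 * (ll3 - ll2)        # observed log likelihood ratio statistic

    # Parametric bootstrap: simulate from the fitted two-state model (H0) and
    # recompute the statistic to approximate its null distribution.
    null_stats = []
    for b in range(500):             # 500 bootstrap samples, as in the report; slow
        yb, _ = m2.sample(len(y), random_state=b)
        _, l2b = fitted_loglik(yb, 2)
        _, l3b = fitted_loglik(yb, 3)
        null_stats.append(2 * (l3b - l2b))

    p_value = np.mean(np.array(null_stats) >= lrt_obs)
    print(p_value)   # a small p-value favors rejecting H0 in favor of three states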

Travel Time Reliability Modeling