BAYESIAN TRAVEL TIME RELIABILITY MODELS


BAYESIAN TRAVEL TIME RELIABILITY MODELS

Morgan State University · The Pennsylvania State University · University of Maryland · University of Virginia · Virginia Polytechnic Institute & State University · West Virginia University

The Pennsylvania State University
The Thomas D. Larson Pennsylvania Transportation Institute
Transportation Research Building
University Park, PA

Bayesian Travel Time Reliability Models

By Feng Guo, Dengfeng Zhang, and Hesham Rakha

Mid-Atlantic Universities Transportation Center Final Report

Virginia Tech Transportation Institute, Department of Statistics, Virginia Polytechnic Institute and State University

DISCLAIMER
The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated under the sponsorship of the U.S. Department of Transportation's University Transportation Centers Program, in the interest of information exchange. The U.S. Government assumes no liability for the contents or use thereof.

June 30, 2015

1. Report No.  2. Government Accession No.  3. Recipient's Catalog No.
4. Title and Subtitle: Bayesian Travel Time Reliability Models
5. Report Date: June 30, 2015
6. Performing Organization Code: Virginia Tech
7. Author(s): Feng Guo, Dengfeng Zhang, and Hesham Rakha
8. Performing Organization Report No.
9. Performing Organization Name and Address: Virginia Tech Transportation Institute, 3500 Transportation Research Plaza, Blacksburg, VA
10. Work Unit No. (TRAIS)
11. Contract or Grant No.
12. Sponsoring Agency Name and Address: US Department of Transportation, Research & Innovative Technology Admin, UTC Program, RDT, New Jersey Ave., SE, Washington, DC
13. Type of Report and Period: Final Report, 6/2012-6/2015
14. Sponsoring Agency Code
15. Supplementary Notes
16. Abstract: Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built upon and advanced multi-state models by proposing regressions on the proportions and distribution parameters of the underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumptions. Two alternative approaches were proposed and evaluated. The first is a Bayesian multi-state travel time regression model, which regresses key model parameters on traffic volume; the second is a hidden Markov regression, which not only links key model parameters to traffic volume but also incorporates the dependency structure among adjacent time windows. Both approaches provide advanced methodology for modeling travel time reliability under complex stochastic scenarios.
17. Key Words: Traffic simulation, traffic modeling, driver behavior, car following
19. Security Classif. (of this report)  20. Security Classif. (of this page)  21. No. of Pages: 47  22. Price

ABSTRACT

Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built upon and advanced multi-state models by proposing regressions on the proportions and distribution parameters of the underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumptions. Two alternative approaches were proposed and evaluated. The first is a Bayesian multi-state travel time regression model, which regresses key model parameters on traffic volume; the second is a hidden Markov regression, which not only links key model parameters to traffic volume but also incorporates the dependency structure among adjacent time windows. Both approaches provide advanced methodology for modeling travel time reliability under complex stochastic scenarios.

Contents

1 Introduction
2 Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model
  2.1 Introduction and model specification
  2.2 Model Fitting using Markov Chain Monte Carlo Algorithm
    2.2.1 Model 1
    2.2.2 Model 2
  2.3 Simulation Study
    2.3.1 Model comparison
    2.3.2 Simulation Evaluation for Model 2
    2.3.3 Robustness of Misspecified θs
  2.4 Model Application to Field-collected Data
  2.5 Summary
3 Travel Time Reliability: Hidden Markov Model
  3.1 Introduction
  3.2 Autocorrelation
  3.3 Theoretical Background
    3.3.1 Model Specification
    3.3.2 Model Estimation
    3.3.3 Bootstrap and Confidence Interval
    3.3.4 Determine the Number of Components
    3.3.5 Goodness of Fit
    3.3.6 Prediction
  3.4 Simulation Study
    3.4.1 No Covariate
    3.4.2 With Covariate
  3.5 Application for Field-Collected Data
  3.6 Summary
4 Summary

List of Figures

2.1 Illustration of Data Collection
2.2 Average Traffic Volume by Hour of a Day
2.3 Probability of Congested State versus Traffic Volume in Simulation Studies
2.4 Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings
2.5 Model 2: Coverage Probabilities Comparison
2.6 Misspecified and True Model Comparison
2.7 Theoretical, Misspecified and True Model Comparison
2.8 Parameters Estimates under Different θs
2.9 Probability in Congested State and Traffic Volume: Real Data
3.1 Autocorrelation Comparison
3.2 Box-Cox Transformation
3.3 Hidden Markov Model: An Illustration
3.4 Illustration of Two States Markov Chain
3.5 Hidden Markov Model: Flow Chart
3.6 Confidence Interval by Profile Likelihood
3.7 Hidden Markov Model: Estimation
3.8 HMM vs. Traditional 1
3.9 HMM vs. Traditional 2
3.10 HMM vs. Traditional 3
3.11 HMM vs. Traditional 4
3.12 95% C.I. of HMM
3.13 Illustration of Low Sampling Rate
3.14 Illustration of Potential Improvement
3.15 Histogram of the Log Likelihood Ratio
3.16 χ² and Empirical Distributions
3.17 Illustration of Three States Markov Chain
3.18 Residual Check

List of Tables

2.1 Variance of Priors
2.2 Models 1 and 2: Average of Posterior Means Comparison
2.3 Models 1 and 2: Coverage Probabilities Comparison
2.4 Model 2 between Settings 2 and 3: Coverage Probabilities Comparison
2.5 More Results of Model 2: Coverage Probabilities
2.6 Misspecified Models: Average of Posterior Means Comparison
2.7 Misspecified Models: Coverage Probabilities Comparison
2.8 Results from Real Data with Different θs
3.1 HMM vs. Traditional: No Covariate 1
3.2 HMM vs. Traditional: No Covariate 2
3.3 Parameter Estimation of HMM
3.4 Kolmogorov-Smirnov Test Result
3.5 Parameter Estimation for Real Data

Chapter 1

Introduction

The objective of this study is to develop Bayesian multi-state travel time reliability models for evaluating travel time uncertainty under various traffic conditions. The reliability of travel time is a key performance index of a transportation system and has been a major transportation research area. Reliability is one of the four key focus areas of the second Strategic Highway Research Program (SHRP 2). The Federal Highway Administration (FHWA) defines travel time reliability as "consistency or dependability in travel times, as measured from day-to-day or across different times of day." Understanding the nature of travel time reliability helps individual travelers with trip planning and decision making, and helps transportation management agencies improve the efficiency of the transportation system.

Travel time is affected by multiple factors such as traffic condition, weather, and incidents. Many of these factors are random in nature, and stochastic models should be used to quantify the uncertainty associated with travel time. Traditionally, single-mode distributions have been adopted for travel time reliability modeling, with the log-normal distribution being the most popular (Emam and Al-Deek 2006, Tu et al. 2008). A number of candidate distributions have been discussed and compared: lognormal, gamma, Weibull, and exponential. However, these approaches cannot accommodate the high level of heterogeneity commonly present in travel time data, so single-mode distributions usually yield poor model fits under complex travel conditions, especially during peak hours of a day (Guo et al. 2010).

Compared to single-mode distributions, mixture distributions can accommodate data with multiple modes and are flexible in modeling data generated from complex systems (Fowlkes 1979). The multi-state travel time reliability model has been demonstrated to provide superior data fitting, scientifically sound interpretation, and a close relationship with the underlying traffic flow characteristics (Guo et al. 2012, Park et al. 2010). The advantages of mixture normal and mixture lognormal distributions have been demonstrated in applications to field-collected data (Guo et al. 2012). This study is based on the multi-state travel time model

framework. One of the most attractive features of the multi-state model is its capability to associate the travel time distribution with underlying traffic conditions. Park et al. (2010) showed that travel time states are related to the fundamental diagram, i.e., traffic flow, speed, and density. Two levels of uncertainty can be quantitatively assessed: the probability of a given traffic condition, for example, congested or free-flow, and the variation of travel time within each traffic condition. Besides the free-flow and congested states, the model can also accommodate delay caused by traffic incidents (Park et al. 2011). However, one of the most important factors affecting travel time, the traffic volume, has yet to be incorporated into the multi-state model. This study advanced the previous methods using two alternative approaches to incorporate the influence of traffic volume: a Bayesian multi-state travel time regression model and a hidden Markov model.

The traffic volume, defined as the number of vehicles traveling through a specific segment of the road within a specific time period, plays an essential role in the present research. The study extended the multi-state travel time model by incorporating the effects of traffic volume. The proposed models were applied to field data collected along a section of the I-35 freeway in San Antonio, Texas (hereinafter the I-35 data). The study covers a sixteen-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged with a radio frequency device passed the identification stations on New Braunfels Ave. (Station no. 42) and O'Connor Rd. (Station no. 49). We set the time period for summarizing the traffic volume to one hour, and collected the traffic volume from [0:00, 0:59] to [23:00, 23:59] for more than 20 weekdays.

We proposed several Bayesian multi-state regression models to incorporate traffic volume into the estimation of the probability of encountering the congested traffic state. The models were fitted using Markov Chain Monte Carlo (MCMC) algorithms, which enable us to obtain the posterior distribution of model parameters as well as the uncertainty of estimation (Lenk and DeSarbo 2000). We adopted the probit link function, which is more convenient in the Bayesian context than the logit function because the corresponding Gibbs sampler is easier to implement (Geweke and Keane 1997).

The Bayesian multi-state regression models discussed above are based on the assumption that all of the observations are conditionally independent. The independence assumption is typically not satisfied for travel time in periods close to each other. We proposed a hidden Markov model to incorporate the dependency structure among travel time data collected in adjacent time units (Baum and Petrie 1966). The hidden Markov model can be seen as a mixture model that relaxes the independence assumption (Qi et al. 2007). It is able to incorporate the dependency structure of observations, and it includes the traditional mixture model as a special case (Scott 2002). Hidden Markov models have been applied in a wide variety of applications, including speech

recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), and computational biology (Krogh et al. 1994). We developed hidden Markov models for travel time reliability evaluation. The proposed model incorporates the impact of traffic volume in the transition matrix of the Markov process. The results show that the hidden Markov model outperforms traditional mixture models.

Chapter 2

Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model

2.1 Introduction and model specification

The travel time of vehicles contains substantial variability. The Federal Highway Administration has formally defined travel time reliability as "consistency or dependability in travel times, as measured from day-to-day or across different times of day." Understanding the nature of travel time reliability helps individual travelers with trip planning and decision making, and helps transportation management agencies improve the efficiency of the transportation system.

The multi-state model has been developed for modeling travel time reliability, and one of its most attractive features is its capability to associate travel time with underlying traffic conditions. In the Gaussian mixture model, the travel time variable Y is assumed to follow a two-component mixture distribution with density function

f(y | λ, µ1, µ2, σ1², σ2²) = λ f_N(y | µ1, σ1²) + (1 − λ) f_N(y | µ2, σ2²),

where f_N represents the density function of a normal distribution with mean µi and variance σi². Without loss of generality, we assume that µ1 < µ2. Under this condition, µ1 and µ2 indicate the mean travel time under the free-flow state and the congested state. Subsequently, λ and 1 − λ are the probabilities of the free-flow state and the congested state, and σ1² and σ2² represent the variances of travel time under the free-flow state and the congested state. The probability of the free-flow state, denoted by λ, has support in (0, 1).

To link λ with traffic volume x, a common approach is to use the logit link function,

log(λ / (1 − λ)) = β0 + β1 x,

or, more generally,

log(λ / (1 − λ)) = Xβ,

where the covariate matrix X contains 1's as its first column and β is a vector of regression coefficients. The traffic volume is defined as the number of vehicles traveling through a specific segment of road within a given time period. An alternative to the logit link function is the probit link function, the inverse of the standard normal cumulative distribution function:

Φ⁻¹(1 − λ) = Xβ

For Bayesian models, the probit function is preferred due to its ease in Markov Chain Monte Carlo simulation for generating the posterior distribution. In the probit model, a latent variable wi ∈ R is introduced for each observation to indicate the group to which that observation belongs:

yi belongs to Group 1 if wi < 0, and to Group 2 otherwise.

Assume the latent variable wi ~ N(Xiβ, 1), where Xi is the i-th row of the matrix X. It can be shown that

λ = 1 − Φ(Xβ) = P(w < 0 | µ = Xβ, σ² = 1)

This setting establishes the relationship between the proportions of the two latent groups and the covariate(s). The likelihood function is correspondingly

f_N(y | µ1, σ1²)^I(w<0) f_N(y | µ2, σ2²)^I(w≥0)

As shown by Guo et al. (2012), the variability in the mean travel time of the congested state, µ2, can be substantial. From an engineering perspective, there exist certain relationships between µ2 and the traffic volume xi. Two alternative models were proposed to relate µ2 to traffic volume:

(1) µ2i = θ0 + θ1 xi = Xiθ
(2) µ2i = θs µ1 + θ xi

The first model assumes that µ1 and µ2 are estimated independently. The second model assumes that the intercept is proportional to µ1 with a predetermined scale parameter θs. With proper selection of θs, the second model can ensure that the estimated mean travel times for the free-flow and congested conditions are sufficiently separated. Following the convention of the Bayesian approach, we use the precision parameter ψj to denote the inverse of the variance of each component (i.e., ψj = 1/σj², j = 1, 2). Two levels of uncertainty are quantitatively assessed in the proposed model. The first level of uncertainty is the probability of a given traffic condition, for example, congested or free-flow; the second level of uncertainty is the variation of travel time within each traffic condition.
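To make the probit link concrete, the following minimal sketch evaluates the probability of the congested state, 1 − λ = Φ(Xβ), over a range of volumes. The coefficients β0 = −3 and β1 = 0.004 are illustrative values chosen here, not estimates from this report.

```python
from scipy.stats import norm

# Hypothetical probit coefficients (for illustration only).
beta0, beta1 = -3.0, 0.004

for volume in (250, 500, 750, 1000, 1250):
    p_congested = norm.cdf(beta0 + beta1 * volume)  # 1 - lambda = Phi(X beta)
    print(volume, round(p_congested, 3))
```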

To complete the Bayesian model setup, the following non-informative priors are adopted according to Yang and Berger (1996):

π(µ1) ∝ 1, π(β0) ∝ 1, π(β1) ∝ 1, π(θ0) ∝ 1, π(θ1) ∝ 1, π(ψ1) ∝ 1/ψ1, π(ψ2) ∝ 1/ψ2

It is desirable that a Bayesian model not be sensitive to the choice of prior distributions. Several alternative priors, such as normal distributions with different variances σβ and σθ, were tested (Table 2.1). As can be seen, the results are not significantly influenced and are quite similar to those from the non-informative priors, most likely due to the large sample size. Therefore, non-informative priors are used in the model.

Table 2.1: Variance of Priors

2.2 Model Fitting using Markov Chain Monte Carlo Algorithm

The conclusions of the study are based on the posterior distribution of the parameters, as shown below:

f(µ1, ψ, β, θ, w | X, y) ∝ f(y | µ1, ψ, β, θ, w, X) f(µ1, ψ, β, θ, w | X)
                        ∝ f(y | µ1, ψ, w, θ) f(w | X, β) f(µ1, ψ, β, θ | X)
                        ∝ f(y | µ1, ψ, w, θ) f(w | X, β) π(µ1) π(ψ1) π(ψ2) π(β) π(θ),

where f(y | µ1, ψ, w, θ) is the density function of the multi-state normal distribution,

f_N(y | µ1, 1/ψ1)^I(w<0) f_N(y | Xθ, 1/ψ2)^I(w≥0),

and f(w | X, β) is the multivariate normal with mean Xβ and covariance matrix I. Since there is no closed-form solution for the above posterior distribution, a simulation-based Markov Chain Monte Carlo algorithm is used to estimate it. The MCMC algorithm samples from the full conditional distribution of each parameter. The conditional distributions are developed in the following subsections.

2.2.1 Model 1

The full conditional distribution for each parameter in Model 1 is shown below.

1. The full conditional for w:

f(w | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | Xiθ, 1/ψ2) I(wi ≥ 0)] f_N(wi | Xiβ, 1)

This is a multi-state truncated normal. Define a = f_N(yi | µ1, 1/ψ1) and b = f_N(yi | Xiθ, 1/ψ2); then with probability a/(a+b), wi is sampled from f_N(wi | Xiβ, 1) truncated at wi < 0, and with probability b/(a+b), wi is sampled from f_N(wi | Xiβ, 1) truncated at wi ≥ 0.

2. The full conditional for µ1:

f(µ1 | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | Xiθ, 1/ψ2) I(wi ≥ 0)]
           ∝ ∏_{i:wi<0} f_N(yi | µ1, 1/ψ1)
           → N( (Σ_{i:wi<0} yi) / n1, 1/(n1 ψ1) )

This is a univariate normal distribution, where n1 is the number of wi's that are smaller than 0. Corresponding to the model assumption µ1 < µ2, we right-truncate this distribution at min(Xiθ).

3. The full conditional for ψ1:

f(ψ1 | ...) ∝ ψ1⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0)
           ∝ ψ1^(n1/2 − 1) exp( −(1/2) ψ1 Σ_{i:wi<0} (yi − µ1)² )

This is the Gamma distribution with shape parameter n1/2 and rate parameter (1/2) Σ_{i:wi<0} (yi − µ1)².

4. The full conditional for ψ2:

f(ψ2 | ...) ∝ ψ2⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0)
           ∝ ψ2^(n2/2 − 1) exp( −(1/2) ψ2 Σ_{i:wi≥0} (yi − Xiθ)² )

where n2 is the number of wi's that are greater than or equal to 0. This is the Gamma distribution with shape parameter n2/2 and rate parameter (1/2) Σ_{i:wi≥0} (yi − Xiθ)².

5. The full conditional for β:

f(β | ...) ∝ ∏_{i=1}^n f(wi | Xi, β) ∝ ∏_{i=1}^n exp( −(wi − Xiβ)² / 2 )

This is the bivariate normal distribution with mean (XᵀX)⁻¹Xᵀw and covariance matrix (XᵀX)⁻¹.

6. The full conditional for θ:

f(θ | ...) ∝ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | Xiθ, 1/ψ2)^I(wi≥0) ∝ ∏_{i:wi≥0} f_N(yi | Xiθ, 1/ψ2)

Define Σ+ as the n2 × n2 diagonal matrix whose diagonal elements are 1/ψ2, X+ as the submatrix of X consisting of the rows i with wi ≥ 0, and y+ as the subvector of y with elements i such that wi ≥ 0. Then

f(θ | ...) ∝ |Σ+|^(−1/2) exp( −(1/2) (y+ − X+θ)ᵀ Σ+⁻¹ (y+ − X+θ) )
          → N( (X+ᵀ Σ+⁻¹ X+)⁻¹ X+ᵀ Σ+⁻¹ y+, (X+ᵀ Σ+⁻¹ X+)⁻¹ )

This is the bivariate normal with mean (X+ᵀΣ+⁻¹X+)⁻¹X+ᵀΣ+⁻¹y+ and covariance matrix (X+ᵀΣ+⁻¹X+)⁻¹.

2.2.2 Model 2

Compared to Model 1, this model has one fewer parameter, and the full conditional distributions change accordingly.

1. The full conditional for w:

f(w | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)] f_N(wi | Xiβ, 1)

2. The full conditional for µ1:

f(µ1 | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)]
           → N( (ψ1 Σ_{i:wi<0} yi + θs ψ2 Σ_{i:wi≥0} (yi − θxi)) / (n1ψ1 + θs²n2ψ2), 1/(n1ψ1 + θs²n2ψ2) )

This is still a univariate normal distribution, but the parameters differ from those in Model 1.

3. The full conditional for ψ1:

f(ψ1 | ...) ∝ ψ1⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | θsµ1 + θxi, 1/ψ2)^I(wi≥0)
           ∝ ψ1^(n1/2 − 1) exp( −(1/2) ψ1 Σ_{i:wi<0} (yi − µ1)² )

4. The full conditional for ψ2:

f(ψ2 | ...) ∝ ψ2⁻¹ ∏_{i=1}^n f_N(yi | µ1, 1/ψ1)^I(wi<0) f_N(yi | θsµ1 + θxi, 1/ψ2)^I(wi≥0)
           ∝ ψ2^(n2/2 − 1) exp( −(1/2) ψ2 Σ_{i:wi≥0} (yi − θsµ1 − θxi)² )

5. The full conditional for β:

f(β | ...) ∝ ∏_{i=1}^n f(wi | Xi, β) ∝ ∏_{i=1}^n exp( −(wi − Xiβ)² / 2 )

This is the bivariate normal distribution with mean (XᵀX)⁻¹Xᵀw and covariance matrix (XᵀX)⁻¹.

6. The full conditional for θ:

f(θ | ...) ∝ ∏_{i=1}^n [f_N(yi | µ1, 1/ψ1) I(wi < 0) + f_N(yi | θsµ1 + θxi, 1/ψ2) I(wi ≥ 0)] ∝ ∏_{i:wi≥0} f_N(yi | θsµ1 + θxi, 1/ψ2)
          → N( Σ_{i:wi≥0} (yi − θsµ1) xi / Σ_{i:wi≥0} xi², 1/(ψ2 Σ_{i:wi≥0} xi²) )
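Putting the full conditionals together, the following is a minimal Gibbs sampler sketch for Model 2. It is not the authors' code: the starting values are crude guesses, and the update for w weights each truncation region by its prior mass under N(Xiβ, 1).

```python
import numpy as np
from scipy.stats import norm, truncnorm

def gibbs_model2(y, x, theta_s=1.2, n_iter=5000, seed=0):
    """Gibbs sampler for Model 2, following the full conditionals above.
    y: travel times; x: traffic volumes; theta_s is predetermined."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])            # probit design matrix
    mu1, theta, beta = y.min(), 0.5, np.zeros(2)    # crude starting values
    psi1 = psi2 = 1.0 / y.var()
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = np.empty((n_iter, 6))
    for it in range(n_iter):
        # latent w: normals truncated to (-inf, 0) and [0, inf), with branch
        # probabilities weighted by the prior mass of each region
        m = X @ beta
        a = norm.pdf(y, mu1, psi1 ** -0.5) * norm.cdf(-m)
        b = norm.pdf(y, theta_s * mu1 + theta * x, psi2 ** -0.5) * norm.sf(-m)
        free = rng.random(n) < a / (a + b)
        w = np.where(free,
                     truncnorm.rvs(-np.inf, -m, loc=m, scale=1, random_state=rng),
                     truncnorm.rvs(-m, np.inf, loc=m, scale=1, random_state=rng))
        s1, s2 = w < 0, w >= 0
        n1, n2 = s1.sum(), s2.sum()
        # mu1: normal, with precision collecting both components via theta_s
        prec = n1 * psi1 + theta_s ** 2 * n2 * psi2
        mean = (psi1 * y[s1].sum() + theta_s * psi2 * (y[s2] - theta * x[s2]).sum()) / prec
        mu1 = rng.normal(mean, prec ** -0.5)
        # precisions: Gamma full conditionals (numpy uses scale = 1/rate)
        psi1 = rng.gamma(n1 / 2, 2.0 / ((y[s1] - mu1) ** 2).sum())
        psi2 = rng.gamma(n2 / 2, 2.0 / ((y[s2] - theta_s * mu1 - theta * x[s2]) ** 2).sum())
        # beta: bivariate normal from regressing w on X
        beta = rng.multivariate_normal(XtX_inv @ X.T @ w, XtX_inv)
        # theta: univariate normal on the congested observations
        sxx = (x[s2] ** 2).sum()
        theta = rng.normal(((y[s2] - theta_s * mu1) * x[s2]).sum() / sxx,
                           (psi2 * sxx) ** -0.5)
        draws[it] = mu1, psi1, psi2, beta[0], beta[1], theta
    return draws
```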

2.3 Simulation Study

We conducted a simulation study to examine the proposed models based on the data set collected on Interstate I-35 near San Antonio, Texas (Guo et al. 2010). The study corridor covered a 16-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged with a radio frequency device passed the automatic vehicle identification stations on New Braunfels Avenue (Station no. 42) and O'Connor Road (Station no. 49). Figure 2.1 illustrates the data collection site setup.

Figure 2.1: Illustration of Data Collection

The traffic volume is the number of vehicles traveling through the road segment during a specific time period. The analysis unit is set to one hour, and the hourly traffic volume from [0:00, 0:59] to [23:00, 23:59] is calculated. Although hourly traffic volume is adopted in this analysis, in theory any time unit can be used if there are sufficient data within the time unit. The data set contains 237 distinct hours of observations. We average the traffic volume by the hours of a day. Figure 2.2 illustrates the range of average traffic volume by hour.

Figure 2.2: Average Traffic Volume by Hour of a Day

In the original data, only vehicles equipped with electronic tags were counted. These vehicles account for a proportion of the total traffic. In order to estimate the real traffic volume, we simulated new data sets according to the shape of the original data and extended it by a

scale:

Yij: simulated traffic volume of hour i in day j,

Yij = [c µi + ɛij]+,  ɛij ~ N(0, d²),

Xij: original traffic volume of hour i in day j, i = 0, ..., 23, j = 1, ..., 10 or 11,

µi: average traffic volume of hour i, µi = (Σk Xik) / (number of days).

Based on historical data and engineering judgment, we selected d = 100 and c = 50. The traffic volume and the travel time data sets were generated according to the following procedure. For a given model and a set of predetermined parameters, the simulation study is conducted as follows:

1. Set n = the number of simulation runs.
2. For each run i = 1, ..., n:
   (a) generate a data set;
   (b) run the Markov Chain Monte Carlo algorithm until convergence;
   (c) record whether the 95% credible intervals cover the true values.

For each simulation run, we ran more than 5,000 MCMC iterations and ensured convergence of the MCMC chains. The inference statistics, such as posterior means, 95% credible intervals, and coverage probabilities, are calculated. The analysis focuses on the comparison between Model 1 and Model 2, model performance, and robustness under misspecified parameters.

2.3.1 Model comparison

The main difference between Model 1 and Model 2 is that Model 2 provides a mechanism to control the difference between the free-flow mean travel time µ1 and the baseline of the congested mean travel time µ2 via the parameter θs. The value of θs could be determined via engineering expertise and preliminary data analysis. It is tempting to set θs = 1, which corresponds to the scenario in which the intercept for the congested state equals the free-flow travel time. However, initial analyses indicate that a model identifiability issue arises when θs is too close to 1. The identifiability issue is caused by the fact that when θs is small, the mean travel times of the two component distributions, i.e., free flow and congested flow, are too close to each other. To provide sufficient separation between the two component distributions, we set the minimum θs value at θs = 1.2 in the simulation study. The values of µ1, ψ1, ψ2, β0 and β1 are set according to historical data.
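To make the procedure concrete, the sketch below runs a scaled-down version of one coverage experiment, reusing the gibbs_model2 sketch from Section 2.2. The true values follow the Section 2.3.2 settings where stated (µ1 = 500, ψ1 = 0.01, β0 = −3, β1 = 0.004, θs = 1.2, θ = 0.6); ψ2 and the hourly volume profile are placeholders.

```python
import numpy as np

TRUE = dict(mu1=500.0, psi1=0.01, psi2=1e-4,        # psi2 is a placeholder
            beta=np.array([-3.0, 0.004]), theta_s=1.2, theta=0.6)

def simulate_dataset(n_days=20, c=50, d=100, seed=None):
    rng = np.random.default_rng(seed)
    mu_hour = np.linspace(5, 25, 24)                 # stylized hourly profile
    # scaled-up volumes: Y_ij = [c*mu_i + eps_ij]_+, eps ~ N(0, d^2)
    x = np.maximum(c * np.tile(mu_hour, n_days) + rng.normal(0, d, 24 * n_days), 0)
    w = rng.normal(TRUE["beta"][0] + TRUE["beta"][1] * x, 1.0)   # probit latent
    y = np.where(w >= 0,
                 rng.normal(TRUE["theta_s"] * TRUE["mu1"] + TRUE["theta"] * x,
                            TRUE["psi2"] ** -0.5),
                 rng.normal(TRUE["mu1"], TRUE["psi1"] ** -0.5))
    return x, y

covered, n_rep = 0, 50                               # the report uses 1,000 runs
for rep in range(n_rep):
    x, y = simulate_dataset(seed=rep)
    draws = gibbs_model2(y, x, theta_s=TRUE["theta_s"])[2500:]   # drop burn-in
    lo, hi = np.percentile(draws[:, 0], [2.5, 97.5])  # 95% interval for mu1
    covered += lo <= TRUE["mu1"] <= hi
print("coverage of mu1:", covered / n_rep)
```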

Figure 2.3 shows the relationship between the probability of the congested state and the traffic volume.

Figure 2.3: Probability of Congested State versus Traffic Volume in Simulation Studies

Five different settings of θ0 (θs) and θ1 were evaluated. The results are summarized in Table 2.2. As can be seen, the point estimates of the parameters are generally very close to the true values. Model 2 seems to be slightly better than Model 1, but the difference is minimal. One key criterion for evaluating model performance is the coverage probability. Ideally, the posterior credible intervals should cover the true model parameter at the nominal significance level; i.e., for 1,000 simulations, 95% of the resulting 95% posterior credible intervals (950 out of 1,000) should cover the true model parameter. As can be seen in Table 2.3 and Figure 2.4, the coverage probability for parameter µ1 is close to 0.95 in most cases. However, the coverage probabilities for β0, β1 and θ1 can be quite far off, especially when the two component distributions are similar to each other. The overall performance of Model 2 is better than that of Model 1, especially when the two component distributions are similar to each other.

Table 2.2: Models 1 and 2: Average of Posterior Means Comparison (columns: µ1, ψ1, ψ2, β0, β1, θ0 (θs), θ1, for Models 1 and 2 under five settings)

Table 2.3: Models 1 and 2: Coverage Probabilities Comparison (columns: µ1, ψ1, ψ2, β0, β1, θ0 (θs), θ1, for Models 1 and 2 under five settings)

Figure 2.4: Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings

Table 2.4 shows an additional investigation of Model 2 under Settings 2 and 3, in which the value of θs is set to 1.3. Since the value of θ1 plays an important role in the coverage probability, we selected two more points between 0.3 and 0.6. In general, when θ1 increases, the coverage probabilities are closer to the target 95% rate.

Table 2.4: Model 2 between Settings 2 and 3: Coverage Probabilities Comparison (columns: value of θ1, µ1, ψ1, ψ2, β0, β1, θ)

2.3.2 Simulation Evaluation for Model 2

To evaluate the robustness of Model 2, we conducted a more complicated simulation study based on combinations of various parameters. The following parameters are fixed across all simulation setups: µ1 = 500, ψ1 = 0.01, β0 = −3, and ψ2.

For the parameters of interest, we tested multiple levels of each:

β1 ∈ {0.004, …, 0.005}
θ ∈ {0.3, 0.6}
θs ∈ {1.2, 1.3, 1.4, 1.5}

β1 represents the relationship between the proportion of the congested state and the traffic volume; θ indicates the relationship between travel time under the congested state and the traffic volume. There are 24 different settings in the simulation. The outputs are summarized in Table 2.5. The results show that the coverage probabilities are generally good when the two components are well separated. The model performance can be seen more clearly in Figure 2.5, in which the dashed line denotes the 95% nominal level for reference. As can be seen, the larger θ and θs are, the higher the coverage probabilities will be.

Table 2.5: More Results of Model 2: Coverage Probabilities (columns: ID, β1, θ, θs, and coverage of µ1, ψ1, ψ2, β0, β1, θ)

Figure 2.5: Model 2: Coverage Probabilities Comparison

2.3.3 Robustness of Misspecified θs

The parameter θs is one of the key parameters of Model 2 and is predetermined rather than estimated from data. The true value of θs is typically unknown; therefore, the robustness of the model with respect to misspecification is critical for applications. For example, will the results change substantially if the true value of θs is 1.2 but the model was fitted with θs = 1.3? We evaluated this issue with four different settings. The results of the simulation are shown in Table 2.6. As can be seen, the point estimates of the parameters are generally stable. When θs is misspecified, the model generally overestimates the mean of component 2. From Table 2.7, it can be concluded that when the two components are not well separated (i.e., θs and θ are small), the misspecified models can

sometimes have better coverage of the regression coefficients for the mixture proportion (ψ1, β0 and β1).

Table 2.6: Misspecified Models: Average of Posterior Means Comparison (columns: µ1, ψ1, ψ2, β0, β1, θs, θ1, µ2, for the true and misspecified models under four settings; µ2 is estimated by θsµ1 + θ1x)

Table 2.7: Misspecified Models: Coverage Probabilities Comparison (columns: µ1, ψ1, ψ2, β0, β1, θs, θ1, for the true and misspecified models under four settings)

Figure 2.6 shows the coverage probability comparison between the true and misspecified models in the four settings. The dashed line denotes 95% for reference. It can be

observed that the true models are superior in estimating θ1 and ψ2, while the misspecified models perform better for ψ1, β0 and β1.

Figure 2.6: Misspecified and True Model Comparison

Although misspecified models are generally not good at estimating θ1, it is the mean value of µ2,

µ2 = θsµ1 + θ1x,

that is of ultimate interest. In order to evaluate the influence of a misspecified θs on µ2, we evaluated the relationship between traffic volume and the corresponding µ2 under the theoretical result, the true model estimate, and the misspecified model estimate. Figure 2.7 shows that the misspecified model estimates are close to the theoretical results when the traffic volume is high, which directly links to the congested state. Therefore, the application of these models is still robust when θs is misspecified.

Figure 2.7: Theoretical, Misspecified and True Model Comparison

2.4 Model Application to Field-collected Data

We applied Model 2 to the field data collected on I-35 near San Antonio, Texas, as introduced earlier. Models with different values of θs were fitted. The results are shown in Table 2.8.

Table 2.8: Results from Real Data with Different θs (rows: µ1, ψ1, ψ2, β0, β1, θ, µ2; columns: θs = 1, 1.1, 1.2, 1.3, 1.4; µ2 is estimated by θsµ1 + θx)

Figure 2.8 shows the relationship between θs and other critical model parameters. Both the means and the standard deviations of the two components are quite stable with respect to changes in θs.

Figure 2.8: Parameters Estimates under Different θs

Figure 2.9 shows the relationship between the probability of the congested state and traffic volume under different settings of θs. As can be seen, the difference is pronounced near certain traffic volumes. It should be noted that the traffic volume here is a subset of the actual traffic volume and thus should be interpreted as an indicator of the traffic condition.

Figure 2.9: Probability in Congested State and Traffic Volume: Real Data

2.5 Summary

The multi-state model provides a flexible and efficient framework for modeling travel time reliability, especially under complex traffic conditions. Guo et al. (2012) illustrated that the multi-state model outperforms single-state models in congested or near-congested traffic

conditions and that the advantage is substantial in high traffic volume conditions. The objective of this study is to quantitatively evaluate the influence of traffic volume on the mixture of the two components. The study advances the multi-state models by proposing regressions on the proportions and distribution parameters for the underlying traffic states. The Bayesian analysis also provides valid credible intervals for each parameter without asymptotic assumptions. Previous studies usually modeled the travel time independently, without establishing the relationship between travel time and important transportation statistics such as traffic volume. The models developed here can also be easily extended to include more covariates in either linear or nonlinear forms.

The application results indicate that there is a negative relationship between the proportion of the free-flow state and the traffic volume, which confirms the statement by Guo et al. (2012) that for low traffic volume conditions there might exist only one travel time state, so that single-state models will be sufficient. The estimation for the congested state indicates that the travel time under this condition exhibits substantial variability and is positively related to traffic volume, which also verifies the phenomenon found by Guo et al. (2012).

There are several potential extensions to the current research. The current research only includes lognormal and normal distributions; a number of other distributions, e.g., Gamma and extreme value distributions, could also be investigated. One of the assumptions of the existing Bayesian mixture model is that all the observations are independent, which can be relaxed in the hidden Markov model.

Chapter 3

Travel Time Reliability: Hidden Markov Model

3.1 Introduction

The Bayesian mixture regression model discussed in the previous chapter is based on the assumption that the observations are independent. In most cases, the travel times of vehicles were collected chronologically; thus, the travel times in adjacent time periods are likely to be correlated because of the continuity of traffic flow. Although it is possible to apply autocorrelated error terms to handle this problem (Cochrane and Orcutt 1949), the interpretation in this scenario is unclear. In order to accommodate the dependency structure of the data, we adopt a more general methodology: the hidden Markov model (HMM).

The basic concept of the hidden Markov model was introduced by Baum and Petrie (1966). It can be shown that the traditional mixture model is a special case of the HMM (Scott 2002). Hidden Markov models are popular in a wide variety of applications (Couvreur 1996), including speech recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), computational biology (Krogh et al. 1994), fault detection (Smyth 1994), and many other areas.

3.2 Autocorrelation

As an exploratory analysis, we treat the I-35 data as a time series and calculate the autocorrelation among time windows separated by different lags. Autocorrelation is a measure of similarity between observations with certain time lags (Wiener 1930).

For a sequence {Xt}, the autocorrelation is defined as

ACF(t, s) = E[(Xt − µt)(Xs − µs)] / (σt σs).

Assuming that {Xt} is second-order stationary (Wold 1938), the autocorrelation can be written as

ACF(s) = E[(Xt − µ)(Xt+s − µ)] / σ².

For an independent sequence, it is easy to see that ACF(s) should be small regardless of the value of s. For a sequence such as {Xt : Xt = 0.5 Xt−1 + ɛt}, ACF(s) will be quite large when s = 1 and decrease gradually as s increases. Figure 3.1 shows two plots, one with independent data and one with the field-collected data. The x-axis is the time lag, while the y-axis is the ACF. The plot on the left shows the ACF of an independent sequence, while the plot on the right is estimated from the I-35 data. It is clear that the observed travel time is not an independent data set.

Figure 3.1: Autocorrelation Comparison

A formal test of autocorrelation is the Durbin-Watson test (Durbin and Watson 1950). The Durbin-Watson statistic is defined as

d = Σ_{t=2}^T (Xt − Xt−1)² / Σ_{t=1}^T Xt².

The value of d is between 0 and 4. If the Durbin-Watson statistic is substantially less than 2, there might be positive correlation; if it is substantially greater than 2, there might be negative correlation. Under the normal assumption, the null distribution of the Durbin-Watson statistic is a linear combination of chi-squared variables. To satisfy the normal assumption, we use the Box-Cox transformation (Choongrak 1959), x → (x^λ − 1)/λ. Figure 3.2 indicates that λ = 4 is the optimal choice.

Figure 3.2: Box-Cox Transformation

The Durbin-Watson statistic computed from the transformed travel time yields a p-value close to zero. This confirms the high positive autocorrelation among the travel time data.
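For readers who want to reproduce this exploratory step, a minimal sketch follows; it computes the sample ACF and the Durbin-Watson statistic exactly as defined above, applied to whatever series is passed in (the I-35 data themselves are not reproduced here).

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[: len(x) - s] * x[s:]) / denom
                     for s in range(max_lag + 1)])

def durbin_watson(x):
    """Durbin-Watson statistic; values near 2 suggest no autocorrelation."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.diff(x) ** 2) / np.sum(x ** 2)
```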

3.3 Theoretical Background

3.3.1 Model Specification

A hidden Markov model consists of two sequences: the observed sequence {xt}, t = 1, 2, ..., n, and the latent state sequence {st}, t = 1, 2, ..., n. Given st, the distribution of the observed data xt is fully determined by the value of st. For example, if we denote the travel time in seconds as {xt}, t = 1, 2, ..., n, then the sequence st could be defined as

st = 1 if the road is under free flow, and st = 2 if the road is under congestion.

Figure 3.3: Hidden Markov Model: An Illustration

The two values of st represent the two travel time states, which correspond to the two components of a mixture distribution. Given the state, the observed data xt follow one of the distributions:

f(xt | st) = f(x | Θ1) if st = 1; f(x | Θ2) if st = 2.

The form of the distribution f(x | Θ) could be normal, Gamma, Poisson, multinomial, or another distribution; for example, xt | st = 1 ~ N(1000, ·) and xt | st = 2 ~ N(500, 30²). The term "hidden" indicates that {st} is a latent sequence that cannot be observed. The term "Markov" indicates an important property of {st}:

P(st | st−1, ..., s1) = P(st | st−1), t ≥ 2.

Thus, {st} is a Markov chain and has a transition probability matrix. For a two-state sequence, the transition matrix is

P = ( P11  P12
      P21  P22 ),

where Pij = P(st+1 = j | st = i). It can be shown that if {st} is a trivial Markov chain, i.e., i.i.d., then the hidden Markov model is equivalent to a traditional mixture model. Figure 3.4 illustrates the two-state Markov chain for traffic states.

Figure 3.4: Illustration of Two States Markov Chain

The basic properties of a Markov chain are listed below:

Irreducible: it is possible to get to any state from any state.

Aperiodic: a state i has period k (k ≥ 2) if {n : p_ii^(n) > 0} = {kd : d ≥ 1}. If none of the states is periodic, the chain is aperiodic.

Positive recurrent: a state i is positive recurrent if the expected time for the chain to return to state i is finite.

If every state in an irreducible chain is positive recurrent, there exists a unique stationary distribution π that satisfies

πP = π.

If an irreducible chain is positive recurrent and aperiodic, it is said to have a limiting distribution φ:

lim_{n→∞} p_ij^(n) = φj, where p_ij^(n) = P(st+n = j | st = i).

A limiting distribution, when it exists, is always a stationary distribution, but the converse is not true. The stationary or limiting distribution can be used to describe the long-term behavior of a Markov chain. For example, suppose a hidden Markov model has the transition matrix

P = ( 0.8  0.2
      0.1  0.9 ).

By solving

π1 = 0.8 π1 + 0.1 π2,  π1 + π2 = 1,

the solution is π1 = 1/3, π2 = 2/3. Roughly speaking, in the long run 1/3 of the observations will be in state 1 while 2/3 will be in state 2. This relates the hidden Markov model to the traditional mixture model.

One of the primary interests of travel time uncertainty research is to evaluate the influence of traffic volume. In the HMM framework, we build regression models on the transition probabilities using traffic volume data. Hereafter we use y to denote the observed data and x the covariate. When the HMM has only two states, the transition matrix can be modeled in the style of logistic regression. The transition probability matrix is a 2 × 2 matrix; due to the constraints P11 + P12 = P21 + P22 = 1, it has two free parameters. Chung et al. (2007) discussed a similar model. For each row of the transition matrix, a logistic regression with one covariate can be used:

log(P12/P11) = β0,1 + β1,1 x
log(P22/P21) = β0,2 + β1,2 x

When the Markov chain has more than two states, a multinomial logistic regression model can be applied, with the first column typically chosen as the baseline. For example, the three-state model is:

log(P12/P11) = β0,1 + β1,1 x    log(P13/P11) = β0,2 + β1,2 x
log(P22/P21) = β0,3 + β1,3 x    log(P23/P21) = β0,4 + β1,4 x
log(P32/P31) = β0,5 + β1,5 x    log(P33/P31) = β0,6 + β1,6 x
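A small sketch of both pieces of machinery above: solving πP = π for the stationary distribution (which reproduces the 1/3, 2/3 example) and assembling a volume-dependent two-state transition matrix from the two row-wise logistic regressions. The coefficients passed to transition_matrix are placeholders, not estimates from this report.

```python
import numpy as np

def stationary(P):
    """Solve pi P = pi subject to sum(pi) = 1 (least squares)."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones(M)])
    b = np.append(np.zeros(M), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def transition_matrix(x, b01, b11, b02, b12):
    """Two-state transition matrix at volume x from the row-wise logits."""
    e1, e2 = np.exp(b01 + b11 * x), np.exp(b02 + b12 * x)
    return np.array([[1 / (1 + e1), e1 / (1 + e1)],
                     [1 / (1 + e2), e2 / (1 + e2)]])

print(stationary(np.array([[0.8, 0.2], [0.1, 0.9]])))  # -> [0.333..., 0.666...]
```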

Figure 3.5 illustrates the basic infrastructure of the HMM. Both the historical data (traffic volume and travel time) and the observed real-time traffic volume can be used for prediction.

Figure 3.5: Hidden Markov Model: Flow Chart

3.3.2 Model Estimation

There are several methods for estimating the parameters of an HMM. One of the most popular is the EM algorithm (Baum et al. 1970, Bilmes 1998). Alternatives include the Viterbi and gradient algorithms (Rabiner 1989). A Bayesian method (Jean-Luc and Chin-Hui 1991) has also been proposed, and there are existing software packages specifically for fitting HMMs in R (Visser and Speekenbrink 2010). Although a Bayesian approach to HMM analysis does show some advantages in complex models, Rydén (2008) argued that the results are generally similar and that the EM algorithm is sufficient for most practical problems.

Define:

Lk(t) = P(st = k | X)
Hk,l(t) = P(st = k, st+1 = l | X)

Lk(t) is the conditional probability of being in state k at time t given the entire observed sequence X; Hk,l(t) is the conditional probability of being in state k at time t and in state l at time t + 1, given the entire observed sequence X.

The initial probability of state k (k = 1, ..., M) can be estimated by

P(s1 = k) ∝ Σ_{t=1}^T Lk(t),  with Σ_{k=1}^M P(s1 = k) = 1.

The EM algorithm can be described as follows (Li and Gray 2000):

E step: compute Lk(t) and Hk,l(t) under the current parameter values.

M step:

µk = Σ_{t=1}^T Lk(t) xt / Σ_{t=1}^T Lk(t)

Σk (covariance) = Σ_{t=1}^T Lk(t)(xt − µk)(xt − µk)ᵀ / Σ_{t=1}^T Lk(t)

P(st+1 = l | st = k) (transition probability) = Σ_{t=1}^{T−1} Hk,l(t) / Σ_{t=1}^{T−1} Lk(t)

The forward-backward algorithm can be used in the estimation (E) step. Define

ak(x1, ..., xt) = P(x1, ..., xt, st = k)
bk(xt+1, ..., xT) = P(xt+1, ..., xT | st = k)

The forward recursion is

ak(x1) = P(s1 = k) fk(x1)
ak(x1, ..., xt) = fk(xt) Σ_{i=1}^M ai(x1, ..., xt−1) pik

where fk is the probability density of component k and pik is the transition probability. The backward recursion is

bk(xT+1, ..., xT) = 1 (by convention)
bk(xt+1, ..., xT) = Σ_{i=1}^M pki fi(xt+1) bi(xt+2, ..., xT)

Then Lk(t) and Hk,l(t) can be estimated as

Lk(t) = ak(x1, ..., xt) bk(xt+1, ..., xT) / Σ_{i=1}^M ai(x1, ..., xt) bi(xt+1, ..., xT)

Hk,l(t) = ak(x1, ..., xt) pkl fl(xt+1) bl(xt+2, ..., xT) / Σ_{i=1}^M Σ_{j=1}^M ai(x1, ..., xt) pij fj(xt+1) bj(xt+2, ..., xT)

The function Lk(t) can be used to estimate the state to which an observation belongs. However, estimation based on individual observations can cause unwanted issues. For example, it is possible that the result shows st = 1 and st+1 = 2 even though p12 = 0, which makes the entire estimated sequence meaningless. The Viterbi algorithm (Viterbi 1967) can be applied to obtain the sequence with the largest posterior probability.
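The sketch below implements the E step above with the standard scaling trick so the recursions do not underflow; it is a generic implementation, not the authors' code. f[t, k] is assumed to hold the component density fk(xt), evaluated beforehand.

```python
import numpy as np

def e_step(f, P, init):
    """Scaled forward-backward recursions for the E step.
    f[t, k]: component density f_k(x_t), precomputed; P: transition matrix;
    init: initial state probabilities. Returns L[t, k] = P(s_t = k | x),
    H[t, k, l] = P(s_t = k, s_{t+1} = l | x), and the log-likelihood."""
    T, M = f.shape
    alpha = np.empty((T, M)); beta = np.ones((T, M)); c = np.empty(T)
    alpha[0] = init * f[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                           # forward pass, rescaled
        alpha[t] = (alpha[t - 1] @ P) * f[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):                  # backward pass, same scaling
        beta[t] = P @ (f[t + 1] * beta[t + 1]) / c[t + 1]
    L = alpha * beta                                # posterior state probabilities
    H = (alpha[:-1, :, None] * P[None, :, :] *
         (f[1:] * beta[1:])[:, None, :] / c[1:, None, None])
    return L, H, np.log(c).sum()
```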

3.3.3 Bootstrap and Confidence Interval

Confidence intervals are the focus of classical statistical inference. Visser et al. (2000) proposed several ways to obtain confidence intervals for hidden Markov models: a finite approximation of the Hessian, profile likelihood, and the bootstrap. They claimed that the results from the first method are usually too narrow; therefore, we evaluated the other two methods.

The profile likelihood method is based on the profile likelihood ratio and the χ² distribution. The basic idea is to evaluate the change in the log-likelihood caused by a single parameter while treating all the other parameters as nuisance parameters (Meeker and Escobar 1995). The profile likelihood for a parameter β is the likelihood maximized over all the other parameters δ:

PL(β) = max_δ L(β; δ)

If the MLE of β is β̂, it can be shown that

2(log PL(β̂) − log PL(β)) ~ χ²(1)

asymptotically. Based on the χ²(1) distribution, we may easily derive the lower and upper bounds of the confidence interval. Figure 3.6 is an intuitive illustration, where Bm is the MLE while Bu and Bl are the upper and lower bounds.

Figure 3.6: Confidence Interval by Profile Likelihood

The bootstrap is a popular technique for obtaining confidence intervals (Efron and Tibshirani 1994). However, naive resampling is not appropriate for the hidden Markov model because it would break the dependency structure of the original data. A parametric bootstrap can be applied to handle this issue. There are generally three ways to do a parametric bootstrap with a hidden Markov model:

1. Based on the parameter estimates.
2. Based on the original data.
3. A mixture of 1 and 2.

For the first approach, the parameters are estimated from the original data and a new data set is simulated solely from the parameter estimates. The basic assumption of this method is that the model is correctly specified. For the second approach, the residuals are collected after model fitting:

ri = yi − ŷi

Sampling with replacement is done within the set of residuals, and new observations are generated as follows:

yi_new = ŷi + ri_new,  ri_new ∈ {r1, r2, ..., rN}

The sample size of the original data should be sufficiently large.

For the third approach, we assume that the random errors are i.i.d. normal:

yi_new = ŷi + ri_new,  ri_new ~ N(0, σ²)

However, it is worth noting that the variance of the error terms may contain substantial heterogeneity. For example, suppose there are two states in a hidden Markov model; it is highly possible that the distributions of the two states have different variance structures. Therefore, adjustments must be applied for methods 2 and 3 (Bandeen-Roche et al. 1997, Wang et al. 2005):

1. Assign each observation to a group based on its posterior probability.
2. Within each group, resample the residuals.
3. Repeat steps 1 and 2.

Through bootstrapping, new data sets can be generated and the parameter estimates evaluated. A 95% confidence interval can then be generated in two ways: either as ±1.96 σ̂β or from the empirical 2.5% and 97.5% quantiles. The results should be close for sufficiently large data sets.
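A minimal sketch of the class-adjusted residual resampling in steps 1-2 above; post is assumed to hold posterior state probabilities from a fitted model, e.g. the L matrix from the E-step sketch.

```python
import numpy as np

def class_adjusted_resample(y, y_hat, post, rng):
    """One bootstrap data set with residuals resampled within latent classes.
    post[t, k]: posterior probability that observation t is in state k."""
    groups = post.argmax(axis=1)            # step 1: hard class assignment
    resid = y - y_hat
    y_new = np.empty_like(y, dtype=float)
    for k in np.unique(groups):
        idx = np.flatnonzero(groups == k)
        y_new[idx] = y_hat[idx] + rng.choice(resid[idx], size=idx.size)  # step 2
    return y_new

# Step 3: repeat over many replicates, refit the HMM to each y_new, and take
# 1.96 * sd or the empirical 2.5%/97.5% quantiles as the 95% interval.
```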

3.3.4 Determine the Number of Components

A challenging issue in the hidden Markov model is choosing the proper number of components. Several criteria and procedures have been proposed; in this section we present three general approaches: the likelihood ratio test, criteria-based model selection, and cross-validation.

The likelihood ratio test (Neyman and Pearson 1933) is well known as an efficient approach to model selection. Under certain regularity conditions, the log-likelihood ratio under the null hypothesis can be tested through the χ² test. However, it has been shown that two mixture distributions with different numbers of components cannot satisfy those regularity conditions (Wolfe 1971). Wolfe claimed that a modified version of the likelihood ratio test could possibly be applied:

H0: n = c0  vs.  H1: n = c1 (c1 > c0)

−(2/N)(N − 1 − d − c1/2)(log L(c0) − log L(c1)) ~ χ²(2d)

where N is the sample size, n is the number of components, and d = c1 − c0. The assumption that c1 > c0 is based on the statistical version of Occam's razor: we always prefer a simple model that might work, and unless there is strong evidence that a more complicated model is significantly better, we stick to the simple model. Wolfe's approach only provides a rough approximation, but it was easy to implement in the years when computing resources were limited. McLachlan (1987) proposed that the bootstrap can be applied to obtain the approximate distribution of the log-likelihood ratio test statistic under the null hypothesis. The premise is to generate random samples from a mixture distribution with c0 components, calculate the log-likelihood ratio test statistics, and then establish the empirical distribution based on the observed test statistics. The generated empirical distribution can be used to calculate the p-value for the original data.

The criteria-based model selection method has also gained popularity for assessing the number of components in mixture models. The likelihood function, interpreted as a measure of goodness of model fit, cannot be used as a criterion to select the number of components in a mixture model due to its tendency to choose more complicated models (Biernacki et al. 2000). A number of criteria have been discussed to handle this issue. Usually these criteria add a penalty term to the likelihood to represent the trade-off between model complexity and utility, for example the AIC criterion (Akaike 1974). Hurvich and Tsai (1989) suggested the use of AICc (corrected AIC) instead of AIC, since AIC tends to overfit the data. The AICc is defined as

AICc = −2 log L + 2k (1 + (k + 1)/(N − k − 1))

where k is the number of parameters. Another popular criterion is BIC (Schwarz 1978):

BIC = −2 log L + k log(N)

BIC generally puts a heavier penalty on model complexity than AIC, and it has been shown to be equivalent to the Minimum Description Length (MDL) criterion (Rissanen 1978). Another useful criterion is Minimum Message Length (MML). MML (Wallace and Boulton 1968) was derived from the perspective of information theory: the process of modeling is considered as encoding the data, and the model parameters can be considered the extra cost of the encoding. Therefore, the length of an encoded message can be described as

Length(θ, Y) = Length(θ) + Length(Y | θ)

If Length(θ) is short, the model is simple, but correspondingly Length(Y | θ) will be long.
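The two penalized criteria above translate directly into code. A small sketch, with the log-likelihood, parameter count, and sample size supplied by whatever fitting routine is used:

```python
import numpy as np

def aicc(loglik, k, n):
    """Corrected AIC: -2 logL + 2k (1 + (k + 1) / (n - k - 1))."""
    return -2 * loglik + 2 * k * (1 + (k + 1) / (n - k - 1))

def bic(loglik, k, n):
    """BIC: -2 logL + k log(n); heavier complexity penalty than AIC."""
    return -2 * loglik + k * np.log(n)
```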

The cross-validation approach was proposed by Celeux and Durand (2008) and is based on half-sampling. They showed that if we pick the odd-numbered or the even-numbered observations of a data set generated by a hidden Markov model, the result is still a hidden Markov chain. Therefore, we may simply use the odd subset of the original sample to fit the model and calculate the likelihood of the even subset. That likelihood can be used as the model selection criterion.

3.3.5 Goodness of Fit

Assessing the goodness of fit of a given hidden Markov model is an important topic. As in regular regression models, the residuals can be used. However, due to the inherent heterogeneity of the hidden Markov model, the residuals must be adjusted by class, as in the previous section. Wang et al. (2005) showed that the class-adjusted residuals are asymptotically equivalent to the distributions of residuals from the latent classes. Zucchini and MacDonald (2009) proposed a different approach using the pseudo-residual, defined as the probability of seeing a less extreme response than the one observed, given all observations except that at time t:

ut = P(Yt ≤ yt | yi, i ≠ t)

For well-fitted models, the pseudo-residuals should be approximately Uniform[0, 1] distributed. MacKay Altman (2004) provides an intuitive graphical approach similar to the Q-Q plot: by plotting the estimated distribution against the empirical distribution, a lack of fit can be detected with high probability for a large sample size. The estimated distribution is given by

F(y | θ) = Σ_{i=1}^K πi Fi(y | θi)

If the model is correctly specified, the plot of the empirical against the estimated distribution should be close to a straight line.
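A sketch of this graphical check for a Gaussian-component fit: the estimated mixture CDF is evaluated at the sorted observations and compared with the empirical CDF. The weights, means, and sds arguments are whatever a fitted model produced; none of them come from this report.

```python
import numpy as np
from scipy.stats import norm

def gof_points(y, weights, means, sds):
    """Empirical vs. estimated CDF points; near the 45-degree line if well fitted."""
    y_sorted = np.sort(y)
    f_hat = sum(p * norm.cdf(y_sorted, m, s)        # F(y) = sum_i pi_i F_i(y)
                for p, m, s in zip(weights, means, sds))
    f_emp = np.arange(1, y_sorted.size + 1) / y_sorted.size
    return f_emp, f_hat
```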

44 37 If the model contains regression in the mean parameter, { N(µ1, σ f(y t s t, x t ) = 1), 2 if s t = 1 N(θ 0 + θ 1 x t, σ2), 2 if s t = 2 we have: E(y t s t, x t ) = { µ1, if s t = 1 θ 0 + θ 1 x t, if s t = 2 Given the states, the expected value of y t can be used as the predicted value of travel time. However, the state is unobservable so our prediction is actually the expected future travel time, E(y t y 1,...y t 1, x 1,..., x t 1 ). The key of this problem is to predict s t using historical data. Assume the initial distribution for the Markov chain s t is: ( ) ( ) P (s0 = 1) p0 A = = P (s 0 = 2) 1 p 0 The transition matrix is: Then the distribution of s 1 is: The distribution of s 2 is: and so on. ( ) P11 P T = 12 P 21 P 22 A T A T 2 The marginal distribution of s t at any time t can be estimated through Markov property. The transition matrix is estimated by the data. The initial distribution could be either set manually or estimated from the last observed travel time in the previous time period. If the transition matrix is modeled through regression: It is straightforward to show that: log( P 12 P 11 ) = β 0,1 + β 1,1 x log( P 22 P 21 ) = β 0,2 + β 1,2 x P 11 = exp(β 0,1 + β 1,1 x) exp(β 0,1 + β 1,1 x) P 12 = exp(β 0,1 + β 1,1 x) + 1 P 21 = exp(β 0,2 + β 1,2 x) exp(β 0,2 + β 1,2 x) P 22 = exp(β 0,2 + β 1,2 x) + 1

Consider a simple example. Suppose that during the time interval [7:00-7:59] the traffic volume is 8, which yields the transition matrix T(8) through the regression above. Since the distribution of the first vehicle is unknown, we might use the non-informative prior

A = ( 0.5, 0.5 )ᵀ.

To predict the state of the first vehicle in the next time interval, i.e., s8, we compute AᵀT(8)⁷. In this example, y8 has a 99.4% probability of being in the free-flow state, and µ1 can be used as the predicted value. Figure 3.7 gives an overview of the prediction procedure in the hidden Markov model.

Figure 3.7: Hidden Markov Model: Estimation
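The worked example translates into a few lines of code. The regression coefficients below are placeholders (the report's estimates did not survive extraction), so the printed probabilities will differ from the 99.4% quoted above.

```python
import numpy as np

def transition(x, b01=-2.0, b11=0.2, b02=1.0, b12=0.05):   # placeholder betas
    e1, e2 = np.exp(b01 + b11 * x), np.exp(b02 + b12 * x)
    return np.array([[1 / (1 + e1), e1 / (1 + e1)],
                     [1 / (1 + e2), e2 / (1 + e2)]])

A = np.array([0.5, 0.5])                            # non-informative initial state
s8 = A @ np.linalg.matrix_power(transition(8), 7)   # distribution of s_8
print(s8)                                           # [P(free-flow), P(congested)]
```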

3.4 Simulation Study

3.4.1 No Covariate

First, consider a simple case in which no covariate is present in the model. We simulated 1,000 data sets, each with 5,000 observations, according to a transition matrix whose values are based on estimates from real data. Each data set was fitted with both the traditional mixture model and the hidden Markov model. As can be seen from Table 3.1, the confidence interval estimates from the HMM are slightly narrower.

Table 3.1: HMM vs. Traditional: No Covariate 1

Name   Traditional 95% C.I.   HMM 95% C.I.
µ1     (578.4, 581.2)         (578.4, 581.1)
µ2     (989.8, )              (1001.2, 1062.5)
σ1     (40.0, 42.1)           (40.1, 42.1)
σ2     (348.4, 389.0)         (350.3, 388.9)

Figure 3.8: HMM vs. Traditional 1

Figure 3.8 indicates that the log-likelihoods of the HMM models are larger than those of the traditional models. The mean difference in log-likelihood is around 951, which is substantial. We also tried another set of parameters, and Table 3.2 likewise implies that the HMM generates slightly narrower confidence intervals.

Table 3.2: HMM vs. Traditional: No Covariate 2

Name   Traditional 95% C.I.   HMM 95% C.I.
µ1     (578.7, 581.0)         (578.8, 581.2)
µ2     (708.1, 793.4)         (710.1, 788.0)
σ1     (40.1, 42.1)           ( )
σ2     (341.3, 396.3)         (340.6, 394.0)

3.4.2 With Covariate

When a covariate is included in the model, the simulation also indicates that the HMM is superior to the traditional mixture model. We simulated 500 data sets, each with 5,000 observations, according to the following parameter setting (values are based on estimates from real data):

log(yt) ~ N(log(500), σ1 = 0.07) if st = 1; N(log(1000), σ2 = 0.31) if st = 2

log(P12/P11) = β0,1 + β1,1 x,  log(P22/P21) = β0,2 + β1,2 x,

with the true coefficient values listed in Table 3.3. For computational reasons, we use the log transform of the original data, and the log-likelihood values change accordingly. Figure 3.9 indicates that the log-likelihoods of the HMM models are larger than those of the traditional models. The mean difference in log-likelihood is around 997, which is substantial.

Figure 3.9: HMM vs. Traditional 2

Figures 3.10 and 3.11 clearly demonstrate the advantage of the HMM: both its mean estimates and its variance estimates are superior.

Figure 3.10: HMM vs. Traditional 3

Figure 3.11: HMM vs. Traditional 4

Figure 3.12 illustrates the 95% confidence intervals for several parameter estimates from the hidden Markov model. The estimates across samples are fairly symmetric and centered at the true values.

Figure 3.12: 95% C.I. of HMM

Table 3.3 reports the numerical values plotted in Figure 3.12.

Table 3.3: Parameter Estimation of HMM

Name   True   95% C.I.
µ1     500    (499, 501)
µ2     1000   (977.7, 1026.7)
σ1     0.07   (0.070, 0.071)
σ2     0.31   (0.28, 0.32)
β0,1   -6     (-7.04, -5.07)
β1,1   0.1    (0.03, 0.16)
β0,2   0.6    (-0.56, 1.52)
β1,2   -      (0.09, 0.23)

3.5 Application to Field-Collected Data

The proposed model was applied to data collected from I-35 near San Antonio, Texas. The actual travel time of each vehicle was measured when vehicles equipped with radio-frequency tags passed automatic vehicle identification (AVI) stations. The AVI approach collects accurate travel times, but only for vehicles carrying the radio-frequency equipment, so the collected data represent the actual traffic flow scaled down by a sampling factor. Figure 3.13 illustrates the sampling scheme; the observations can be considered proportional to the actual traffic flow. To predict travel time with higher precision, the sampling rate could be scaled up (Figure 3.14), but the basic modeling steps are identical.

Figure 3.13: Illustration of Low Sampling Rate

Figure 3.14: Illustration of Potential Improvement

The number of hidden states for the field data can be determined through a likelihood ratio test. We first evaluate whether two states are sufficient to describe the hidden structure in the data; three or more states are considered only if the two-state model does not provide a sufficient fit:

H_0: n = 2 \quad \text{versus} \quad H_1: n = 3

Since the log likelihood ratio does not follow a χ^2 distribution in this setting, a bootstrap sampling method was adopted; a sketch of one possible implementation is given below. Figure 3.15 shows the histogram of the log likelihood ratio from 500 bootstrap samples.
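As one way to implement this parametric bootstrap (not the authors' code), the sketch below uses the third-party hmmlearn package as an off-the-shelf Gaussian HMM fitter; the data here are synthetic stand-ins for the log travel times.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # third-party HMM fitter, for illustration

    def fitted_loglik(y, k):
        """Fit a k-state Gaussian HMM and return (model, maximized log-likelihood)."""
        m = GaussianHMM(n_components=k, n_iter=200, random_state=0).fit(y)
        return m, m.score(y)

    # Synthetic stand-in for the log travel times (the real data are AVI measurements).
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(np.log(500), 0.07, 4000),
                        rng.normal(np.log(1000), 0.31, 1000)]).reshape(-1, 1)

    m2, ll2 = fitted_loglik(y, 2)
    _, ll3 = fitted_loglik(y, 3)
    lrt_obs = 2 * (ll3 - ll2)        # observed log likelihood ratio statistic

    # Parametric bootstrap: simulate from the fitted two-state model (H0) and
    # recompute the statistic to approximate its null distribution.
    null_stats = []
    for b in range(500):             # 500 bootstrap samples, as in the report; slow
        yb, _ = m2.sample(len(y), random_state=b)
        _, l2b = fitted_loglik(yb, 2)
        _, l3b = fitted_loglik(yb, 3)
        null_stats.append(2 * (l3b - l2b))

    p_value = np.mean(np.array(null_stats) >= lrt_obs)
    print(p_value)   # a small p-value favors rejecting H0 in favor of three states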

Travel Time Reliability Modeling