Gaussian Processes for Short-Term Traffic Volume Forecasting

Yuanchang Xie, Kaiguang Zhao, Ying Sun, and Dawei Chen

The accurate modeling and forecasting of traffic flow data such as volume and travel time are critical to intelligent transportation systems. Many forecasting models have been developed for this purpose since the 1970s. Recently, kernel-based machine learning methods such as support vector machines (SVMs) have gained special attention in traffic flow modeling and other time series analyses because of their outstanding generalization capability and superior nonlinear approximation. In this study, a novel kernel-based machine learning method, the Gaussian process (GP) model, was proposed to perform short-term traffic flow forecasting. The GP model was evaluated and compared with SVM and autoregressive integrated moving average (ARIMA) models on four sets of traffic volume data collected from three interstate highways in Seattle, Washington. The comparative results showed that the GP and SVM models consistently outperformed the ARIMA model. This study also showed that, because the GP model is formulated in a full Bayesian framework, it allows for an explicit probabilistic interpretation of the forecasting outputs. This capacity gives the GP an advantage over SVMs for modeling and forecasting traffic flow.

Y. Xie, Civil and Mechanical Engineering Technology, South Carolina State University, Orangeburg, SC 29117. K. Zhao, Spatial Science Lab, and Y. Sun, Department of Statistics, Texas A&M University, College Station, TX. D. Chen, School of Transportation, Southeast University, Nanjing, Jiangsu, China. Corresponding author: Y. Xie, yxie@scsu.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2165, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 69-78. DOI: 10.3141/2165-08

The accurate modeling and forecasting of traffic flow data, such as volume, speed, and travel time, are critical to intelligent transportation systems (ITS), especially advanced traveler information systems (ATIS) and advanced traffic management systems (ATMS). Given reliable real-time traffic flow predictions, travelers can choose the best routes dynamically. Such information can also be used by traffic management personnel to develop proactive traffic control strategies that make better use of the available road network resources. The success of many ATIS and ATMS applications depends largely on the accuracy of the selected traffic flow modeling and forecasting algorithms.

Numerous methods have been developed and compared since the 1970s to improve the accuracy of traffic flow forecasting. These methods can generally be categorized into the following groups: autoregressive integrated moving average (ARIMA) models (1-3), nonparametric regression (4, 5), Kalman filtering theory (6-8), neural networks (9-15), support vector machines (SVMs) (16, 17), and hybrid models (18). Of the existing traffic flow forecasting methods, neural networks are the most widely used. One major reason is that neural networks have a strong function approximation capability and can better model the complicated relationship between historical and future traffic flow data than other methods (19). In addition, the application of neural networks does not require an explicit model formulation to be specified, as is usually otherwise required. Despite the many attractive features of neural networks, their application is not an easy task.
Model training and selection involve tricky decisions with regard to the network architecture, the type of transfer (activation) function, the learning rate, and the number of hidden neurons (20). Caution must also be taken during the training of neural networks to prevent overfitting the training data and to avoid local minima. To address these problems, SVMs have been introduced (16, 17). Similar to neural networks, SVMs have superior function approximation capability and do not require the specification of model formulations. In addition, they are developed on the structural risk minimization (SRM) principle (21), as opposed to the empirical risk minimization (ERM) principle used in conventional neural networks. Theoretically, then, SVMs can better handle the overfitting problem, and they have better generalization capabilities than do conventional neural networks. Another important feature is the SVM's capacity to guarantee a globally optimal solution for a given training data set (16, 17). A v-support vector machine (v-SVM) model was previously compared with multilayer, feed-forward neural networks by using traffic volume data collected from interstates (I-5, I-90, and I-405) in the Seattle area, and the comparison came out in favor of the v-SVM model (16).

Gaussian processes (GPs) are another important class of kernel-based learning algorithms that have attracted attention in the machine learning community (22, 23). Similar to other popular kernel machines, such as SVMs, GP models are powerful tools for exploring implicit relationships between a set of variables based on a training data set, which makes GPs especially useful for difficult nonlinear regression and classification problems (20). A particularly attractive feature of GPs is their formulation in a full Bayesian framework, which allows for explicit probabilistic interpretation of model outputs (22). Moreover, GP model parameters (e.g., kernel parameters) can be computed naturally by means of Bayesian learning, as opposed to the grid-searching, trial-and-error method commonly used to optimize classical SVMs (22). Hence, some researchers refer to GPs as Bayesian SVMs (24). The superior performance of GPs for difficult supervised learning problems has been demonstrated in many domain-specific applications, in comparisons against both conventional methods such as neural networks and other advanced learning algorithms such as SVMs (20, 25).

In the study reported on here, a GP regression model was adopted to model and predict traffic volume data. The GP model can be used to forecast travel speed and travel time as well. Like SVMs, GP models are kernel-based machine learning methods and possess many of the same desirable features. More important, they produce more informative outputs than SVMs and neural networks, which makes GP prediction results easier to interpret.

GAUSSIAN PROCESSES

General Formulation of GPs

GPs provide a Bayesian paradigm for learning an implicit functional relationship ŷ = f(x) from a given training data set D = {(x_i, y_i)}_{i=1}^n, where x_i ∈ R^d represents the vector of observed input variables (i.e., predictors) in a d-dimensional feature space, and y_i is the one-dimensional observed target value (i.e., response variable), which is either continuous or discrete. Unlike most classical Bayesian models, GPs directly elicit a prior distribution on the whole function f(x). Specifically, f(x) is treated as a random field and is assumed to be a GP a priori:

$$p(f(\mathbf{x}) \mid \theta) \sim \mathcal{GP}\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big) \qquad (1)$$

where the prior GP is fully specified by a mean function m(x) and a covariance function k(x, x′), and θ denotes the prior's hyperparameters used to parameterize the covariance function; that is, k(x, x′) = k(x, x′; θ). Strictly speaking, a GP model can be treated as a probability distribution defined over functions such that E[f(x)] = m(x) and Cov[f(x), f(x′)] = k(x, x′), where f(x) and f(x′) are random variables indexed by any pair of inputs x and x′. In this sense, a GP prior can roughly be deemed a probability distribution over an infinite number of random variables. Furthermore, the collection of function values indexed by any finite set of inputs X = [x_1, x_2, ..., x_n]^T, that is, f(X) = [f(x_1), f(x_2), ..., f(x_n)]^T, assumes a multivariate normal distribution:

$$p(f(X)) = N\big(m(X),\, K(X, X)\big) \qquad (2)$$

where the mean vector m(X) and covariance matrix K(X, X) are determined directly from m(·) and k(·, ·); namely, m(X) = [m(x_1), m(x_2), ..., m(x_n)]^T and K_ij = k(x_i, x_j), i, j = 1, ..., n. For ease of presentation but without loss of generality, m(x) = 0 is assumed, because in practice the data can always be centered with respect to the sample mean. In machine learning terminology, k(x, x′) is often called a kernel function, or simply a kernel, rather than a covariance function. As detailed later, kernel functions usually take certain forms that are parameterized by one or more parameters θ. Accordingly, specifying a GP prior p(f(x) | θ) ~ GP(m(x), k(x, x′)) amounts to determining a specific type of kernel (covariance function) and the associated θ values.

Once a GP prior p(f | θ) and a noise model p(y | f) are specified, the posterior distribution of f given the training data D, p(f | D, θ), can readily be derived by updating the prior according to Bayes' theorem:

$$p(f \mid D, \theta) = \frac{p(\mathbf{y} \mid f)\, p(f \mid X, \theta)}{p(D \mid \theta)} \qquad (3)$$

where the input variables X (i.e., the indices for f) have been made explicit in the prior. The term p(D | θ) is called the marginal likelihood, as it is a function of θ given D; the noise model p(y | f) is also known as the likelihood, a function of f for a fixed set of observations y. The noise model is introduced because in practice y_i is a corrupted version of f(x_i) as the result of noise or measurement errors. With the posterior p(f | D, θ), the predictive distribution at a new input x_* is obtained by using

$$p(f_* \mid \mathbf{x}_*, D, \theta) = \int p(f_* \mid f, D, \theta)\, p(f \mid D, \theta)\, df \qquad (4)$$

By combining Equation 4 and the noise model, the predictive distribution for y_* can also be obtained as

$$p(y_* \mid \mathbf{x}_*, D, \theta) = \int p(y_* \mid f_*)\, p(f_* \mid \mathbf{x}_*, D, \theta)\, df_* \qquad (5)$$

from which not only the predicted mean but also the associated uncertainty (error bar) can be computed. Note that in GP modeling it is the collection of function values f(X), not x itself, that needs to be Gaussian; the input variables x are assumed to be distribution-free. In other words, the GP model theoretically can handle data with any kind of distribution. Interested readers can refer to Rasmussen and Williams (20), MacKay (22), and Seeger (23) for more information.
GP Regression Model

The aforementioned GP models solve nonlinear regression problems when the response variables y_i are continuous and a normal distribution is assumed for the noise model p(y | f). Specifically, y_i is subject to independent and identically distributed (i.i.d.) normal errors with a mean of zero and a variance of σ²:

$$y_i = f(\mathbf{x}_i) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2) \qquad (6)$$

In such a case, the inference of GP models becomes analytically tractable as a result of the Gaussianity of p(y | f); accordingly, the resultant posterior and predictive distributions given in Equations 3 through 5 all reduce to normal distributions. For a new input x_*, the predictive mean and variance associated with ŷ_* = f(x_*) = f_* are given by Equations 7 and 8, respectively (20):

$$\mu_* = E(f_*) = k(\mathbf{x}_*, X)\,\big[K(X, X) + \sigma^2 I\big]^{-1}\,\mathbf{y} \qquad (7)$$

$$\mathrm{Var}(f_*) = k(\mathbf{x}_*, \mathbf{x}_*) - k(\mathbf{x}_*, X)\,\big[K(X, X) + \sigma^2 I\big]^{-1}\,k(X, \mathbf{x}_*) \qquad (8)$$

where X and y are the observed predictors and response variables in D = {(x_i, y_i)}_{i=1}^n; I is the n × n identity matrix; k(x_*, X), with its ith element being k(x_*, x_i), is the 1 × n vector denoting the covariance of f_* with f(X); and k(X, x_*) = k(x_*, X)^T.
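A minimal plain-numpy translation of Equations 7 and 8 follows, reusing the rbf_kernel helper from the earlier sketch. The names gp_predict and noise_var are illustrative; this is a sketch of the equations rather than the paper's actual GPML implementation.

```python
def gp_predict(X, y, X_star, noise_var=1.0, sigma0=1.0, length=1.0):
    """Zero-mean GP regression: predictive mean (Eq. 7) and variance (Eq. 8)."""
    K = rbf_kernel(X, X, sigma0, length) + noise_var * np.eye(len(X))
    K_star = rbf_kernel(X_star, X, sigma0, length)      # k(x_*, X)
    alpha = np.linalg.solve(K, y)                       # [K + sigma^2 I]^{-1} y
    mean = K_star @ alpha                               # Equation 7
    V = np.linalg.solve(K, K_star.T)                    # [K + sigma^2 I]^{-1} k(X, x_*)
    var = sigma0**2 - np.sum(K_star * V.T, axis=1)      # Equation 8; k(x_*, x_*) = sigma0^2 for RBF
    return mean, var
```

Both outputs come from a single linear solve against the training covariance, which is what makes the full predictive distribution essentially free once the point forecast is computed.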

Kernels and Learning Hyperparameters

Equations 7 and 8 show that fitting and applying a GP regression model amounts to the choice of a kernel and the specification of its parameters (i.e., hyperparameters). In machine learning, the most commonly used kernels include the polynomial kernel, the radial basis function (RBF), and the automatic relevance determination (ARD) kernel, as given by Rasmussen and Williams (20):

$$k_{\mathrm{poly}}(\mathbf{x}, \mathbf{x}'; \sigma_0, \Sigma_p, p) = \big(\sigma_0^2 + \mathbf{x}^\top \Sigma_p \mathbf{x}'\big)^p \qquad (9)$$

$$k_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}'; \sigma_0, l) = \sigma_0^2 \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2 l^2}\right) \qquad (10)$$

$$k_{\mathrm{ARD}}(\mathbf{x}, \mathbf{x}'; \sigma_0, l_1, \ldots, l_d) = \sigma_0^2 \exp\!\left(-\frac{1}{2} \sum_{i=1}^{d} \frac{(x_i - x_i')^2}{l_i^2}\right) \qquad (11)$$

where σ_0, p, l, l_i, and Σ_p are hyperparameters of the corresponding kernels, symbolized collectively as θ in Equations 9 through 11. The θ is called a hyperparameter because, in the Bayesian framework of GPs, the unknown function f itself is a parameter as the result of the prior p(f) placed on f. A common hyperparameter of the above kernels is the variance σ_0², which plays the same role as the trade-off parameter of SVMs. However, a GP kernel and its hyperparameters are more interpretable than those of SVMs because the GP kernel represents the degree of correlation between function values at two inputs. For example, the hyperparameter l in Equation 10, or l_i in Equation 11, refers to a characteristic length that represents a distance in the input space beyond which function values become less relevant. The magnitude of l_i in the ARD kernel indicates the inference capability of the ith input variable: very large values of l_i will downplay or eliminate the influence of irrelevant input dimensions. As such, the ARD kernel provides a parameterization scheme for automatic feature reduction, which has proved effective in handling high-dimensional problems (25). Most studies have confirmed the superior performance of the RBF kernel (16, 17, 20, 25). Therefore, only the RBF kernel was examined, and no comparison between kernels was made in the study reported here.

In practice, rather than guess an initial value for the hyperparameter θ, it is advantageous to learn an informative value θ̂ in favor of the training data D. In the Bayesian formulation of GP models, the posterior of θ is given by

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \qquad (12)$$

Then, the optimal value θ̂ can be obtained naturally as the maximum a posteriori (MAP) estimate of p(θ | D) (20). Because of the lack of prior knowledge, p(θ) is assumed to be flat (i.e., a noninformative prior). In such a case, the MAP estimate θ̂ is pinpointed by maximizing the following marginal likelihood p(D | θ):

$$p(D \mid \theta) = \int p(\mathbf{y} \mid f)\, p(f \mid \theta)\, df \qquad (13)$$

Such a procedure is also known as Type II maximum likelihood. For the GP regression models of Equations 6 through 8, the log marginal likelihood can be expressed as in Rasmussen and Williams (20):

$$\log p(D \mid \theta) = -\frac{1}{2}\, \mathbf{y}^\top K(X, X)^{-1} \mathbf{y} - \frac{1}{2} \log \big|K(X, X)\big| - \frac{n}{2} \log 2\pi \qquad (14)$$

where |·| denotes the determinant of a matrix. The gradient of log p(D | θ) with respect to θ is

$$\frac{\partial \log p(D \mid \theta)}{\partial \theta_i} = \frac{1}{2}\, \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_i} K^{-1} \mathbf{y} - \frac{1}{2}\, \mathrm{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta_i}\right), \qquad K = K(X, X) \qquad (15)$$

The maximization of log p(D | θ) with respect to θ can be implemented with any general gradient-based optimization technique; in this study a conjugate gradient optimization method was employed, similar to the one used by Rasmussen and Williams (20). The optimization may be trapped in local maxima when there are a large number of hyperparameters (e.g., when the ARD kernel is used for feature selection over high-dimensional inputs). As a remedy, it is common practice to perform the optimization multiple times with random initial values and to select the run that yields the highest marginal likelihood.
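The paper maximized Equation 14 with a conjugate gradient optimizer inside GPML; a minimal plain-Python analogue is sketched below, reusing the rbf_kernel helper from the earlier sketch. The function names, the log-parameter transform, and the use of scipy's "CG" method with random restarts are illustrative assumptions, not the paper's actual code.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_theta, X, y):
    """Negative of Equation 14; log_theta packs log(sigma0, length, noise_sd)
    so the optimizer works on an unconstrained scale."""
    sigma0, length, noise_sd = np.exp(log_theta)
    K = rbf_kernel(X, X, sigma0, length) + noise_sd**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                       # stable inverse and determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha                         # 0.5 * y^T K^{-1} y
            + np.log(np.diag(L)).sum()              # 0.5 * log|K|
            + 0.5 * len(X) * np.log(2.0 * np.pi))

def fit_hyperparameters(X, y, n_restarts=5, seed=0):
    """Type II maximum likelihood with random restarts; method='CG' mirrors
    the conjugate gradient optimization mentioned in the paper."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = rng.normal(size=3)                     # random initial log-hyperparameters
        res = minimize(neg_log_marginal_likelihood, x0, args=(X, y), method="CG")
        if best is None or res.fun < best.fun:
            best = res                              # keep the highest marginal likelihood
    return dict(zip(["sigma0", "length", "noise_sd"], np.exp(best.x)))
```

The restart loop directly implements the remedy described above: each run starts from different random log-hyperparameters, and the run with the lowest negative log marginal likelihood wins.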
MODEL TESTING AND RESULT ANALYSIS

Data Description

To facilitate model comparison, the same data set used in Zhang and Xie (16) was used again here. The traffic volume data were obtained from the traffic data acquisition and distribution (TDAD) database maintained by an ITS research group at the University of Washington, Seattle. Specifically, traffic volume data from four detectors located on three interstate highways in the Seattle area were used. The approximate locations of the four detectors are shown in Figure 1. Detailed information about the four detectors follows.

Dataset  Direction    Detector Name
1        Southbound   ES-088D
2        Eastbound    ES-855D
3        Northbound   ES-645D
4        Northbound   ES-708D

Data collection period: June 6, 2005, to July 3, 2005 (all four detectors).

A total of four sets of traffic volume data was obtained from these detectors. Each data set contained 28 days of data. The raw traffic volume data were aggregated into 15-min intervals, so a single day generated 96 data points. The first 14 days of data from each data set are plotted in Figure 2 to show the general trends.

FIGURE 1 Approximate locations of the four detectors.

FIGURE 2 First 14 days of data from each detector: (a) Dataset 1 (ES-088D, southbound); (b) Dataset 2 (ES-855D, eastbound); (c) Dataset 3 (ES-645D, northbound); (d) Dataset 4 (ES-708D, northbound).

FIGURE 3 One-step-ahead and two-step-ahead predictions.

It is easy to see that Data Sets 1 through 3 showed similar patterns but different traffic volume levels. Their weekday traffic clearly had two peak periods. In Data Set 4, the effect of the morning rush hour was not as obvious.

Model Fitting

The same data sets discussed above were used by Zhang and Xie to evaluate v-SVMs and to compare them with a multilayer, feed-forward neural network model (16). Since their results showed that the v-SVM model consistently outperformed the neural network model, here the v-SVM was compared only with the proposed GP model. ARIMA models were also fitted and compared with the GP and v-SVM models; thus, three types of models were compared in the study reported here. For all three models, the first 3 weeks of data were used for model fitting, and the last week of data was used for prediction tests. The three types of models were compared primarily on the basis of their prediction performance, and both one- and two-step-ahead prediction results were compared. Figure 3 shows the difference between one- and two-step-ahead predictions, where n is the total number of observed traffic volume data points and L is the model input length. In Figure 3, v_i represents the aggregate traffic count for a 15-min period. Taking the first input as an example, both predictions use the same vector x = [v_1, v_2, ..., v_L] as the input; however, the values to be predicted (outputs) for the one- and two-step-ahead predictions are v_{L+1} and v_{L+2}, respectively.

v-SVM and GP Models

As discussed earlier, fitting the v-SVM and GP models is conceptually straightforward and does not require users to specify an explicit model formulation. Take the one-step-ahead prediction as an example: for each data point to be predicted or modeled as output, the 24 data points immediately preceding it are used as model input. Thus, a training data set of length 96 can generate 72 training inputs. The input dimension of 24 was determined on the basis of an autocorrelation function (ACF) method (16). For each of the four data sets, the ACF values were evaluated at different time lags, and the first time lag at which the ACF value reached zero was selected as the input dimension. Based on the ACF values, an input dimension of 24 was selected for all four data sets. Following the notation set out earlier in this paper, the one-step-ahead models to be fitted can be symbolized as

$$\hat{y}_i = \hat{\mathrm{vol}}(i) = f(\mathbf{x}_i) = f\big([\mathrm{vol}(i-1), \mathrm{vol}(i-2), \ldots, \mathrm{vol}(i-24)]^\top\big) \qquad (16)$$

where vol(i) refers to the traffic volume at time step i, and f(x_i) is the implicit model form to be learned by either the v-SVM or the GP algorithm.

Compared with the GP model, fitting the v-SVM model requires more effort. A validation data set usually is needed for model selection, to help find appropriate parameters for the v-SVM model. For the v-SVM model, therefore, the 3 weeks of fitting data were separated further into a training data set (the first 2 weeks of data) and a validation data set (the third week of data). Given the training and validation data sets, a handy genetic algorithm tool was used to find the optimal parameters for the v-SVM model.
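To make the windowing of Equation 16 and Figure 3 concrete, a minimal sketch follows; make_windows is an illustrative name, and the paper's actual preprocessing was done in custom MATLAB code.

```python
import numpy as np

def make_windows(volumes, L=24, horizon=1):
    """Build (input, target) pairs: x_t = [v_{t-L}, ..., v_{t-1}] and
    y_t = v_{t-1+horizon}. horizon=1 gives one-step-ahead targets,
    horizon=2 two-step-ahead (see Figure 3)."""
    v = np.asarray(volumes, dtype=float)
    X, y = [], []
    for t in range(L, len(v) - horizon + 1):
        X.append(v[t - L:t])                  # the L counts preceding the target
        y.append(v[t + horizon - 1])          # the count to be predicted
    return np.array(X), np.array(y)

# Example: one day of 15-min counts (96 points) yields 96 - 24 = 72 pairs.
# X_train, y_train = make_windows(day_counts, L=24, horizon=1)
```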
Details on the parameters to be determined and the genetic algorithm tool can be found in Zhang and Xie (16) and are not replicated here. The GP model was implemented with a widely used package called Gaussian Processes for Machine Learning (GPML) (26), which is based on the MATLAB programming language platform. Customized MATLAB code was also developed to process the raw and output data and to call functions in the GPML package. The training and testing process of the GP and v-SVM models is illustrated in Figure 4.

FIGURE 4 Training and evaluation of GP and v-SVM models: the first 3 weeks of data (x and y) train the models, and the fourth week's input data (x only) produce the fourth week's predicted data ŷ.

ARIMA Model

ARIMA models were also fitted for the four data sets because of their popularity in traffic flow forecasting research (1-3, 18). The auto.arima forecasting program in the R Project for Statistical Computing was used to select the best-fit ARIMA model for each test data set (27). The best models selected by auto.arima and their corresponding Akaike information criterion (AIC), Bayesian information criterion (BIC), and log likelihood values are listed in Table 1. Both AIC and BIC are commonly accepted criteria for model selection; in general, models with lower AIC and BIC values should be selected. Although the collected data sets showed seasonal patterns, seasonal ARIMA models were not selected by the auto.arima program.
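The paper itself used R's auto.arima; an analogous order search can be sketched in Python with statsmodels, as below. The grid-search scope and the fixed differencing order d=1 are illustrative assumptions (auto.arima also searches d and uses smarter stepwise rules).

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

def select_arima(y, max_p=5, d=1, max_q=5):
    """Grid-search ARIMA(p, d, q) orders and keep the lowest-AIC fit,
    mimicking the model selection that auto.arima performs."""
    best_order, best_fit = None, None
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            fit = ARIMA(y, order=(p, d, q)).fit()
        except Exception:                 # some orders may fail to converge
            continue
        if best_fit is None or fit.aic < best_fit.aic:
            best_order, best_fit = (p, d, q), fit
    return best_order, best_fit

# order, fit = select_arima(train_counts)
# print(order, fit.aic, fit.bic)          # compare candidates as in Table 1
```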

TABLE 1 ARIMA Models for Four Data Sets (best-fit model order with the corresponding AIC, BIC, and log likelihood for each data set).

Measurements of Effectiveness

Mean absolute percentage error (MAPE) and root mean square error (RMSE) are two commonly used criteria for evaluating and comparing prediction methods. As adopted in this study, MAPE and RMSE are defined as

$$\mathrm{MAPE} = \frac{1}{N} \sum_{k=1}^{N} \frac{\big|\hat{\mathrm{vol}}(k) - \mathrm{vol}(k)\big|}{\mathrm{vol}(k)} \times 100\% \qquad (17)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \big(\hat{\mathrm{vol}}(k) - \mathrm{vol}(k)\big)^2} \qquad (18)$$

where vol(k) is the observed traffic volume at time step k and v̂ol(k) is the corresponding predicted traffic volume. Each time step in this study was equivalent to 15 min, and N is the size of the testing data set (the total number of time steps).
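Equations 17 and 18 translate directly into code; a minimal sketch follows (the function names are illustrative).

```python
import numpy as np

def mape(observed, predicted):
    """Mean absolute percentage error, Equation 17 (in percent)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.mean(np.abs(predicted - observed) / observed) * 100.0

def rmse(observed, predicted):
    """Root mean square error, Equation 18."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - observed) ** 2))
```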
Results Analysis and Comparison

The one- and two-step-ahead forecasting results are listed in Tables 2 and 3, respectively.

TABLE 2 Comparison of One-Step-Ahead Forecasting Results (MAPE and RMSE of the GP, v-SVM, and ARIMA models for each of the four data sets).

TABLE 3 Comparison of Two-Step-Ahead Forecasting Results (MAPE and RMSE of the GP, v-SVM, and ARIMA models for each of the four data sets).

It can easily be seen that, for all data sets, the GP and v-SVM models performed consistently better than the ARIMA models for both one- and two-step-ahead forecasting. In some cases, the improvements in performance were quite significant: the two-step-ahead forecasting MAPEs for Data Set 2 for the GP and v-SVM models were 0.7% and 0.9%, respectively, whereas the MAPE for the ARIMA(5,1,4) model was 7.3%. Tables 2 and 3 also show that the superior performance of the GP and v-SVM models was more pronounced in the two-step-ahead than in the one-step-ahead forecasting.

The GP and v-SVM models performed about equally in terms of MAPE and RMSE in all cases, and it was difficult to tell which method was absolutely better on the basis of the MAPE and RMSE values alone. However, the GP model possesses a desirable feature that distinguishes it from the v-SVM model: not only does it produce point traffic flow estimates, it also generates standard deviations (i.e., error bars) for the predicted traffic flow values. Using the one-step-ahead forecasting results as an example, Figures 5 and 6 show the predicted traffic volumes and their standard deviations for Tuesday's data in Week 4 with the GP model. As the figures show, the predicted standard deviations seemed not to be directly related to the magnitude of the observed traffic volumes: when the observed traffic volume is at its peak, the corresponding predicted standard deviation is not necessarily at its largest (see Figure 6). Usually, the predicted standard deviations become larger when there are drastic changes in the observed traffic volume data. This additional standard deviation information from the GP model could help traffic control and management personnel better assess the quality and reliability of the predicted data and use the predicted values wisely. The predicted and observed traffic volumes plotted in Figures 5 and 6 also show that the predicted traffic volumes (solid lines) closely followed the observed traffic flow data (red dots) for all test data sets.

The predicted traffic volumes and standard deviations were further combined to create upper and lower boundaries for the four test data sets. Specifically, the upper and lower boundaries were created by adding the standard deviations to, and subtracting them from, the predicted traffic volumes. The resultant boundaries are shown in Figures 7 and 8. These two figures clearly show that in most cases the observed traffic data points fell within the upper and lower boundaries generated from the GP model outputs. This result was encouraging.
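A minimal sketch of this boundary construction follows; prediction_bounds and coverage are illustrative names, and the inputs are the predictive mean and variance produced by a GP model (e.g., the gp_predict sketch above).

```python
import numpy as np

def prediction_bounds(mean, var, n_std=1.0):
    """Upper and lower boundaries from GP outputs: mean +/- n_std * std.
    The paper adds and subtracts one standard deviation (n_std=1)."""
    std = np.sqrt(var)
    return mean + n_std * std, mean - n_std * std

def coverage(observed, lower, upper):
    """Fraction of observed points falling inside the boundaries,
    the quantity Figures 7 and 8 illustrate visually."""
    observed = np.asarray(observed)
    return np.mean((observed >= lower) & (observed <= upper))
```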

FIGURE 5 One-step-ahead GP results: (a) predicted volumes for Data Set 1; (b) predicted volumes for Data Set 2; (c) predicted standard deviations for Data Set 1; (d) predicted standard deviations for Data Set 2.

FIGURE 6 One-step-ahead GP results: (a) predicted volumes for Data Set 3; (b) predicted volumes for Data Set 4; (c) predicted standard deviations for Data Set 3; (d) predicted standard deviations for Data Set 4.

FIGURE 7 One-step-ahead prediction boundaries for Data Set 1 and Data Set 2 with the GP model.

FIGURE 8 One-step-ahead prediction boundaries for Data Set 3 and Data Set 4 with the GP model.

It confirms that the predicted traffic volume data closely follow the observed traffic trends. In addition, it suggests that the standard deviations are credible and can be used to generate useful intervals for the predicted traffic volume data.

DISCUSSION AND CONCLUSIONS

This paper introduces a GP model into traffic flow modeling and forecasting. The proposed GP model was tested on four data sets collected on three interstate highways in Seattle, Washington. Two other promising traffic flow forecasting models, v-SVM and ARIMA, were also tested on the same data sets, and their forecasting results were compared with those produced by the GP model. Two types of forecasting were conducted: one- and two-step-ahead. The results indicated that the GP and v-SVM models outperformed the ARIMA model in all cases; they further showed that the advantage of the GP and v-SVM models over the ARIMA models was more pronounced in the two-step-ahead forecasting than in the one-step-ahead forecasting.

The GP and v-SVM models are distribution-free learning algorithms and can be applied to many types of data that are not necessarily normally distributed, for both classification and regression purposes (20, 25). However, the formulations of the two algorithms are based on very different modeling frameworks: the v-SVM model formulates and solves a nonlinear optimization problem, whereas the GP model is based on a full Bayesian framework. Nevertheless, the overall forecasting performances of the GP and v-SVM models in this study were similar, which probably can be explained by the fact that both are kernel-based machine learning methods.

The full Bayesian framework enables the GP model to generate standard deviation estimates in addition to the predicted traffic flow volumes. This information could be useful for assessing the reliability of the traffic flow predictions and for making better use of the predicted data; such information cannot be readily obtained from the v-SVM model. The estimated standard deviations were further plotted against the observed traffic volume data in Figures 5 and 6, and the result suggests that the estimated standard deviations become larger when there are drastic changes in the observed traffic volume data. The predicted traffic volumes and standard deviations from the GP model were combined to generate upper and lower boundaries for each test data set, and most observed traffic flow data points fell within the predicted boundaries (see Figures 7 and 8).

The overall forecasting performance of the proposed GP model was satisfying. The model comparison results suggest that the GP model significantly outperforms the commonly used ARIMA model. Previously, the v-SVM model had been shown to outperform multilayer, feed-forward neural network models in prediction accuracy and generalization capability (16). The results of this study indicate that the proposed GP model performs slightly better than the v-SVM model in most cases and that it can generate useful standard deviation estimates, which the v-SVM model cannot. In summary, the GP model offers a promising way to model and forecast traffic flow, and it has emerged as a serious competitor to the v-SVM model.

FUTURE WORK

This study showed that the GP and v-SVM models consistently outperformed the ARIMA model on all four data sets. Additional tests on other data sets are necessary to further confirm the superiority of these kernel-based machine learning methods over conventional modeling tools such as ARIMA.
In particular, the GP model should be tested on data sets that exhibit clear seasonal patterns and contain erroneous values or missing data points, and then compared with other models such as the seasonal ARIMA model.

ACKNOWLEDGMENTS

The authors thank Daniel J. Dailey, University of Washington, Seattle, for permission to use the TDAD database and the James E. Clyburn University Transportation Center, South Carolina State University, Orangeburg, for its financial support.

REFERENCES

1. Williams, B. M., P. K. Durvasula, and D. E. Brown. Urban Freeway Traffic Flow Prediction: Application of Seasonal Autoregressive Integrated Moving Average and Exponential Smoothing Models. In Transportation Research Record 1644, TRB, National Research Council, Washington, D.C., 1998.
2. Ahmed, M. S., and A. R. Cook. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques. In Transportation Research Record 722, TRB, National Research Council, Washington, D.C., 1979.
3. Nihan, N. L., and K. O. Holmesland. Use of the Box and Jenkins Time Series Technique in Traffic Forecasting. Transportation, Vol. 9, No. 2, 1980.
4. Davis, G. A., and N. L. Nihan. Nonparametric Regression and Short-Term Freeway Traffic Forecasting. Journal of Transportation Engineering, Vol. 117, No. 2, 1991.
5. Smith, B. L., and M. J. Demetsky. Traffic Flow Forecasting: Comparison of Modeling Approaches. Journal of Transportation Engineering, Vol. 123, No. 4, 1997.
6. Okutani, I., and Y. J. Stephanedes. Dynamic Prediction of Traffic Volume Through Kalman Filtering Theory. Transportation Research Part B, Vol. 18, No. 1, 1984.
7. Stathopoulos, A., and M. G. Karlaftis. A Multivariate State Space Approach for Urban Traffic Flow Modeling and Prediction. Transportation Research Part C, Vol. 11, No. 2, 2003.
8. Xie, Y., Y. Zhang, and Z. Ye. Short-Term Traffic Volume Forecasting Using Kalman Filter with Discrete Wavelet Decomposition. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, 2007.
9. Smith, B. L., and M. J. Demetsky. Short-Term Traffic Flow Prediction: Neural Network Approach. In Transportation Research Record 1453, TRB, National Research Council, Washington, D.C., 1994.
10. Park, B., C. J. Messer, and T. Urbanik II. Short-Term Freeway Traffic Volume Forecasting Using Radial Basis Function Neural Network. In Transportation Research Record 1651, TRB, National Research Council, Washington, D.C., 1998.
11. Yin, H. B., S. C. Wong, J. M. Xu, and C. K. Wong. Urban Traffic Flow Prediction Using a Fuzzy-Neural Approach. Transportation Research Part C, Vol. 10, No. 2, 2002.
12. Xie, Y., and Y. Zhang. A Wavelet Network Model for Short-Term Traffic Volume Forecasting. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, Vol. 10, No. 3, 2006.
13. Van Lint, J. W. C., S. P. Hoogendoorn, and H. J. Van Zuylen. Accurate Freeway Travel Time Prediction with State-Space Neural Networks Under Missing Data. Transportation Research Part C, Vol. 13, Nos. 5-6, 2005.
14. Park, D., and L. R. Rilett. Forecasting Freeway Link Travel Times with a Multilayer Feedforward Neural Network. Computer-Aided Civil and Infrastructure Engineering, Vol. 14, No. 5, 1999.

15. Vlahogianni, E. I., M. G. Karlaftis, and J. C. Golias. Optimized and Meta-Optimized Neural Networks for Short-Term Traffic Flow Prediction: A Genetic Approach. Transportation Research Part C, Vol. 13, No. 3, 2005.
16. Zhang, Y., and Y. Xie. Forecasting of Short-Term Freeway Volume with v-Support Vector Machines. In Transportation Research Record: Journal of the Transportation Research Board, No. 2024, Transportation Research Board of the National Academies, Washington, D.C., 2007.
17. Wu, C. H., J. M. Ho, and D. T. Lee. Travel-Time Prediction with Support Vector Regression. IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 4, 2004.
18. Van Der Voort, M., M. Dougherty, and S. Watson. Combining Kohonen Maps with ARIMA Time Series Models to Forecast Traffic Flow. Transportation Research Part C, Vol. 4, No. 5, 1996.
19. Hornik, K., M. Stinchcombe, and H. White. Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, Vol. 2, No. 5, 1989.
20. Rasmussen, C. E., and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Mass., 2006.
21. Suykens, J. A. K., T. V. Gestel, J. D. Brabanter, B. D. Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific Publishing Co. Pte. Ltd., Singapore, 2002.
22. MacKay, D. J. C. Gaussian Processes: A Replacement for Supervised Neural Networks? Tutorial. Neural Information Processing Systems Foundation, La Jolla, Calif., 1997.
23. Seeger, M. Gaussian Processes for Machine Learning. International Journal of Neural Systems, Vol. 14, No. 2, 2004.
24. Chu, W., S. S. Keerthi, and C. J. Ong. Bayesian Trigonometric Support Vector Classifier. Neural Computation, Vol. 15, No. 9, 2003.
25. Zhao, K., S. C. Popescu, and X. Zhang. Bayesian Learning with Gaussian Processes for Supervised Classification of Hyperspectral Data. Photogrammetric Engineering & Remote Sensing, Vol. 74, No. 10, 2008.
26. Documentation for GPML MATLAB Code. http://www.gaussianprocess.org/gpml/code/matlab/doc/. Accessed Nov. 2009.
27. The R Project for Statistical Computing (Version 2.9.1). http://www.r-project.org/. Accessed July 3, 2009.

The Statistical Methods Committee peer-reviewed this paper.


More information

Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine

Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine Y. Dak Hazirbaba, J. Tezcan, Q. Cheng Southern Illinois University Carbondale, IL, USA SUMMARY: The 2009

More information

Expectation Propagation for Approximate Bayesian Inference

Expectation Propagation for Approximate Bayesian Inference Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES

INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, China E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Practical Bayesian Optimization of Machine Learning. Learning Algorithms Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Expectation propagation for signal detection in flat-fading channels

Expectation propagation for signal detection in flat-fading channels Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Prediction of double gene knockout measurements

Prediction of double gene knockout measurements Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

The Variational Gaussian Approximation Revisited

The Variational Gaussian Approximation Revisited The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much

More information

Probabilistic Graphical Models Lecture 20: Gaussian Processes

Probabilistic Graphical Models Lecture 20: Gaussian Processes Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

A Data-Driven Model for Software Reliability Prediction

A Data-Driven Model for Software Reliability Prediction A Data-Driven Model for Software Reliability Prediction Author: Jung-Hua Lo IEEE International Conference on Granular Computing (2012) Young Taek Kim KAIST SE Lab. 9/4/2013 Contents Introduction Background

More information

Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Gaussian Processes in Machine Learning November 17, 2011 CharmGil Hong Agenda Motivation GP : How does it make sense? Prior : Defining a GP More about Mean and Covariance Functions Posterior : Conditioning

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Modeling human function learning with Gaussian processes

Modeling human function learning with Gaussian processes Modeling human function learning with Gaussian processes Thomas L. Griffiths Christopher G. Lucas Joseph J. Williams Department of Psychology University of California, Berkeley Berkeley, CA 94720-1650

More information

Content. Learning. Regression vs Classification. Regression a.k.a. function approximation and Classification a.k.a. pattern recognition

Content. Learning. Regression vs Classification. Regression a.k.a. function approximation and Classification a.k.a. pattern recognition Content Andrew Kusiak Intelligent Systems Laboratory 239 Seamans Center The University of Iowa Iowa City, IA 52242-527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Introduction to learning

More information

Gaussian Process Regression Networks

Gaussian Process Regression Networks Gaussian Process Regression Networks Andrew Gordon Wilson agw38@camacuk mlgengcamacuk/andrew University of Cambridge Joint work with David A Knowles and Zoubin Ghahramani June 27, 2012 ICML, Edinburgh

More information

Gaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak

Gaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak Gaussian processes and bayesian optimization Stanisław Jastrzębski kudkudak.github.io kudkudak Plan Goal: talk about modern hyperparameter optimization algorithms Bayes reminder: equivalent linear regression

More information