Gaussian Processes for Short-Term Traffic Volume Forecasting
Yuanchang Xie, Kaiguang Zhao, Ying Sun, and Dawei Chen

The accurate modeling and forecasting of traffic flow data such as volume and travel time are critical to intelligent transportation systems. Many forecasting models have been developed for this purpose since the 1970s. Recently, kernel-based machine learning methods such as support vector machines (SVMs) have gained special attention in traffic flow modeling and other time series analyses because of their outstanding generalization capability and superior nonlinear approximation. In this study, a novel kernel-based machine learning method, the Gaussian process (GP) model, was proposed to perform short-term traffic flow forecasting. The GP model was evaluated and compared with SVM and autoregressive integrated moving average (ARIMA) models on four sets of traffic volume data collected from three interstate highways in Seattle, Washington. The comparative results showed that the GP and SVM models consistently outperformed the ARIMA model. This study also showed that because the GP model is formulated in a full Bayesian framework, it allows an explicit probabilistic interpretation of forecasting outputs. This capacity gives the GP model an advantage over SVMs in modeling and forecasting traffic flow.

The accurate modeling and forecasting of traffic flow data, such as volume, speed, and travel time, are critical to intelligent transportation systems (ITS), especially advanced traveler information systems (ATIS) and advanced traffic management systems (ATMS). Given reliable real-time traffic flow predictions, travelers can choose the best routes dynamically. Such information can also be used by traffic management personnel to develop proactive traffic control strategies that make better use of the available road network resources.
The success of many ATIS and ATMS applications depends largely on the accuracy of the selected traffic flow modeling and forecasting algorithms. Numerous methods have been developed and compared since the 1970s to improve the accuracy of traffic flow forecasting. These methods can generally be categorized into the following groups: autoregressive integrated moving average (ARIMA) models (1-3), nonparametric regression (4, 5), Kalman filtering theory (6-8), neural networks (9-15), support vector machines (SVMs) (16, 17), and hybrid models (18). Of the existing traffic flow forecasting methods, neural networks are the most widely used. One major reason is that neural networks have a strong function approximation capability and can better model the complicated relationship between historical and future traffic flow data than other methods (19). In addition, the application of neural networks does not require an explicit model formulation to be specified, as is usually required by other approaches. Despite the many attractive features of neural networks, their application is not an easy task. Model training and selection involve tricky decisions with regard to network architecture, type of transfer (activation) function, learning rate, and number of hidden neurons (20). Caution must be taken during the training of neural networks to prevent overfitting the training data and to avoid local minima. To address these problems, SVMs have been introduced (16, 17).

Y. Xie, Civil and Mechanical Engineering Technology, South Carolina State University, Orangeburg, SC. K. Zhao, Spatial Science Lab, and Y. Sun, Department of Statistics, Texas A&M University, College Station, TX. D. Chen, School of Transportation, Southeast University, Nanjing, Jiangsu, China. Corresponding author: Y. Xie, yxie@scsu.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2165, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 69-78. DOI: 10.3141/2165-08
Similar to neural networks, SVMs have superior function approximation capability and do not require the specification of model formulations. In addition, they are developed on the structural risk minimization (SRM) principle (21), as opposed to the empirical risk minimization (ERM) principle used in conventional neural networks. Theoretically, then, SVMs can better address the overfitting problem and have better generalization capabilities than conventional neural networks. Another important feature of SVMs is their capacity to guarantee a globally optimal solution for a given training data set (16, 17). A v-support vector machine (v-SVM) model was previously compared with multilayer, feed-forward neural networks by using traffic volume data collected from interstates (I-5, I-90, and I-405) in the Seattle area, and the comparison favored the v-SVM model (16).

Gaussian processes (GPs) are another important class of kernel-based learning algorithms that have attracted attention in the machine learning community (22, 23). Similar to other popular kernel machines, such as SVMs, GP models are powerful tools to explore implicit relationships between a set of variables based on a training data set, which makes GPs especially useful for difficult nonlinear regression and classification problems (20). A particularly attractive feature of GPs is their formulation in a full Bayesian framework, which allows an explicit probabilistic interpretation of model outputs (22). Moreover, GP model parameters (e.g., kernel parameters) can be computed naturally by means of Bayesian learning, as opposed to the grid-searching, trial-and-error method commonly used to optimize classical SVMs (21). Hence, some researchers refer to GPs as Bayesian SVMs (24).
The superior performance of GPs for difficult supervised learning problems has been demonstrated in many domain-specific applications in comparisons against conventional methods such as neural networks and against other advanced learning algorithms such as SVMs (20, 25). In the study reported here, a GP regression model was adopted to model and predict traffic volume data; the model can be used to forecast travel speed and travel time as well. Like SVMs, GP models are kernel-based machine learning methods and possess many of the same desirable features. More important, they produce more informative outputs than SVMs and neural networks, which makes GP prediction results easier to interpret.
GAUSSIAN PROCESSES

General Formulation of GPs

GPs provide a Bayesian paradigm to learn an implicit functional relationship ŷ = f(x) from a given training data set D = {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d represents the vector of observed input variables (i.e., predictors) in a d-dimensional feature space, and y_i is the one-dimensional observed target value (i.e., response variable), which is either continuous or discrete. Unlike most classical Bayesian models, GPs directly elicit a prior distribution on the whole function f(x). Specifically, f(x) is treated as a random field and is assumed to be a GP a priori:

p(f(x) | θ) = GP(m(x), k(x, x′))    (1)

where the prior GP is fully specified by a mean function m(x) and a covariance function k(x, x′), and θ denotes the prior's hyperparameters used to parameterize the covariance function; that is, k(x, x′) = k(x, x′; θ). Strictly speaking, a GP model can be treated as a probability distribution defined over functions such that E[f(x)] = m(x) and Cov[f(x), f(x′)] = k(x, x′), where f(x) and f(x′) are random variables indexed by any pair of x and x′. In this sense, a GP prior can be roughly deemed a probability distribution over an infinite number of random variables. Furthermore, a collection of function values indexed by any finite set of inputs X = [x_1, x_2, ..., x_n]^T, i.e., f(X) = [f(x_1), f(x_2), ..., f(x_n)]^T, assumes a multivariate normal distribution

p(f(X)) = N(m(X), K(X, X))    (2)

where the mean vector m(X) and covariance matrix K(X, X) are determined directly from m(·) and k(·, ·); namely, m(X) = [m(x_1), m(x_2), ..., m(x_n)]^T and K_ij = k(x_i, x_j), i, j = 1, ..., n. For ease of presentation but without loss of generality, m(x) = 0 is assumed, because in practice the data can always be centered with respect to the sample mean. In machine learning terms, k(x, x′) is often called a kernel function, or simply a kernel, rather than a covariance function.
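The finite-dimensional view of Equation 2 can be made concrete with a short sketch. This is an illustration rather than anything from the study itself; it assumes NumPy is available and uses an RBF covariance of the form discussed later:

```python
import numpy as np

def rbf_kernel(x1, x2, sigma0=1.0, length=1.0):
    """Covariance k(x, x') = sigma0^2 * exp(-(x - x')^2 / (2 * length^2))."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return sigma0 ** 2 * np.exp(-d2 / (2.0 * length ** 2))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 50)              # any finite set of input locations
K = rbf_kernel(X, X) + 1e-9 * np.eye(50)   # K(X, X) plus jitter for stability
# With m(x) = 0, f(X) ~ N(0, K(X, X)); draw three sample functions at once:
samples = rng.multivariate_normal(np.zeros(50), K, size=3)
print(samples.shape)  # (3, 50)
```

Each row of `samples` is one realization of the random function f evaluated at the 50 chosen inputs; smoother kernels (larger `length`) produce smoother sample paths.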
As detailed later, kernel functions usually take certain forms that are parameterized by one or more parameters. Accordingly, specifying a GP prior p(f(x) | θ) = GP(m(x), k(x, x′)) amounts to determining a specific type of kernel (covariance function) and the associated θ values. Once a GP prior p(f | θ) and a noise model p(y | f) are specified, the posterior distribution of f given the training data D, p(f | D, θ), can be readily derived by updating the prior according to Bayes' theorem:

p(f | D, θ) = p(y | f) p(f | X, θ) / p(D | θ)    (3)

where the input variables X (i.e., the indices for f) have been made explicit in the prior. The term p(D | θ) is called the marginal likelihood, as it is a function of θ given D. The noise model p(y | f) is also known as the likelihood, a function of f for a fixed set of observations y; it is introduced because in practice y_i is a corrupted version of f(x_i) as the result of noise or measurement errors. With the posterior p(f | D, θ), the predictive distribution at a new input x_* is obtained by using

p(f_* | x_*, D, θ) = ∫ p(f_* | x_*, f, D, θ) p(f | D, θ) df    (4)

By combining Equation 4 and the noise model, the predictive distribution for y_* can also be obtained:

p(y_* | x_*, D, θ) = ∫ p(y_* | f_*) p(f_* | x_*, D, θ) df_*    (5)

from which not only the predicted mean but also the associated uncertainty (error bar) can be computed. Note that in GP modeling it is the collection of function values f(X), not x itself, that needs to be Gaussian. In fact, the input variables x are assumed to be distribution-free; in other words, the GP model theoretically can handle data with any kind of distribution. Interested readers can refer to Rasmussen and Williams (20), MacKay (22), and Seeger (23) for more information.

GP Regression Model

The aforementioned GP models solve nonlinear regression problems when the response variables y_i are continuous and a normal distribution is assumed for the noise model p(y | f). Specifically, y_i is subject to independent and identically distributed (i.i.d.)
normal errors with a mean of zero and a variance of σ²:

y_i = f(x_i) + ε_i,  ε_i ~ N(0, σ²)    (6)

In such a case, the inference of GP models becomes analytically tractable as a result of the Gaussianity of p(y | f); accordingly, the posterior and predictive distributions given in Equations 3 through 5 all reduce to normal distributions. For a new input x_*, the predictive mean and variance associated with ŷ_* = f(x_*) = f_* are given by Equations 7 and 8, respectively (20):

f̄_* = k(x_*, X) [K(X, X) + σ²I]^{-1} y    (7)

Var(f_*) = k(x_*, x_*) − k(x_*, X) [K(X, X) + σ²I]^{-1} k(X, x_*)    (8)

where

X and y = observed predictors and response variables in D = {(x_i, y_i)}_{i=1}^{n},
I = n × n identity matrix, and
k(x_*, X) = 1 × n vector whose ith element is k(x_*, x_i), denoting the covariance of f_* with f(X), with k(X, x_*) = k(x_*, X)^T.

Kernels and Learning Hyperparameters

Equations 7 and 8 show that fitting and applying a GP regression model amounts to the choice of a kernel and the specification of its parameters (i.e., hyperparameters). In machine learning, the most commonly used kernels include the polynomial kernel, the radial basis function (RBF) kernel, and the automatic relevance determination (ARD) kernel, as given by Rasmussen and Williams (20):

k_poly(x, x′; σ_0, Σ_p, p) = (σ_0² + x^T Σ_p x′)^p    (9)
k_RBF(x, x′; σ_0, l) = σ_0² exp(−‖x − x′‖² / (2l²))    (10)

k_ARD(x, x′; σ_0, l_1, ..., l_d) = σ_0² exp(−Σ_{i=1}^{d} (x_i − x_i′)² / (2l_i²))    (11)

where σ_0, p, l, l_i, and Σ_p are hyperparameters of the corresponding kernels, symbolized collectively as θ in Equations 9 through 11. The θ is called a hyperparameter because, in the Bayesian framework of GPs, the unknown function f itself is a parameter as a result of the prior p(f) placed on f. A common hyperparameter of the above kernels is the variance σ_0², which plays the same role as the tradeoff parameter of SVMs. However, a GP kernel and its hyperparameters are more interpretable than those of SVMs because the GP kernel represents the degree of correlation between function values at two inputs. For example, the hyperparameter l in Equation 10, or l_i in Equation 11, refers to a characteristic length that represents a distance in the input space beyond which function values become less relevant. The magnitude of l_i in the ARD kernel indicates the inferential relevance of the ith input variable: very large values of l_i downplay or eliminate the influence of irrelevant input dimensions. As such, the ARD kernel provides a parameterization scheme for automatic feature reduction, which has proved effective in handling high-dimensional problems (25). Most studies have confirmed the superior performance of the RBF kernel (16, 17, 20, 25). Therefore, only the RBF kernel was examined, and no comparison between kernels was made in the study reported here.

In practice, rather than guess at an initial value for the hyperparameter θ, it is advantageous to learn an informative value θ̂ from the training data D. In the Bayesian formulation of GP models, the posterior of θ is given by

p(θ | D) = p(D | θ) p(θ) / p(D)    (12)

The optimal value θ̂ can then be obtained naturally as the maximum a posteriori (MAP) estimate of p(θ | D) (20). Because of the lack of prior knowledge, p(θ) is assumed to be flat (i.e., a noninformative prior).
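As an illustration of how the predictive equations and the hyperparameter learning step fit together, the following is a minimal sketch, not the authors' implementation (which used the GPML MATLAB package with a conjugate-gradient optimizer). NumPy is an assumed dependency, all hyperparameter values are illustrative, and a crude grid search over the length-scale stands in for gradient-based Type II maximum likelihood:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma0=1.0, length=1.0):
    """RBF covariance of Equation 10 for one-dimensional inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return sigma0 ** 2 * np.exp(-d2 / (2.0 * length ** 2))

def gp_predict(X, y, X_star, sigma_n=0.1, sigma0=1.0, length=1.0):
    """Predictive mean (Equation 7) and variance (Equation 8)."""
    Ky = rbf_kernel(X, X, sigma0, length) + sigma_n ** 2 * np.eye(len(X))
    alpha = np.linalg.solve(Ky, y)                # [K(X,X) + sigma^2 I]^{-1} y
    K_s = rbf_kernel(X_star, X, sigma0, length)   # k(x_*, X)
    mean = K_s @ alpha
    v = np.linalg.solve(Ky, K_s.T)                # [K(X,X) + sigma^2 I]^{-1} k(X, x_*)
    var = rbf_kernel(X_star, X_star, sigma0, length).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, var

def log_marginal_likelihood(X, y, sigma0=1.0, length=1.0, sigma_n=0.1):
    """Log marginal likelihood with the noise variance absorbed into the
    covariance: -1/2 y^T Ky^{-1} y - 1/2 log|Ky| - n/2 log(2 pi)."""
    n = len(X)
    Ky = rbf_kernel(X, X, sigma0, length) + sigma_n ** 2 * np.eye(n)
    L = np.linalg.cholesky(Ky)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi))

# Toy usage: noisy sine data, grid search over the length-scale,
# then prediction with the selected hyperparameter.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 2.0 * np.pi, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)
best_l = max([0.1, 0.5, 1.0, 2.0, 5.0],
             key=lambda l: log_marginal_likelihood(X, y, length=l))
mean, var = gp_predict(X, y, np.array([np.pi / 2, np.pi]), length=best_l)
```

The Cholesky factorization mirrors standard practice for the marginal likelihood; the predictive variance comes out alongside the mean at no extra conceptual cost, which is the feature the paper exploits later to build prediction boundaries.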
In such a case, the MAP estimate θ̂ is pinpointed by maximizing the following marginal likelihood p(D | θ):

p(D | θ) = ∫ p(y | f) p(f | X, θ) df    (13)

This procedure is also known as Type II maximum likelihood. For the GP regression models of Equations 6 through 8, the log marginal likelihood can be expressed as in Rasmussen and Williams (20):

log p(D | θ) = −(1/2) y^T K_y^{-1} y − (1/2) log |K_y| − (n/2) log 2π    (14)

where K_y = K(X, X) + σ²I and |·| denotes the determinant of a matrix. The gradient of log p(D | θ) with respect to θ is

∂ log p(D | θ)/∂θ_i = (1/2) y^T K_y^{-1} (∂K_y/∂θ_i) K_y^{-1} y − (1/2) tr(K_y^{-1} ∂K_y/∂θ_i)    (15)

The maximization of log p(D | θ) with respect to θ can be implemented with any general gradient-based optimization technique; in this study a conjugate gradient optimization method was employed, similar to the one used by Rasmussen and Williams (20). The optimization may become trapped in local maxima if there are a large number of hyperparameters (e.g., when the ARD kernel is used for feature selection over high-dimensional inputs). As a remedy, it is common practice to perform the optimization multiple times with random initial values and to select the solution that yields the highest marginal likelihood.

MODEL TESTING AND RESULT ANALYSIS

Data Description

To facilitate model comparison, the same data set used in Zhang and Xie (16) was used again here. The traffic volume data were obtained from the traffic data acquisition and distribution (TDAD) database maintained by an ITS research group at the University of Washington, Seattle. Specifically, traffic volume data from four detectors located on three interstate highways in the Seattle area were used. The approximate locations of the four detectors are shown in Figure 1. Detailed information about the four detectors follows.

Data Set  Direction   Detector Name  Data Collection Period
1         Southbound  ES-088D        June 6, 2005, to July 3, 2005
2         Eastbound   ES-855D        June 6, 2005, to July 3, 2005
3         Northbound  ES-645D        June 6, 2005, to July 3, 2005
4         Northbound  ES-708D        June 6, 2005, to July 3, 2005

A total of four sets of traffic volume data was obtained from these detectors. Each data set contained 28 days of data.
The raw traffic volume data were aggregated by using 15-min intervals, so a single day generated 96 data points. The first 14 days of data from each data set are plotted in Figure 2 to show the general trends.

FIGURE 1 Approximate locations of detectors ES-088D, ES-855D, ES-645D, and ES-708D.
FIGURE 2 First 14 days of data from each detector: (a) Data Set 1 (detector ES-088D, southbound), (b) Data Set 2 (detector ES-855D, eastbound), (c) Data Set 3 (detector ES-645D, northbound), and (d) Data Set 4 (detector ES-708D, northbound).
FIGURE 3 Predictions: one-step-ahead and two-step-ahead (each window of L consecutive observed volumes serves as the model input x; the target is the next observation for one-step-ahead prediction and the observation after that for two-step-ahead prediction).

Data Sets 1 through 3 showed similar patterns but different traffic volume levels. Their weekday traffic clearly had two peak periods. In Data Set 4, the effect of the morning rush hour was not as obvious.

Model Fitting

The same data sets discussed above were used by Zhang and Xie to evaluate v-SVMs and to compare them with a multilayer, feed-forward neural network model (16). Because their results showed that the v-SVM model consistently outperformed the neural network model, only the v-SVM model was compared with the proposed GP model here. ARIMA models were also fitted and compared with the GP and v-SVM models; thus, three types of models were compared in the study reported here. For all three models, the first 3 weeks of data were used for fitting, and the last week of data was used for prediction tests. The three types of models were compared primarily on the basis of their prediction performance. Both one- and two-step-ahead prediction results were compared. Figure 3 shows the difference between one- and two-step-ahead predictions, where n is the total number of observed traffic volume data points and L is the model input length. In Figure 3, v_i represents the aggregate traffic count for a 15-min period. Using the first input as an example, both predictions take the same vector x = [v_1, v_2, ..., v_L] as the input; however, the values to be predicted (outputs) for the one- and two-step-ahead predictions are v_{L+1} and v_{L+2}, respectively.

v-SVM and GP Models

As discussed earlier, fitting the v-SVM and GP models is conceptually straightforward and does not require users to specify an explicit model formulation.
Take the one-step-ahead prediction as an example: for each data point to be predicted or modeled as output, the 14 data points immediately preceding it are used as model input. Thus, a training series of n data points generates n − 14 training input-output pairs. The input dimension of 14 was determined on the basis of an autocorrelation function (ACF) method (16). For each of the four data sets, the ACF values were evaluated at different time lags, and the first time lag at which the ACF value reached zero was selected as the input dimension. Based on the ACF values, an input dimension of 14 was selected for all four data sets. According to the notation set out earlier in this paper, the one-step-ahead models to be fitted can be symbolized as

ŷ_i = vôl(i) = f(x_i) = f([vol(i−1), vol(i−2), ..., vol(i−14)]^T)    (16)

where vol(i) refers to the traffic volume at time step i, and f(x_i) is the implicit model form to be learned by either the v-SVM or the GP algorithm.

Compared with the GP model, fitting the v-SVM model requires more effort. A validation data set usually is needed for model selection, that is, to help find appropriate parameters for the v-SVM model. For the v-SVM model, therefore, the 3-week fitting data were separated further into a training data set (the first 2 weeks of data) and a validation data set (the third week of data). Given the training and validation data sets, a genetic algorithm tool was used to find the optimal parameters for the v-SVM model. Details on the parameters to be determined and the genetic algorithm tool can be found in Zhang and Xie (16) and are not replicated here.

The GP model was implemented by using a widely accepted package called Gaussian Processes for Machine Learning (GPML) (26), which is based on the MATLAB programming language platform. Customized MATLAB code was also developed to process the raw and output data and to call functions in the GPML package.
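The sliding-window construction described above can be sketched as follows. This is an illustrative helper in Python rather than the authors' MATLAB code; the lag of 14 mirrors the ACF-selected input dimension:

```python
def make_lagged_pairs(series, lag=14, horizon=1):
    """Build (input, target) pairs from a volume series: the input is the
    `lag` values immediately preceding the target, and the target lies
    `horizon` steps ahead (horizon=1 for one-step-ahead, horizon=2 for
    two-step-ahead), mirroring Figure 3."""
    inputs, targets = [], []
    for i in range(lag, len(series) - horizon + 1):
        inputs.append(series[i - lag:i])
        targets.append(series[i + horizon - 1])
    return inputs, targets

# A series of 30 points yields 30 - 14 = 16 one-step pairs and 15 two-step pairs.
X1, y1 = make_lagged_pairs(list(range(30)), lag=14, horizon=1)
X2, y2 = make_lagged_pairs(list(range(30)), lag=14, horizon=2)
print(len(X1), len(X2))  # 16 15
```

Note that the one- and two-step-ahead models share the same inputs; only the target column shifts by one time step.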
The training and testing process of the GP and v-SVM models is illustrated in Figure 4: the first 3 weeks of data (both x and y) are used to train the models, which are then given the fourth week's input data (x only) to produce the fourth week's predicted values ŷ.

FIGURE 4 Training and evaluation of GP and v-SVM models.

ARIMA Model

ARIMA models were also fitted for the four data sets because of their popularity in traffic flow forecasting research (1-3, 18). The auto.arima forecasting function of the R Project for Statistical Computing was used to select the best-fit ARIMA model for each test data set (27). The best models selected by auto.arima and their corresponding Akaike information criterion (AIC), Bayesian information criterion (BIC), and log likelihood values are listed in Table 1. Both AIC and BIC are commonly accepted criteria for model selection. In general, models
with lower AIC and BIC values should be selected. Although the collected data sets showed seasonal patterns, seasonal ARIMA models were not selected by the auto.arima program.

TABLE 1 Best-Fit ARIMA Models Selected by auto.arima for the Four Data Sets, with Their AIC, BIC, and Log Likelihood Values

Measurements of Effectiveness

Mean absolute percentage error (MAPE) and root mean square error (RMSE) are two commonly used criteria to evaluate and compare prediction methods. As adopted in this study, MAPE and RMSE are defined as

MAPE = (1/N) Σ_{k=1}^{N} |vôl(k) − vol(k)| / vol(k) × 100%    (17)

RMSE = sqrt( (1/N) Σ_{k=1}^{N} (vôl(k) − vol(k))² )    (18)

where vol(k) is the observed traffic volume at time step k and vôl(k) is the corresponding predicted traffic volume. Each time step in this study was equivalent to 15 min, and N is the size of the testing data set (total number of time steps).

Results Analysis and Comparison

The one- and two-step-ahead forecasting results are listed in Tables 2 and 3, respectively.

TABLE 2 Comparison of One-Step-Ahead Forecasting Results (MAPE and RMSE of the GP, v-SVM, and ARIMA Models for Each Data Set)

TABLE 3 Comparison of Two-Step-Ahead Forecasting Results (MAPE and RMSE of the GP, v-SVM, and ARIMA Models for Each Data Set)

It can be easily seen that, for all data sets, the GP and v-SVM models performed consistently better than the ARIMA models for both one- and two-step-ahead forecasting. In some cases, the improvements in performance were quite significant: the two-step-ahead forecasting MAPEs of the GP and v-SVM models for Data Set 2 were markedly lower than that of the corresponding ARIMA model. Tables 2 and 3 also show that the superior performance of the GP and v-SVM models was more pronounced in the two-step-ahead than in the one-step-ahead forecasting.
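The error measures of Equations 17 and 18 translate directly into code. A brief Python sketch (function names are ours, not from the paper; observed volumes must be nonzero for MAPE):

```python
import math

def mape(observed, predicted):
    """Mean absolute percentage error, Equation 17."""
    n = len(observed)
    return 100.0 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / n

def rmse(observed, predicted):
    """Root mean square error, Equation 18."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

print(mape([100, 200], [110, 180]))  # 10.0
print(rmse([100, 200], [110, 180]))  # ~15.81
```

MAPE weights each error by the observed volume, so it penalizes misses during low-volume periods more heavily than RMSE does; reporting both, as the paper does, guards against either measure's bias.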
The GP and v-SVM models performed about equally in terms of MAPE and RMSE in all cases; it was difficult to tell which method was absolutely better on the basis of the MAPE and RMSE values alone. However, the GP model possesses a desirable feature that distinguishes it from the v-SVM model: not only does it produce point traffic flow estimates, it also generates standard deviations (i.e., error bars) for the predicted traffic flow values. Using the one-step-ahead forecasting results as an example, Figures 5 and 6 show the predicted traffic volumes and their standard deviations for Tuesday's data in Week 4 with the GP model. As Figures 5 and 6 show, the predicted standard deviations seemed not to be directly related to the magnitude of the observed traffic volumes: when the observed traffic volume is at its peak, the corresponding predicted standard deviation is not necessarily at its largest (see Figure 6). Usually, the predicted standard deviations become larger when there are drastic changes in the observed traffic volume data. This additional standard deviation information from the GP model could help traffic control and management personnel better assess the quality and reliability of the predicted data and use the predicted values wisely.

From the predicted and observed traffic volumes plotted in Figures 5 and 6, it can be seen that the predicted traffic volumes (solid lines) closely followed the observed traffic flow data (red dots) for all test data sets. The predicted traffic volumes and standard deviations were combined to create upper and lower boundaries for the four test data sets; specifically, the boundaries were created by adding the standard deviations to, or subtracting them from, the predicted traffic volumes. The resultant boundaries are shown in Figures 7 and 8. These two figures clearly show that in most cases the observed traffic data points fell within the upper and lower boundaries generated from the GP model outputs.
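The boundary construction just described (predicted mean plus or minus one standard deviation) amounts to a simple coverage computation. An illustrative Python sketch, not from the paper:

```python
def interval_coverage(observed, means, sds, width=1.0):
    """Fraction of observations falling inside [mean - width*sd, mean + width*sd],
    the kind of boundary check illustrated by Figures 7 and 8."""
    inside = sum(1 for o, m, s in zip(observed, means, sds)
                 if m - width * s <= o <= m + width * s)
    return inside / len(observed)

# Toy numbers: two of the three observations fall inside the +/- 1 SD bands.
print(interval_coverage([10, 12, 20], [11, 11, 11], [2, 2, 2]))
```

Widening the bands (e.g., `width=2.0`) trades tighter intervals for higher empirical coverage, which is the practical knob a traffic manager would tune when using the GP error bars.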
This result was encouraging. It confirms that the predicted traffic volume data closely follow the
FIGURE 5 One-step-ahead predicted volumes for (a) Data Set 1 and (b) Data Set 2 with the GP model; predicted standard deviations for (c) Data Set 1 and (d) Data Set 2.

FIGURE 6 One-step-ahead predicted volumes for (a) Data Set 3 and (b) Data Set 4 with the GP model; predicted standard deviations for (c) Data Set 3 and (d) Data Set 4.
FIGURE 7 One-step-ahead prediction boundaries for Data Set 1 and Data Set 2 with the GP model.

FIGURE 8 One-step-ahead prediction boundaries for Data Set 3 and Data Set 4 with the GP model.
observed traffic trends. In addition, it suggests that the standard deviations are credible and can be used to generate useful intervals for the predicted traffic volume data.

DISCUSSION AND CONCLUSIONS

This paper introduces a GP model into traffic flow modeling and forecasting. The proposed GP model was tested on four data sets collected on three interstate highways in Seattle, Washington. Two other promising traffic flow forecasting models, v-SVM and ARIMA, were also tested with the same data sets, and their forecasting results were compared with those produced by the GP model. Two types of forecasting were conducted: one- and two-step-ahead. The results indicated that the GP and v-SVM models outperformed the ARIMA model in all cases and that their advantage over the ARIMA models was more significant in two-step-ahead than in one-step-ahead forecasting.

The GP and v-SVM models are distribution-free learning algorithms and can be applied to many types of data that are not necessarily normally distributed, for both classification and regression purposes (20, 25). However, the formulations of the two algorithms are based on very different modeling frameworks: the v-SVM model formulates and solves a nonlinear optimization problem, whereas the GP model is based on a full Bayesian framework. Nevertheless, the overall forecasting performances of the GP and v-SVM models in this study were similar, which probably can be explained by the fact that both are kernel-based machine learning methods.

The full Bayesian framework enables the GP model to generate standard deviation estimates in addition to the predicted traffic flow volumes. This information could be useful to assess the reliability of the traffic flow predictions and to make better use of the predicted data. Such information cannot be readily obtained from the v-SVM model.
The estimated standard deviations were further plotted against the observed traffic volume data in Figures 5 and 6. The results suggest that the estimated standard deviations become larger when there are drastic changes in the observed traffic volume data. The predicted traffic volume and standard deviation data from the GP model were combined to generate upper and lower boundaries for each test data set; most observed traffic flow data points fell within these boundaries (see Figures 7 and 8).

The overall forecasting performance of the proposed GP model was satisfying. The model comparison results suggest that the GP model significantly outperforms the commonly used ARIMA model. Previously, the v-SVM model was shown to outperform multilayer, feed-forward neural network models in prediction accuracy and generalization capability (16). The results in this study indicate that the proposed GP model performs slightly better than the v-SVM model in most cases and that it can generate useful standard deviation estimates, which the v-SVM model cannot. In summary, the GP model offers a promising way to model and forecast traffic flow, and it has emerged as a serious competitor to the v-SVM model.

FUTURE WORK

This study showed that the GP and v-SVM models consistently outperformed the ARIMA model on all four data sets. Additional tests on other data sets are necessary to further confirm the superiority of these kernel-based machine learning methods over conventional modeling tools such as ARIMA. In particular, the GP model should be tested on data sets that exhibit clear seasonal patterns with erroneous values or missing data points, and then compared with other models such as the seasonal ARIMA model.

ACKNOWLEDGMENTS

The authors thank Daniel J. Dailey, University of Washington, Seattle, for permission to use the TDAD database and the James E.
Clyburn University Transportation Center, South Carolina State University, Orangeburg, for its financial support.

REFERENCES

1. Williams, B. M., P. K. Durvasula, and D. E. Brown. Urban Freeway Traffic Flow Prediction: Application of Seasonal Autoregressive Integrated Moving Average and Exponential Smoothing Models. In Transportation Research Record 1644, TRB, National Research Council, Washington, D.C., 1998.
2. Ahmed, M. S., and A. R. Cook. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques. In Transportation Research Record 722, TRB, National Research Council, Washington, D.C., 1979.
3. Nihan, N. L., and K. O. Holmesland. Use of the Box and Jenkins Time Series Technique in Traffic Forecasting. Transportation, Vol. 9, 1980.
4. Davis, G. A., and N. L. Nihan. Nonparametric Regression and Short-Term Freeway Traffic Forecasting. Journal of Transportation Engineering, Vol. 117, No. 2, 1991.
5. Smith, B. L., and M. J. Demetsky. Traffic Flow Forecasting: Comparison of Modeling Approaches. Journal of Transportation Engineering, Vol. 123, No. 4, 1997.
6. Okutani, I., and Y. J. Stephanedes. Dynamic Prediction of Traffic Volume Through Kalman Filtering Theory. Transportation Research Part B, Vol. 18, No. 1, 1984.
7. Stathopoulos, A., and M. G. Karlaftis. A Multivariate State Space Approach for Urban Traffic Flow Modeling and Prediction. Transportation Research Part C, Vol. 11, No. 2, 2003.
8. Xie, Y., Y. Zhang, and Z. Ye. Short-Term Traffic Volume Forecasting Using Kalman Filter with Discrete Wavelet Decomposition. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, 2007.
9. Smith, B. L., and M. J. Demetsky. Short-Term Traffic Flow Prediction: Neural Network Approach. In Transportation Research Record 1453, TRB, National Research Council, Washington, D.C., 1994.
10. Park, B., C. J. Messer, and T. Urbanik II. Short-Term Freeway Traffic Volume Forecasting Using Radial Basis Function Neural Network. In Transportation Research Record 1651, TRB, National Research Council, Washington, D.C., 1998.
11. Yin, H. B., S. C. Wong, J. M. Xu, and C. K. Wong. Urban Traffic Flow Prediction Using a Fuzzy-Neural Approach. Transportation Research Part C, Vol. 10, No. 2, 2002.
12. Xie, Y., and Y. Zhang. A Wavelet Network Model for Short-Term Traffic Volume Forecasting. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, Vol. 10, No. 3, 2006.
13. Van Lint, J. W. C., S. P. Hoogendoorn, and H. J. Van Zuylen. Accurate Freeway Travel Time Prediction with State-Space Neural Networks Under Missing Data. Transportation Research Part C, Vol. 13, Nos. 5-6, 2005.
14. Park, D., and L. R. Rilett. Forecasting Freeway Link Travel Times with a Multilayer Feedforward Neural Network. Computer-Aided Civil and Infrastructure Engineering, Vol. 14, No. 5, 1999.
15. Vlahogianni, E. I., M. G. Karlaftis, and J. C. Golias. Optimized and Meta-Optimized Neural Networks for Short-Term Traffic Flow Prediction: A Genetic Approach. Transportation Research Part C, Vol. 13, No. 3, 2005.
16. Zhang, Y., and Y. Xie. Forecasting of Short-Term Freeway Volume with v-Support Vector Machines. In Transportation Research Record: Journal of the Transportation Research Board, No. 2024, Transportation Research Board of the National Academies, Washington, D.C., 2007.
17. Wu, C. H., J. M. Ho, and D. T. Lee. Travel-Time Prediction with Support Vector Regression. IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 4, 2004.
18. Van Der Voort, M., M. Dougherty, and S. Watson. Combining Kohonen Maps with ARIMA Time Series Models to Forecast Traffic Flow. Transportation Research Part C, Vol. 4, No. 5, 1996.
19. Hornik, K., M. Stinchcombe, and H. White. Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, Vol. 2, No. 5, 1989.
20. Rasmussen, C. E., and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Mass., 2006.
21. Suykens, J. A. K., T. V. Gestel, J. D. Brabanter, B. D. Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific Publishing Co. Pte. Ltd., Singapore, 2002.
22. MacKay, D. J. C. Gaussian Processes: A Replacement for Supervised Neural Networks? Tutorial, Neural Information Processing Systems Foundation, La Jolla, Calif., 1997.
23. Seeger, M. Gaussian Processes for Machine Learning. International Journal of Neural Systems, Vol. 14, No. 2, 2004.
24. Chu, W., S. S. Keerthi, and C. J. Ong. Bayesian Trigonometric Support Vector Classifier. Neural Computation, Vol. 15, No. 9, 2003.
25. Zhao, K., S. C. Popescu, and X. Zhang. Bayesian Learning with Gaussian Processes for Supervised Classification of Hyperspectral Data. Photogrammetric Engineering & Remote Sensing, Vol. 74, No. 10, 2008.
26. Documentation for GPML MATLAB Code. http://www.gaussianprocess.org/gpml/code/matlab/doc/. Accessed Nov. 2009.
27. The R Project for Statistical Computing. http://www.r-project.org/. Accessed July 2009.

The Statistical Methods Committee peer-reviewed this paper.