Transportation Big Data Analytics

1 Transportation Big Data Analytics: Regularization. Xiqun (Michael) Chen, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, China. Fall 2016. Xiqun (Michael) Chen (Zhejiang University) Transportation Big Data Analytics 1 / 86

2 Outline
1 Subset Selection: Best Subset Selection; Stepwise Selection; Choosing the Optimal Model
2 Shrinkage Methods: Ridge Regression; Lasso; Selecting the Tuning Parameter
3 Dimension Reduction Methods
4 Applications


4 Subset Selection / Best Subset Selection
Fit a separate least squares regression for each possible combination of the p predictors: all p models that contain exactly one predictor, all p(p-1)/2 models that contain exactly two predictors, and so forth. We then look at all of the resulting models, with the goal of identifying the one that is best. The problem of selecting the best model from among the 2^p possibilities considered by best subset selection is not trivial.
Algorithm 1: Best subset selection
1. Let M_0 denote the null model, which contains no predictors. This model simply predicts the sample mean for each observation.
2. For k = 1, 2, ..., p:
(a) Fit all C(p, k) models that contain exactly k predictors.
(b) Pick the best among these C(p, k) models, and call it M_k. Here "best" is defined as having the smallest RSS or, equivalently, the largest R².
3. Select a single best model from among M_0, ..., M_p using cross-validated prediction error, C_p (AIC), BIC, or adjusted R².
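Algorithm 1 can be sketched directly with numpy; the helper names and the synthetic data below are illustrative, not from the lecture.

```python
# Best subset selection (Algorithm 1): for each size k, fit all C(p, k) models
# and keep the one with smallest RSS. Exhaustive, so only feasible for small p.
import itertools
import numpy as np

def rss(X, y):
    """RSS of the least squares fit of y on X (with an intercept column)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return float(resid @ resid)

def best_subset(X, y):
    """Return {k: (best predictor subset of size k, its RSS)} for k = 0..p."""
    n, p = X.shape
    models = {0: ((), float(((y - y.mean()) ** 2).sum()))}  # M_0: null model
    for k in range(1, p + 1):
        best = min(((subset, rss(X[:, subset], y))
                    for subset in itertools.combinations(range(p), k)),
                   key=lambda t: t[1])
        models[k] = best
    return models

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=100)
models = best_subset(X, y)
print(models[2][0])  # the two truly active predictors: (0, 2)
```

Step 3 of the algorithm (choosing among M_0, ..., M_p) would then use a criterion such as C_p, BIC, or cross-validation rather than RSS.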

5 Subset Selection / Best Subset Selection
For each possible model containing a subset of all the predictors, the red frontier tracks the best model for a given number of predictors, according to RSS and R². The model improves as the number of variables increases; however, from the three-variable model on, there is little improvement in RSS and R².
[Figure: residual sum of squares and R² versus number of predictors; the red frontier marks the best model of each size]


7 Subset Selection / Stepwise Selection: Forward Stepwise Selection
Forward stepwise selection is a computationally efficient alternative to best subset selection. While best subset selection considers all 2^p possible models containing subsets of the p predictors, forward stepwise considers a much smaller set of models. It begins with a model containing no predictors, and then adds predictors to the model, one at a time, until all of the predictors are in the model.
Algorithm 2: Forward stepwise selection
1. Let M_0 denote the null model, which contains no predictors.
2. For k = 0, 1, ..., p-1:
(a) Consider all p-k models that augment the predictors in M_k with one additional predictor.
(b) Choose the best among these p-k models, and call it M_{k+1}. Here "best" is defined as having the smallest RSS or highest R².
3. Select a single best model from among M_0, ..., M_p using cross-validated prediction error, C_p (AIC), BIC, or adjusted R².
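A minimal numpy sketch of Algorithm 2; the helper and data are illustrative stand-ins. It fits only p-k candidate models per step instead of all C(p, k).

```python
# Forward stepwise selection: grow the model one predictor at a time,
# at each step adding the predictor that most reduces the RSS.
import numpy as np

def rss(X, y):
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(r @ r)

def forward_stepwise(X, y):
    """Return the nested sequence of selected index sets M_0, M_1, ..., M_p."""
    n, p = X.shape
    selected, path = [], [[]]
    for _ in range(p):
        remaining = [j for j in range(p) if j not in selected]
        best_j = min(remaining, key=lambda j: rss(X[:, selected + [j]], y))
        selected = selected + [best_j]
        path.append(list(selected))
    return path

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 4.0 * X[:, 3] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=80)
path = forward_stepwise(X, y)
print(path[1], path[2])  # strongest predictor enters first: [3] [3, 1]
```

Note the models are nested (M_k is always contained in M_{k+1}), which is exactly why forward stepwise can miss the best model of a given size that exhaustive search would find.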

8 Subset Selection / Stepwise Selection: Backward Stepwise Selection
Like forward stepwise selection, backward stepwise selection provides an efficient alternative to best subset selection. Unlike forward stepwise selection, it begins with the full least squares model containing all p predictors, and then iteratively removes the least useful predictor, one at a time.
Algorithm 3: Backward stepwise selection
1. Let M_p denote the full model, which contains all p predictors.
2. For k = p, p-1, ..., 1:
(a) Consider all k models that contain all but one of the predictors in M_k, for a total of k-1 predictors.
(b) Choose the best among these k models, and call it M_{k-1}. Here "best" is defined as having the smallest RSS or highest R².
3. Select a single best model from among M_0, ..., M_p using cross-validated prediction error, C_p (AIC), BIC, or adjusted R².
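The mirror image of the forward procedure can be sketched the same way (again with illustrative helper names and data):

```python
# Backward stepwise selection (Algorithm 3): start from the full model and
# repeatedly drop the predictor whose removal increases the RSS the least.
import numpy as np

def rss(X, y):
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r = y - Xc @ beta
    return float(r @ r)

def backward_stepwise(X, y):
    n, p = X.shape
    current = list(range(p))
    path = [list(current)]                      # M_p first
    while len(current) > 1:
        drop = min(current,
                   key=lambda j: rss(X[:, [i for i in current if i != j]], y))
        current = [i for i in current if i != drop]
        path.append(list(current))
    return path                                  # [M_p, ..., M_1]

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = 5.0 * X[:, 2] + rng.normal(scale=0.1, size=80)
path = backward_stepwise(X, y)
print(path[-1])  # the last surviving predictor: [2]
```

Backward selection requires n > p so that the full model can be fit; forward stepwise has no such restriction.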


10 Subset Selection / Choosing the Optimal Model
The model containing all of the predictors will always have the smallest RSS and the largest R², since these quantities are related to the training error. Instead, we wish to choose a model with a low test error. The training error can be a poor estimate of the test error; therefore, RSS and R² are not suitable for selecting the best model among a collection of models with different numbers of predictors.
Two common approaches:
1. Indirectly estimate the test error by making an adjustment to the training error to account for the bias due to overfitting.
2. Directly estimate the test error, using either a validation set approach or a cross-validation approach.

11 Subset Selection / Choosing the Optimal Model: Indirect Estimates of Test Error
C_p statistic: an unbiased estimate of test MSE. For a fitted least squares model containing d predictors, the C_p estimate of test MSE is given by

C_p = (1/n)(RSS + 2dσ̂²)    (1)

where σ̂² is an estimate of the variance of the error associated with each response measurement.
Akaike information criterion (AIC):

AIC = (1/(nσ̂²))(RSS + 2dσ̂²)    (2)

Bayesian information criterion (BIC):

BIC = (1/n)(RSS + log(n)dσ̂²)    (3)

Adjusted R²:

Adjusted R² = 1 - [RSS/(n - d - 1)] / [TSS/(n - 1)]    (4)



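Formulas (1)-(4) are straightforward to compute for any fitted subset; the sketch below estimates σ̂² from the full model, as is common practice (function and variable names are illustrative).

```python
# C_p, AIC, BIC, and adjusted R^2 for a least squares fit with d predictors,
# following equations (1)-(4) on the slide.
import numpy as np

def criteria(X, y, cols, sigma2_hat):
    n = len(y)
    Xc = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    rss_val = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    d = len(cols)
    cp = (rss_val + 2 * d * sigma2_hat) / n                      # (1)
    aic = (rss_val + 2 * d * sigma2_hat) / (n * sigma2_hat)      # (2)
    bic = (rss_val + np.log(n) * d * sigma2_hat) / n             # (3)
    adj_r2 = 1 - (rss_val / (n - d - 1)) / (tss / (n - 1))       # (4)
    return cp, aic, bic, adj_r2

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=60)

# Estimate sigma^2 from the residuals of the full model.
Xc_full = np.column_stack([np.ones(60), X])
b, *_ = np.linalg.lstsq(Xc_full, y, rcond=None)
sigma2 = float(((y - Xc_full @ b) ** 2).sum()) / (60 - 3 - 1)

cp0, _, _, ar0 = criteria(X, y, [0], sigma2)  # the true predictor
cp1, _, _, ar1 = criteria(X, y, [1], sigma2)  # a noise predictor
print(cp0 < cp1, ar0 > ar1)  # True True: the true predictor wins on both
```

Note the opposite directions: C_p, AIC, and BIC are minimized, while adjusted R² is maximized.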

15 Subset Selection / Choosing the Optimal Model: Best Subset Selection
For least squares models, C_p and AIC are proportional to each other, so only C_p is displayed. C_p and BIC are estimates of test MSE. BIC shows an increase after four variables are selected; the other two plots are rather flat after four variables are included.
[Figure: C_p, BIC, and adjusted R² versus number of predictors]

16 Subset Selection / Choosing the Optimal Model: Validation and Cross-Validation
Directly estimating the test error makes fewer assumptions about the true underlying model. Cross-validation is a very attractive approach for selecting from among a number of models under consideration. Results are shown for d ranging from 1 to 11; the overall best model is shown as a blue cross.
[Figure: square root of BIC, validation set error, and cross-validation error versus number of predictors]

17 Shrinkage Methods
Subset selection methods involve using least squares to fit a linear model that contains a subset of the predictors. As an alternative, shrinkage methods fit a model containing all p predictors using a technique that constrains or regularizes the coefficient estimates, or equivalently, that shrinks the coefficient estimates towards zero. The estimated coefficients are shrunken towards zero relative to the least squares estimates. This shrinkage (also known as regularization) has the effect of reducing variance. Depending on what type of shrinkage is performed, some of the coefficients may be estimated to be exactly zero; hence, shrinkage methods can also perform variable selection.


19 Shrinkage Methods / Ridge Regression
Standard linear model:

Y = β_0 + β_1 X_1 + ... + β_p X_p + ε    (5)

Least squares:

RSS = Σ_{i=1}^n (y_i - β_0 - Σ_{j=1}^p β_j x_ij)²    (6)

Ridge regression objective function:

Σ_{i=1}^n (y_i - β_0 - Σ_{j=1}^p β_j x_ij)² + λ Σ_{j=1}^p β_j² = RSS + λ Σ_{j=1}^p β_j²    (7)

where λ ≥ 0 is a tuning parameter, to be determined separately. The second term is called a shrinkage penalty; it is small when the β_j are close to zero. Ridge regression will produce a different set of coefficient estimates, β̂^R_λ, for each value of λ.


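The minimizer of (7) has a closed form, β̂^R_λ = (XᵀX + λI)⁻¹Xᵀy. A short sketch with made-up data (predictors standardized and the intercept left unpenalized, as is conventional):

```python
# Ridge regression via its closed form. Larger lambda => more shrinkage.
import numpy as np

def ridge_fit(X, y, lam):
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    yc = y - y.mean()                           # center y: intercept unpenalized
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(size=100)

b_small = ridge_fit(X, y, lam=0.01)   # nearly the least squares fit
b_large = ridge_fit(X, y, lam=1e4)    # heavily shrunken towards zero
print(np.abs(b_large).max() < np.abs(b_small).max())  # True
```

Standardizing first matters because the ridge penalty is not scale-invariant: it penalizes all coefficients on the same footing only if the predictors share a common scale.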

22 Shrinkage Methods / Ridge Regression: Why Does Ridge Regression Improve Over Least Squares?
Ridge regression's advantage over least squares is rooted in the bias-variance trade-off: as λ increases, the flexibility of the ridge fit decreases, leading to decreased variance but increased bias.
[Figure: squared bias (black), variance (green), and test mean squared error (purple) for ridge regression predictions on a simulated data set, as functions of λ and of ‖β̂^R_λ‖₂/‖β̂‖₂. The horizontal dashed lines indicate the minimum possible MSE; the purple crosses indicate the ridge models with smallest MSE]


24 Shrinkage Methods / Lasso
Motivation: a disadvantage of ridge regression. Ridge regression includes all p predictors in the final model: the penalty will shrink all of the coefficients towards zero, but it will not set any of them exactly to zero (unless λ = ∞). This may not be a problem for prediction accuracy, but it can create a challenge in model interpretation in settings in which the number of variables p is quite large.
Lasso objective function:

Σ_{i=1}^n (y_i - β_0 - Σ_{j=1}^p β_j x_ij)² + λ Σ_{j=1}^p |β_j| = RSS + λ Σ_{j=1}^p |β_j|    (8)

where λ Σ_j |β_j| is the lasso penalty (an ℓ1 norm instead of the ℓ2 norm used in ridge regression). The lasso has the effect of forcing some of the coefficient estimates to be exactly equal to zero (variable selection) when the tuning parameter λ is sufficiently large.

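Objective (8) has no closed form, but it can be minimized by cyclic coordinate descent with soft-thresholding, the idea behind standard lasso solvers. A compact illustrative sketch (data, λ, and the fixed iteration count are all made up; no intercept, roughly standardized columns assumed):

```python
# Lasso via coordinate descent: each coordinate update is an exact
# one-dimensional minimization, solved by soft-thresholding.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            z = X[:, j] @ r
            beta[j] = soft_threshold(z, lam / 2) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=100)
beta = lasso_cd(X, y, lam=50.0)
print((np.abs(beta) < 1e-8).sum() >= 3)  # True: irrelevant coefficients hit exactly 0
```

The exact zeros, which ridge regression can never produce, are what make the lasso a variable-selection method.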

26 Shrinkage Methods / Lasso: Comparison of the Lasso with Ridge Regression
The standardized lasso coefficients are illustrated as functions of λ and of ‖β̂^L_λ‖₁/‖β̂‖₁. When λ = 0, the lasso simply gives the least squares fit; when λ becomes sufficiently large, the lasso gives the null model, in which all coefficient estimates equal zero.
[Figure: standardized lasso coefficient paths for the Income, Limit, Rating, and Student variables]

27 Shrinkage Methods / Lasso: Example
A simple special case with n = p and X the identity matrix. Least squares estimate:

β̂_j = argmin_β Σ_{j=1}^p (y_j - β_j)² = y_j

Ridge regression:

β̂^R_j = argmin_β Σ_{j=1}^p (y_j - β_j)² + λ Σ_{j=1}^p β_j² = y_j / (1 + λ)

Lasso:

β̂^L_j = argmin_β Σ_{j=1}^p (y_j - β_j)² + λ Σ_{j=1}^p |β_j|, which gives the soft-thresholding rule

β̂^L_j = y_j - λ/2 if y_j > λ/2;  y_j + λ/2 if y_j < -λ/2;  0 if |y_j| ≤ λ/2


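The two closed forms of this special case can be checked numerically in a few lines (λ and y chosen arbitrarily for illustration):

```python
# n = p, X = I special case: ridge scales every y_j by 1/(1 + lambda),
# while the lasso soft-thresholds each y_j by lambda/2.
import numpy as np

lam = 2.0
y = np.array([3.0, -0.5, 1.2, -4.0])

ridge = y / (1 + lam)                                       # proportional shrinkage
lasso = np.sign(y) * np.maximum(np.abs(y) - lam / 2, 0.0)   # soft-thresholding

# ridge shrinks every entry by the same factor 1/3;
# lasso sets the entry with |y_j| <= lambda/2 exactly to zero.
print(ridge)
print(lasso)
```

This is the picture behind the next slide: ridge shrinks proportionally, the lasso by a constant amount with a dead zone at the origin.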

30 Shrinkage Methods / Lasso: Comparison of the Lasso with Ridge Regression
Ridge regression more or less shrinks every dimension of the data by the same proportion; the lasso shrinks all coefficients toward zero by a similar amount, and sufficiently small coefficients are shrunken all the way to zero.
[Figure: ridge and lasso coefficient estimates as functions of y_j, compared with the least squares estimates]


32 Shrinkage Methods / Selecting the Tuning Parameter λ
Grid search:
1. Choose a grid of λ values, and compute the k-fold cross-validation error for each λ.
2. Select the tuning-parameter value for which the cross-validation error is smallest.
3. Re-fit the model using all of the available observations and the selected value of λ.
[Figure: cross-validation error and standardized coefficients as functions of λ]
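The three steps above can be sketched with the ridge closed form; the grid, fold count, and data below are illustrative choices.

```python
# Select lambda by k-fold cross-validation over a grid, then refit on all data.
import numpy as np

def ridge_beta(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5):
    n = len(y)
    idx = np.arange(n)
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_beta(X[train], y[train], lam)
        err += float(((y[fold] - X[fold] @ beta) ** 2).sum())
    return err / n

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 10))
y = X @ rng.normal(size=10) + rng.normal(size=60)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]                    # step 1: grid of lambdas
errors = {lam: cv_error(X, y, lam) for lam in grid}
best_lam = min(errors, key=errors.get)                  # step 2: smallest CV error
beta_final = ridge_beta(X, y, best_lam)                 # step 3: refit on all data
```

The same loop works for the lasso by swapping the fitting routine; only the inner solver changes.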

33 Shrinkage Methods / Selecting the Tuning Parameter λ
Comparison of lasso coefficients with the least squares estimates. Left: ten-fold cross-validation MSE for the lasso. Right: the corresponding lasso coefficient estimates. Vertical dashed lines indicate where the lasso cross-validation error is smallest.
[Figure: cross-validation error and standardized coefficients as functions of ‖β̂^L_λ‖₁/‖β̂‖₁]

34 Dimension Reduction Methods
Motivation: both subset selection and shrinkage methods control variance, either by using a subset of the original variables or by shrinking their coefficients toward zero. All of these methods are defined using the original predictors, X_1, X_2, ..., X_p. Dimension reduction methods work in two steps:
1. The transformed predictors (reduced dimensions) are obtained; the transformation can be chosen in different ways, e.g. principal components regression and partial least squares.
2. The model is fit using these transformed predictors.
These methods will be introduced in Unsupervised Learning.


36 Example 1: Multi-Model Ensemble for Freeway Traffic State Estimations
Reference: Li, L., Chen, X., and Zhang, L. (2014). Multimodel ensemble for traffic state estimations. IEEE Transactions on Intelligent Transportation Systems, 15(3).
Background and research highlights:
Freeway traffic state estimation is a vital component of traffic management and information systems. The inherent randomness of traffic flow and uncertainties in the initial conditions of models, in model parameters, and in model structures all influence traffic state estimations. The paper presents an ensemble learning framework to appropriately combine estimation results from multiple macroscopic traffic flow models, and discusses three weighting algorithms: least squares regression, ridge regression, and lasso. A field test indicates that the lasso ensemble best handles various uncertainties and improves estimation accuracy significantly.
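The ensemble idea, combining the estimates of several imperfect models with regression weights, can be sketched as follows. This is a hedged stand-in in the spirit of the paper's Algorithm II (ridge weighting), with synthetic candidate models, not the paper's actual setup.

```python
# Combine M candidate model estimates with ridge-regression weights.
import numpy as np

rng = np.random.default_rng(7)
T = 200
truth = np.sin(np.linspace(0, 6, T)) * 40 + 60           # synthetic "true" density

# Three imperfect candidate models: biased, noisy, and time-lagged estimates.
est = np.column_stack([
    truth + 5.0,                                         # constant bias
    truth + rng.normal(scale=8.0, size=T),               # heavy noise
    np.roll(truth, 3),                                   # time lag
])

lam = 1.0
M = est.shape[1]
w = np.linalg.solve(est.T @ est + lam * np.eye(M), est.T @ truth)
combined = est @ w

def rmse(a, b):
    return float(np.sqrt(((a - b) ** 2).mean()))

print(rmse(combined, truth) < rmse(est[:, 1], truth))  # True
```

In the paper, the regression target is the available measurements rather than the unknown truth, and the lasso variant additionally zeroes out the weights of unhelpful candidate models.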

37 Popular Traffic State Models
Cell Transmission Model (CTM): CTM is a direct discretization of the first-order LWR model. It was proposed by Daganzo (1994) using the Godunov scheme (Lebacque, 1996) and is now widely used.
Papageorgiou et al.'s model: a second-order macroscopic traffic flow model was chosen in Papageorgiou et al. (1990), Wang and Papageorgiou (2005), and Wang et al. (2006, 2007, 2009). We can update estimations of traffic flow states directly from the linearized system dynamic model; otherwise, we resort to unscented Kalman filtering (UKF), extended Kalman filtering (EKF), or particle filtering.

38 Testing Data I
Performance Measurement System (PeMS): loop detector data collected at three Vehicle Detection Stations (denoted by L1, L2, and L3) on Highway SR101 southbound, Hollywood, California.
[Figure: study site layout with cells 1-6 between De Soto and Winnetka, with the PeMS VDS locations L1, L2, and L3 marked by postmile]

39 Testing Data II
[Figure: mainline mean speed per lane at L1, L2, and L3, and on-ramp flow measurements (veh/h) for the upstream and downstream on-ramps, from 4 AM to 12 PM]

40 Evaluation Indices
Root mean square error (RMSE):

RMSE = sqrt( (1/T) Σ_{t=1}^T [x_i(t) - x̂_i(t)]² )    (9)

Normalized root mean squared error (NRMSE):

NRMSE = sqrt( Σ_{t=1}^T [x_i(t) - x̂_i(t)]² / Σ_{t=1}^T [x_i(t)]² ) × 100%    (10)

Symmetric mean absolute percentage error (SMAPE):

SMAPE1 = (1/T) Σ_{t=1}^T |x_i(t) - x̂_i(t)| / [x_i(t) + x̂_i(t)] × 100%    (11)

SMAPE2 = Σ_{t=1}^T |x_i(t) - x̂_i(t)| / Σ_{t=1}^T [x_i(t) + x̂_i(t)] × 100%    (12)
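The four indices (9)-(12) translate directly into code; x is the measured series and xh the estimate (toy values below are illustrative).

```python
# Evaluation indices (9)-(12) for comparing an estimated series to measurements.
import numpy as np

def rmse(x, xh):
    return float(np.sqrt(((x - xh) ** 2).mean()))                        # (9)

def nrmse(x, xh):
    return float(np.sqrt(((x - xh) ** 2).sum() / (x ** 2).sum())) * 100  # (10)

def smape1(x, xh):
    return float((np.abs(x - xh) / (x + xh)).mean()) * 100               # (11)

def smape2(x, xh):
    return float(np.abs(x - xh).sum() / (x + xh).sum()) * 100            # (12)

x = np.array([100.0, 120.0, 80.0])
xh = np.array([110.0, 115.0, 90.0])
print(rmse(x, xh), nrmse(x, xh), smape1(x, xh), smape2(x, xh))
```

SMAPE1 averages per-time-step ratios while SMAPE2 takes the ratio of sums, so SMAPE2 down-weights errors that occur at times with large totals.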

41 Comparison of Ensemble Algorithms: Part I
Experiment I: ensemble of 10 SMM models with various FD parameter combinations. Algorithm I: least-squares ensemble; Algorithm II: ridge regression ensemble; Algorithm III: lasso ensemble.
Estimation errors of ensemble SMM models using integrated weights Ŵ (Experiment I):

Variable    Error    Algorithm I   Algorithm II   Algorithm III
Density     RMSE
            NRMSE                  44.17%         42.16%
            SMAPE1   47.25%        16.94%         15.46%
            SMAPE2                 18.07%         16.56%
Flow rate   RMSE
            NRMSE    18.80%        24.11%         28.51%
            SMAPE1   8.46%         10.24%         12.07%
            SMAPE2   7.13%         9.57%          11.90%

42 Comparison of Ensemble Algorithms: Part II
Experiment II (linear models): ensemble of 20 SMM models with the following improvements: a small-scale zero-mean Gaussian random noise is added to the estimated state vector, and the estimation results and the available measurements of density and flow rate are separated.

43 Comparison of Ensemble Algorithms: Part III
Estimation errors of ensemble SMM models using separate weights Ŵρ and Ŵq (Experiment II):

Variable    Error    Algorithm I   Algorithm II   Algorithm III
Density     RMSE
            NRMSE    210.98%       87.22%         39.50%
            SMAPE1   30.29%                       15.28%
            SMAPE2                 40.66%         15.91%
Flow rate   RMSE
            NRMSE    22.00%        18.05%         18.89%
            SMAPE1   8.92%         7.85%          8.41%
            SMAPE2   8.18%         7.02%          7.97%

44 Comparison of Ensemble Algorithms: Part IV
Experiment III (nonlinear models): ensemble of 20 EKF models. Stochastic resampling: a small-scale Gaussian white noise vector was added to the estimated state vector after the EKF predict and update procedures at each time step.
Estimation errors of the single deterministic EKF model and the ensemble EKF models using separate weights Ŵρ and Ŵq (Experiment III):

Variable    Error    EKF       Algorithm II   Algorithm III
Density     RMSE
            NRMSE    43.68%    41.62%         38.96%
            SMAPE1                            13.68%
            SMAPE2   16.77%    11.72%         14.78%
Flow rate   RMSE
            NRMSE    23.93%    33.80%         33.73%
            SMAPE1   9.99%     12.97%         15.07%
            SMAPE2   9.38%     16.48%         15.04%

45 Comparison of Ensemble Algorithms: Part V
Experiment IV: ensemble of 10 candidate SMM models (deterministic models without initial-condition noise) and 10 candidate EKF models (stochastic models with initial-condition noise). Stochastic resampling: a small-scale Gaussian white noise vector was added to the estimated state vector after the EKF predict and update procedures at each time step.

46 Comparison of Ensemble Algorithms: Part VI
Estimation errors of ensemble EKF models using separate weights Ŵρ and Ŵq (Experiment IV):

Variable    Error    EKF       Algorithm II   Algorithm III
Density     RMSE
            NRMSE    43.68%    42.35%         36.48%
            SMAPE1                            13.31%
            SMAPE2   16.77%    16.63%         14.39%
Flow rate   RMSE
            NRMSE    23.93%    21.80%         21.28%
            SMAPE1   9.99%     9.29%          9.18%
            SMAPE2   9.38%     8.49%          8.61%

47 Influence of the Regularization Parameter λ
Sensitivity analysis: all experiments show that the lasso ensemble performs best in this case study. In practice, it is suggested to adaptively choose the best regularization scalar λ online.
Sensitivity analysis of the regularization coefficient in the lasso EKF ensemble model (Experiment III, Algorithm III):

Variable    Error    λ = 0.01λmax   λ = 0.1λmax   λ = 0.2λmax   λ = 0.5λmax
Density     RMSE
            NRMSE    47.69%         40.03%        50.66%        66.44%
            SMAPE1   17.55%         13.28%        15.98%        27.56%
            SMAPE2   18.83%         14.51%        19.65%        32.54%
Flow rate   RMSE
            NRMSE    30.31%         27.92%        46.71%        75.64%
            SMAPE1   12.34%         12.73%        23.99%        56.81%
            SMAPE2   11.27%         12.21%        25.24%        57.14%

48 Estimation Results of Experiments I-IV Using the Lasso Ensemble (Part 1)
[Figure: density (veh/km) and flow (veh/h) estimations for Experiments I and II at L1, L2, and L3, compared with measurements, from 4 AM to 12 PM]

49 Estimation Results of Experiments I-IV Using the Lasso Ensemble (Part 2)
[Figure: density (veh/km) and flow (veh/h) estimations for Experiments III and IV at L1, L2, and L3, compared with measurements, from 4 AM to 12 PM]

50 Example 2: Feature Selection for Prediction I
Reference: Yang, S. (2013). On feature selection for traffic congestion prediction. Transportation Research Part C: Emerging Technologies, 26.
Objectives: Traffic congestion prediction plays an important role in route guidance and traffic management; we formulate it as a binary classification problem. Through extensive experiments with real-world data, we found that a large number of sensors, usually over 100, are relevant to the prediction task at one sensor, which implies wide-area correlation and high dimensionality of the data. This paper investigates the feature selection problem for traffic congestion prediction. By applying feature selection, the data dimensionality can be reduced remarkably while the performance remains the same. Besides, a new traffic jam probability scoring method is proposed that decomposes the high-dimensional computation into many one-dimensional probabilities and their combination.
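The slides do not reproduce the paper's exact selector, but the underlying idea, scoring each candidate sensor's relevance to a target sensor's congestion label and keeping the top k, can be illustrated with a simple correlation filter. This is a generic stand-in with synthetic data, not the paper's method.

```python
# Filter-style feature selection: rank candidate sensor series by |correlation|
# with the target's binary congestion label, then keep the top-k features.
import numpy as np

def rank_features(X, y, k):
    """X: (T, sensors) volume history; y: (T,) binary congestion label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    score = np.abs(Xc.T @ yc) / denom            # |Pearson correlation| per sensor
    return np.argsort(score)[::-1][:k]           # indices of the top-k sensors

rng = np.random.default_rng(8)
T = 500
signal = rng.normal(size=T)
y = (signal > 0.5).astype(float)                  # congestion label at the target
X = rng.normal(size=(T, 20))                      # 19 irrelevant sensors...
X[:, 7] = signal + rng.normal(scale=0.2, size=T)  # ...and one genuinely relevant one
print(7 in rank_features(X, y, k=3))  # True
```

The paper's experiments then sweep k and track prediction precision, which is exactly the precision-versus-feature-number analysis on the following slides.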

51 Example 2: Feature Selection for Prediction II
Data: Traffic Management Center of the Minnesota Department of Transportation; 30-s interval data from over 4000 loop detectors located around the Twin Cities Metro freeways, 7 days per week, from January 1 to September 22, 2010. Sensors containing missing values, weekends, and days with incomplete records are not taken into account. Finally, we have a data set of 156 days with the traffic volume summed per 10-min interval for 4584 sensors. We use the first 126 days for learning and the remaining 30 days for testing.

52 Example 2: Feature Selection for Prediction III
[Figure: mean prediction precision against feature number, for time lags of 10, 20, 60, and 300 min]

53 Example 2: Feature Selection for Prediction IV
Mean prediction precision and feature number at the turning point and the end point:

                          Time lag:  10 min    20 min    60 min    300 min
Turning point  Precision             56%       56.45%    56.17%    55.52%
               #Features
End point      Precision             60.13%    60.58%    59.89%    59.62%
               #Features             3386

54 Example 2: Feature Selection for Prediction V
[Figure: actual traffic volumes at sensors 1266 and 1473 with a time lag of 20 min, following the prediction with all features]

55 Example 2: Feature Selection for Prediction VI
[Figure: precision p(l) and recall rate r(l) for the two sensors. Sensor 1266: true jam number = 86, precision(top 86) = 91.86%. Sensor 1473: true jam number = 106, precision(top 106) = 65.09%]

Example 2: Feature Selection for Prediction VII

Figure: distribution over sensors of the precision of prediction based on all features (time lag = 20 min); number of sensors on the y-axis.

Example 2: Feature Selection for Prediction VIII

Figure: prediction precision against feature number for two sensors: Sensor 1266 (optimal feature number = 156) and Sensor 1473 (optimal feature number = 26).
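Picking the per-sensor "optimal feature number" amounts to scanning the precision curve and taking its argmax. A minimal sketch, with made-up precision values for illustration:

```python
import numpy as np

# Hypothetical precision achieved as features are added one at a time.
precision_by_k = np.array([0.40, 0.52, 0.58, 0.61, 0.60, 0.59, 0.57])

optimal_k = int(np.argmax(precision_by_k)) + 1  # feature counts start at 1
print(optimal_k, precision_by_k[optimal_k - 1])  # 4 0.61
```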

Example 2: Feature Selection for Prediction IX

Figure: distribution of the 888 sensors of interest over the maximum precision, for time lags of 10, 20, 60, and 300 min (number of sensors against maximum precision).

Example 2: Feature Selection for Prediction X

Figure: distribution of the 888 sensors of interest over the optimal number of features, for time lags of 10, 20, 60, and 300 min (number of sensors against optimal feature number).

Example 2: Feature Selection for Prediction XI

Table: comparison, over the 888 sensors of interest, of the mean precision (%) achieved with the optimal number of features against that with all features, at each time lag (10, 20, 60, 300 min); and distribution of the 888 sensors over the optimal number of features, binned as [0, 10), [10, 20), [20, 30), [30, 40), [40, 50), [50, 100), [100, 3386).

Example 3: Forecasting Urban Travel Times I

Reference: Haworth, J., Shawe-Taylor, J., Cheng, T. and Wang, J., 2014. Local online kernel ridge regression for forecasting of urban travel times. Transportation Research Part C: Emerging Technologies, 46.

Example 3: Forecasting Urban Travel Times II

Highlights:
- Local online kernel ridge regression (LOKRR) is developed for forecasting urban travel times.
- LOKRR accounts for the time-varying characteristics of traffic series through locally defined kernels.
- LOKRR outperforms ARIMA, Elman ANN, and SVR in forecasting travel times on London's road network.
- The model is based on regularised linear regression, and clear guidelines are given for parameter training.

Example 3: Forecasting Urban Travel Times III

Merits over the standard single-kernel approach:
- Parameters can vary by time of day, capturing the time-varying distribution of traffic data.
- Smaller kernels can be defined that contain only the relevant traffic patterns.
- The model is online, allowing new traffic data to be incorporated as they arrive.
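At its core, each of LOKRR's locally defined kernels solves a standard kernel ridge regression. A generic sketch of that solve, not the authors' implementation; the Gaussian kernel, gamma, and lambda values are assumptions:

```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, gamma=1.0, lam=0.1):
    """Kernel ridge regression with a Gaussian (RBF) kernel:
    alpha = (K + lam*I)^-1 y;  f(x*) = k(x*, X_train) @ alpha."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K = rbf(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)
    return rbf(X_test, X_train) @ alpha

# Toy use: learn y = x1 + x2 from a few points.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X.sum(axis=1)
pred = krr_fit_predict(X, y, X[:5])
print(pred.shape)
```

In LOKRR, one such kernel matrix would be built per time-of-day window from the corresponding days' data, and updated online as new days arrive.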

Example 3: Forecasting Urban Travel Times IV

Data: Unit Travel Times (UTTs, seconds/metre) collected on London's road network as part of the London Congestion Analysis Project (LCAP), coordinated by Transport for London (TfL). LCAP travel times are observed using automatic number plate recognition (ANPR) technology. Individual vehicle travel times are aggregated at 5-min intervals to produce a regularly spaced time series with 288 observations per day; only data collected between 6 AM and 9 PM are used in the analysis (180 observations per day). In total there are 154 days of data, collected between January and July 2011. To test the models, the data are divided into three sets: a training set, a testing set, and a validation set of 80 days (52%), 37 days (24%), and 37 days (24%), respectively.

Example 3: Forecasting Urban Travel Times V

Figure: illustration of variability in traffic data; each line in the plot is the Unit Travel Time (UTT) profile recorded on a single link (link 1815) on a different day (5 days total).

Example 3: Forecasting Urban Travel Times VI

Figure: diagram of the training data construction.

Example 3: Forecasting Urban Travel Times VII

Observing travel times using ANPR: a vehicle passes camera l1 at time t1 and its number plate is read; it then traverses link l and passes camera l2 at time t2, where its number plate is read again. The two number plates are matched using inbuilt software, and the travel time is calculated as t2 - t1. Raw travel times are converted to UTTs by dividing by len(l), the length of link l.
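The observation step above reduces to a subtraction and a normalisation; as a sketch (the function and field names are illustrative):

```python
from datetime import datetime

def unit_travel_time(t1, t2, link_length_m):
    """UTT in seconds/metre from two matched ANPR reads on one link."""
    tt_seconds = (t2 - t1).total_seconds()  # raw travel time t2 - t1
    return tt_seconds / link_length_m       # divide by len(l)

utt = unit_travel_time(datetime(2011, 3, 1, 8, 0, 0),
                       datetime(2011, 3, 1, 8, 1, 30),
                       link_length_m=450.0)
print(utt)  # 0.2
```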

Example 3: Forecasting Urban Travel Times VIII

Table: the test links and their patch rates and frequency, with columns Link ID, % missing, average frequency, and length (m).

Example 3: Forecasting Urban Travel Times IX

Figure: location of the test links on the LCAP network (legend: test links, ANPR camera locations, LCAP network; ITN data, Crown Copyright; scale in miles).

Example 3: Forecasting Urban Travel Times X

Figure: time series of each of the test links over the first 10 weeks of the training period.

Example 3: Forecasting Urban Travel Times XI

Table: training errors at (a) the 15 and 30 min forecast horizons and (b) the 45 and 60 min horizons, reported per link as RMSE, NRMSE, MAPE, and MASE.
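The four error measures reported in these tables can be written down directly. A standard formulation as a sketch: NRMSE normalisation conventions vary (range, mean, or standard deviation), and range normalisation plus a one-step naive forecast for the MASE scaling are assumptions here, not details taken from the paper.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def nrmse(y, yhat):
    # RMSE normalised by the range of the observations (one common convention).
    return rmse(y, yhat) / float(y.max() - y.min())

def mape(y, yhat):
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def mase(y, yhat, y_train):
    # Scale by the in-sample MAE of the one-step naive forecast.
    naive_mae = float(np.mean(np.abs(np.diff(y_train))))
    return float(np.mean(np.abs(y - yhat))) / naive_mae

y_train = np.array([10.0, 12.0, 11.0, 13.0])
y = np.array([10.0, 20.0])
yhat = np.array([12.0, 16.0])
print(rmse(y, yhat), mape(y, yhat), mase(y, yhat, y_train))
```

A MASE below 1 means the forecast beats the naive benchmark on average, which makes it convenient for comparing links with very different travel-time scales.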

Example 3: Forecasting Urban Travel Times XII

Table: fitted model parameters (r, k, w) per link at each forecast horizon (15, 30, 45, 60 min). Note: r values are quantiles of ||x - x0||^2; k values are multiples of k0.

Example 3: Forecasting Urban Travel Times XIII

Table: testing errors of the LOKRR model (RMSE, NRMSE, MAPE, MASE) at the 15 and 30 min horizons.

Example 3: Forecasting Urban Travel Times XIV

Table: testing errors of the LOKRR model (continued) at the 45 and 60 min horizons.

Example 3: Forecasting Urban Travel Times XV

Table: comparison with benchmark models: average training and testing errors (RMSE, NRMSE, MAPE) of LOKRR, SVR, ANN, and ARIMA at (a) 15 min, (b) 30 min, (c) 45 min, and (d) 60 min forecast horizons. Note: training errors are not shown for the ARIMA model due to the difference in the model fitting procedure.

Example 3: Forecasting Urban Travel Times XVI

Figure: RMSE of each of the models at (a) 15 min; (b) 30 min; (c) 45 min; and (d) 60 min.

Example 3: Forecasting Urban Travel Times XVII

Figure: MAPE of each of the models at (a) 15 min; (b) 30 min; (c) 45 min; and (d) 60 min.

Example 3: Forecasting Urban Travel Times XVIII

Figure: time series plots of the observed series (thick black line) against the forecast series at the 15 min interval on (a) Monday 6th June; (b) 7th June; (c) 8th June; (d) 9th June; (e) 10th June; (f) 11th June.

Example 3: Forecasting Urban Travel Times XIX

(Figure continued from the previous slide.)

Example 3: Forecasting Urban Travel Times XX

Figure: time series plots of the observed series (thick black line) against the forecast series at the 60 min interval on (a) Monday 6th June; (b) 7th June; (c) 8th June; (d) 9th June; (e) 10th June; (f) 11th June.

Example 3: Forecasting Urban Travel Times XXI

(Figure continued from the previous slide.)

Example 3: Forecasting Urban Travel Times XXII

Figure: comparison of LOKRR with benchmark models at the 15 min interval on link 442, (a) on a typical weekday and (b) during a non-recurrent congestion event.

Example 3: Forecasting Urban Travel Times XXIII

Figure (continued): comparison of LOKRR with benchmark models at the 15 min interval on link 442, (a) on a typical weekday and (b) during a non-recurrent congestion event.

Example 3: Forecasting Urban Travel Times XXIV

Figure: comparison of LOKRR with benchmark models at the 60 min interval on link 442, (a) on a typical weekday and (b) during a non-recurrent congestion event.


More information

New Introduction to Multiple Time Series Analysis

New Introduction to Multiple Time Series Analysis Helmut Lütkepohl New Introduction to Multiple Time Series Analysis With 49 Figures and 36 Tables Springer Contents 1 Introduction 1 1.1 Objectives of Analyzing Multiple Time Series 1 1.2 Some Basics 2

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Research Article Accurate Multisteps Traffic Flow Prediction Based on SVM

Research Article Accurate Multisteps Traffic Flow Prediction Based on SVM Mathematical Problems in Engineering Volume 2013, Article ID 418303, 8 pages http://dx.doi.org/10.1155/2013/418303 Research Article Accurate Multisteps Traffic Flow Prediction Based on SVM Zhang Mingheng,

More information

Relevance Vector Machines for Earthquake Response Spectra

Relevance Vector Machines for Earthquake Response Spectra 2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas

More information

Empirical Study of Traffic Velocity Distribution and its Effect on VANETs Connectivity

Empirical Study of Traffic Velocity Distribution and its Effect on VANETs Connectivity Empirical Study of Traffic Velocity Distribution and its Effect on VANETs Connectivity Sherif M. Abuelenin Department of Electrical Engineering Faculty of Engineering, Port-Said University Port-Fouad,

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning

More information

Logistic Regression with the Nonnegative Garrote

Logistic Regression with the Nonnegative Garrote Logistic Regression with the Nonnegative Garrote Enes Makalic Daniel F. Schmidt Centre for MEGA Epidemiology The University of Melbourne 24th Australasian Joint Conference on Artificial Intelligence 2011

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

How the mean changes depends on the other variable. Plots can show what s happening...

How the mean changes depends on the other variable. Plots can show what s happening... Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

Graphical LASSO for local spatio-temporal neighbourhood selection

Graphical LASSO for local spatio-temporal neighbourhood selection Graphical LASSO for local spatio-temporal neighbourhood selection James Haworth 1,2, Tao Cheng 1 1 SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London,

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony

More information

Chapter 3: Regression Methods for Trends

Chapter 3: Regression Methods for Trends Chapter 3: Regression Methods for Trends Time series exhibiting trends over time have a mean function that is some simple function (not necessarily constant) of time. The example random walk graph from

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology

... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology ..... SPARROW SPARse approximation Weighted regression Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Université de Montréal March 12, 2012 SPARROW 1/47 .....

More information

Online Appendix for Price Discontinuities in an Online Market for Used Cars by Florian Englmaier, Arno Schmöller, and Till Stowasser

Online Appendix for Price Discontinuities in an Online Market for Used Cars by Florian Englmaier, Arno Schmöller, and Till Stowasser Online Appendix for Price Discontinuities in an Online Market for Used Cars by Florian Englmaier, Arno Schmöller, and Till Stowasser Online Appendix A contains additional tables and figures that complement

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

Iterative ARIMA-Multiple Support Vector Regression models for long term time series prediction

Iterative ARIMA-Multiple Support Vector Regression models for long term time series prediction and Machine Learning Bruges (Belgium), 23-25 April 24, i6doccom publ, ISBN 978-2874995-7 Available from http://wwwi6doccom/fr/livre/?gcoi=2843244 Iterative ARIMA-Multiple Support Vector Regression models

More information

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Aly Kane alykane@stanford.edu Ariel Sagalovsky asagalov@stanford.edu Abstract Equipped with an understanding of the factors that influence

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

1 Spatio-temporal Variable Selection Based Support

1 Spatio-temporal Variable Selection Based Support Spatio-temporal Variable Selection Based Support Vector Regression for Urban Traffic Flow Prediction Yanyan Xu* Department of Automation, Shanghai Jiao Tong University & Key Laboratory of System Control

More information

Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information * Satish V. Ukkusuri * * Civil Engineering, Purdue University 24/04/2014 Outline Introduction Study Region Link Travel

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information