LASSO-type penalties for covariate selection and forecasting in time series


Evandro Konzen 1   Flavio A. Ziegelmann 2

Abstract

This paper studies some forms of LASSO-type penalties in time series, both to reduce the dimensionality of the parameter space and to improve out-of-sample forecasting performance. In particular, we propose a method which we call WLadaLASSO (Weighted Lag adaptive LASSO), which not only assigns different weights to each coefficient but also further penalizes the coefficients of higher-lagged covariates. In our Monte Carlo implementation, the WLadaLASSO is superior to both the LASSO and the adalasso in terms of covariate selection, parameter estimation precision and forecasting, especially for small sample sizes and highly correlated covariates. An empirical study illustrates our approach for U.S. risk premium forecasting with good results.

Keywords: Time series, LASSO, adalasso, variable selection, forecasting.
JEL Classification: C22; C52; C53.

1 Graduate Program in Economics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil, konzen.evandro@gmail.com
2 Department of Statistics - PPGE and PPGA, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil, flavioaz@mat.ufrgs.br. The author wishes to thank CNPq (processes / and /2013-1) and FAPERGS (process 1994/12-6) for financial support.

1 Introduction

High-dimensional models have become increasingly present in the literature. It is known that the inclusion of a large number of economic and financial variables can contribute substantial gains to time series forecasting. As Song and Bickel (2011) point out, a challenging problem is to determine which variables and lags are relevant, especially when serial correlation, a high-dimensional dependence structure among variables and a small sample size (relative to the dimensionality) occur together.

As Fan and Lv (2010) state, statistical accuracy, interpretability of the model and computational complexity are three important pillars of any statistical procedure. Typically, the number of observations n is much greater than the number of variables or parameters p. However, when the dimensionality p is large compared to the sample size n, traditional methods face several challenges: how to build interpretable models that remain estimable; how to make the statistical procedures robust and computationally efficient; and how to obtain procedures that are more efficient in terms of statistical inference. Moreover, in a high-dimensional context, when the number of covariates p is large compared to the sample size n, traditional models can suffer from spurious correlation between the covariates, which can be serious even when the covariates are independent and identically distributed, as shown in Fan and Lv (2008) and Fan et al. (2012).

One way to address the problems caused by high dimensionality is the sparsity assumption on the p-dimensional parameter vector, which forces many of its components to be exactly zero. Although it generally produces biased estimates, the sparsity assumption helps to identify the important covariates, yielding a more parsimonious model and reducing both its complexity and the computational cost of estimating it. As Medeiros and Mendes (2012) comment, factor models provide a good alternative when many variables are important in the model, a situation which the authors call a dense structure. However, when the coefficient vector is indeed sparse, methods that assume sparsity gain importance.

The Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996) was proposed in a linear regression context. It imposes a penalty on the L1 norm of the coefficient vector. Due to the nature of this penalty, the LASSO forces some coefficients to be exactly zero, making it useful for selecting covariates and reducing the dimensionality of the parameter space. This methodology is an example of the regularization techniques characterized by Breiman (1995), which consider an error function of the form

E = (error on data) + λ (model complexity),

where the sum of squared residuals can, for instance, play the role of the error-on-data term. The second term penalizes models whose complexity, and hence whose estimator variance, is high, with λ controlling how severe the penalty is. By minimizing this error function rather than just the error on the data, overly complex models are penalized and the variance of the estimators is reduced. If λ is too large, though, only very simple models are obtained and a large bias can be introduced. The LASSO is one of the most famous regularization techniques, succeeding mainly in cases where many of the coefficients to be estimated are null. Thus, the LASSO also becomes useful for the selection of covariates.
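To illustrate this spurious-correlation phenomenon, the following minimal R sketch (purely illustrative, not part of the study above) draws a response that is independent of all covariates and records the largest absolute sample correlation as the number of covariates grows for a fixed sample size.

    ## Illustrative only: spurious correlation with i.i.d. covariates that are
    ## independent of the response; the maximum sample correlation grows with p.
    set.seed(123)
    n <- 50
    max_abs_cor <- sapply(c(10, 100, 1000), function(p) {
      X <- matrix(rnorm(n * p), n, p)   # i.i.d. N(0,1) covariates
      y <- rnorm(n)                     # response independent of X
      max(abs(cor(X, y)))               # largest spurious sample correlation
    })
    round(max_abs_cor, 2)               # typically increases markedly with p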

Zou (2006) investigates the oracle properties, discussed by Fan and Li (2001), of the original LASSO of Tibshirani (1996), and shows that there are cases in which the LASSO is not consistent in variable selection; he therefore proposes the adalasso (adaptive LASSO), in which the penalty is applied with different weights for each coefficient, which allows this version to enjoy the oracle properties.

In a time series context, LASSO-type penalties are employed in De Mol et al. (2008), Hua (2011) and Li (2012). Medeiros and Mendes (2012) show that the adalasso consistently chooses the relevant variables as the number of observations increases (model selection consistency), even when the errors are non-Gaussian and conditionally heteroskedastic. In addition, Audrino and Camponovo (2013) present theoretical and empirical results on the finite-sample and asymptotic properties of the adalasso in time series regression models.

Our present work borrows ideas from Park and Sakaori (2013) and proposes a variation which we call WLadaLASSO (Weighted Lag adaptive LASSO), a method which assigns different weights to each coefficient and also further penalizes coefficients of higher-lagged covariates. The results show the superiority of the WLadaLASSO when compared to the LASSO and the adalasso, essentially for small sample sizes. An application is carried out to forecast the risk premium using the same data as in Goyal and Welch (2008). Contrary to what that article points out, we find that the predictors used in the literature can help to predict the risk premium when LASSO-type methods are applied.

The rest of this paper is organized as follows. Section 2 introduces the theme of model selection. In Section 3, LASSO-type penalties are described in detail. Section 4 carries out a comprehensive Monte Carlo simulation study. Section 5 presents our empirical application to risk premium forecasting. Finally, Section 6 concludes with our main remarks.

2 Preliminaries on Model Selection

One of the main goals in linear regression analysis is to estimate the coefficients of the following Gaussian linear model,

y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ... + β_k x_{ki} + ε_i,   i = 1, ..., n,   (1)

or, in vector form,

y_i = β_0 + X_i^T β + ε_i,   (2)

where y_i ∈ R is the response variable, X_i = (x_{1i}, ..., x_{ki})^T ∈ R^k is the set of predictors, ε_i ~ N(0, σ²), and (β_0, β_1, ..., β_k)^T is the set of parameters.

One of the most popular methods for estimating the unknown parameters of (1), ordinary least squares (OLS), is based on minimizing the sum of squared residuals (SSR), that is, solving the minimization problem

\hat{\beta} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 .   (3)
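For illustration, the following R sketch (toy data, not used in the paper) estimates (3) with the lm function and shows the breakdown of OLS, discussed next, once the number of parameters exceeds the number of observations.

    ## Illustrative only: OLS estimation of model (1) and its failure when k + 1 > n.
    set.seed(1)
    n <- 30; k <- 5
    X <- matrix(rnorm(n * k), n, k)
    beta <- c(1, -0.5, 0, 0, 0.3)
    y <- 2 + drop(X %*% beta) + rnorm(n)
    coef(lm(y ~ X))          # OLS estimates of beta_0, ..., beta_k

    k_big <- 50              # now more parameters than observations
    X_big <- matrix(rnorm(n * k_big), n, k_big)
    coef(lm(y ~ X_big))      # lm() returns NA for coefficients it cannot identify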

However, the OLS method suffers from some problems when there are too many predictors in the model. Firstly, its estimates often have large variance, which reduces the accuracy of the forecasts. Secondly, on its own it offers no way of determining a smaller set of predictors when one aims to find out which predictors best explain the variability of the response variable. Finally, by construction, the OLS method cannot be implemented when the number of parameters exceeds the number of observations.

A natural procedure for choosing which predictors enter and which do not enter model (2) is to compute all 2^k possible regression models, investigating every combination of predictors. However, this can require large computational resources. Thus, several procedures have been proposed, such as Best-Subset Selection, Forward Selection and Backward Elimination, and Forward-Stagewise Regression, as pointed out by Hastie et al. (2001). These alternatives also have their own limitations when the model has many candidate covariates. Best-Subset Selection is feasible for a k not exceeding roughly 30 or 40, as explained by Hastie et al. (2001). Meanwhile, Forward Selection and Backward Elimination may not select the best subset of variables in some situations, as stated by Berk (1978). Forward-Stagewise Regression, as an algorithm that can require many steps, has a large computational cost and may be impractical for high-dimensional problems.

Ridge Regression, in turn, is used to shrink the set of coefficients by imposing a penalty on their sum of squares:

\hat{\beta}^{ridge} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} \beta_j^2 \le t,   (4)

where the parameter t ≥ 0 controls the penalty. An equivalent expression is given by

\hat{\beta}^{ridge} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \Big\},   (5)

where the parameter λ ≥ 0 is a function of the parameter t in (4) and controls how severe the penalty is. The larger λ is, the larger the penalty gets. When λ = 0, the vector \hat{\beta}^{ridge} equals the coefficient vector obtained by OLS. Ridge regression yields non-zero estimates for all coefficients, and so it is not a method of variable selection.

Noting that the choice of variables is important for the interpretation of the model, Breiman (1995) proposes the Garrote method, which penalizes the regression coefficients so that some of them are forced to zero. The disadvantage of the Garrote is that its solution depends on the sign and magnitude of the OLS estimates, which perform poorly in the presence of high correlation between predictors.

3 LASSO-type penalties

While Ridge regression, described by equation (4), penalizes the sum of squares of the coefficients, there are alternative methods that impose penalties on the sum of the absolute values of the coefficients. Some of these methods are described in this section.
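For concreteness, problem (5) can be solved, for instance, with the glmnet package, where alpha = 0 selects the ridge penalty and λ is chosen by cross-validation; the sketch below uses simulated data and is purely illustrative.

    ## Illustrative only: ridge regression (equation (5)) via glmnet.
    library(glmnet)
    set.seed(1)
    n <- 100; k <- 20
    X <- matrix(rnorm(n * k), n, k)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)

    cv_ridge <- cv.glmnet(X, y, alpha = 0, nfolds = 10)   # alpha = 0: ridge penalty
    beta_ridge <- coef(cv_ridge, s = "lambda.min")        # shrunk, but none exactly zero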

3.1 LASSO

The least absolute shrinkage and selection operator (LASSO), proposed by Tibshirani (1996), is another method of shrinking the coefficient set. Just as the Garrote, the LASSO aims to estimate a model that produces forecasts with small variance and to determine the set of predictors that best explains the response variable. Tibshirani (1996) argues that the techniques usually employed to improve the OLS estimates, such as Subset Selection and Ridge Regression, have disadvantages. Subset Selection models are easily interpreted, but the variable choice has great variability since it is a discrete process. Ridge regression, in turn, has less variability and shrinks the regression coefficients, but still keeps all predictors in the model.

As in the typical regression modelling represented by equations (1) and (2), it is supposed that the y_i's are conditionally independent given the x_{ki}'s. It is assumed that the x_{ki}'s are standardized such that \sum_i x_{ki} = 0 and \sum_i x_{ki}^2 / n = 1. LASSO estimates are obtained by minimizing the sum of squared residuals subject to a penalty on the L1 norm of the coefficients:

\hat{\beta}^{LASSO} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} |\beta_j| \le t,   (6)

where the tuning parameter t ≥ 0 controls the penalty. For all t, the solution for β_0 is \hat{\beta}_0 = \bar{y}. So one may assume without loss of generality that \bar{y} = 0 and then omit β_0.

The tuning parameter t ≥ 0 in (6) controls how much penalty is applied to the set of coefficients. Let \{\hat{\beta}_j^0\}_{1 \le j \le k} represent the set of OLS coefficients and t_0 = \sum_{j} |\hat{\beta}_j^0|. Values t < t_0 will cause a shrinkage of the coefficients toward zero, and some coefficients may be exactly equal to zero. If t ≥ t_0, the LASSO estimates will be the same as the OLS estimates. Using the Lagrangian, (6) has an equivalent expression given by

\hat{\beta}^{LASSO} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \Big\},   (7)

where the parameter λ ≥ 0 is a function of the parameter t. The larger λ, the greater the penalty on the coefficients; when λ = 0, the LASSO estimates equal the OLS estimates. The value of λ is chosen via K-fold cross-validation. Tibshirani (2013) gives sufficient conditions for the uniqueness of the LASSO solution using the Karush-Kuhn-Tucker conditions.

The LASSO has less variability than the Subset Selection methods. Moreover, it shrinks some coefficients and forces others to zero, keeping the good features of both Subset Selection and Ridge Regression. Furthermore, the LASSO performs variable selection and coefficient estimation simultaneously.

3.1.1 LASSO's consistency in variable selection

In order to use the LASSO as a selection criterion, its sparse solution should represent the true model well. Hence, one of the desirable properties of such a criterion is consistency, i.e., it identifies the true model when n → ∞.
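Before turning to consistency, the following minimal sketch (illustrative data and names; the actual simulations are described in Section 4) shows how (7) can be estimated with glmnet and 10-fold cross-validation, with alpha = 1 selecting the L1 penalty.

    ## Illustrative only: LASSO (equation (7)) with lambda chosen by 10-fold CV.
    library(glmnet)
    set.seed(1)
    n <- 100; k <- 20
    X <- matrix(rnorm(n * k), n, k)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)

    cv_lasso <- cv.glmnet(X, y, alpha = 1, nfolds = 10)        # alpha = 1: L1 penalty
    beta_lasso <- as.vector(coef(cv_lasso, s = "lambda.min"))  # intercept + k coefficients
    which(beta_lasso[-1] != 0)                                 # indices of selected covariates

Unlike the ridge fit, several coefficients here are exactly zero, which is what makes the LASSO usable as a covariate-selection device.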

To study the consistency of LASSO selection, Zhao and Yu (2006) consider two aspects: i) whether there is a deterministic amount of regularization that provides consistent selection; and ii) whether for each sample there is a correct amount of regularization that selects the true model. Their results show that there is a condition, which they name the Irrepresentable Condition, that is almost necessary and sufficient for both kinds of consistency. Their results hold for linear models with either fixed k or k growing with n.

Let us consider the linear regression model Y_n = X_n β^n + ε_n, where ε_n = (ε_1, ..., ε_n)^T is a vector of i.i.d. random variables with mean 0 and variance σ². Y_n is an n × 1 response vector and X_n = (X_1^n, ..., X_k^n) is the n × k matrix of predictors, where X_i^n = (x_{i1}, ..., x_{in})^T for i = 1, ..., k, and β^n is the k × 1 coefficient vector. Unlike the traditional setting where k is fixed, the data and the model parameters are indexed by n to allow them to vary with n. The LASSO estimates \hat{\beta}^n(\lambda) = \hat{\beta}^n = (\hat{\beta}_1^n, ..., \hat{\beta}_k^n)^T are defined by

\hat{\beta}^n(\lambda) = \underset{\beta}{\operatorname{argmin}} \; \| Y_n - X_n \beta^n \|_2^2 + \lambda \| \beta^n \|_1 ,

where \| \cdot \|_2^2 denotes the squared L2 norm of a vector, i.e., the sum of the squares of the vector components, and \| \cdot \|_1 denotes the L1 norm of a vector, i.e., the sum of the absolute values of the vector components.

There is consistency in model selection when

P( \{ i : \hat{\beta}_i^n \ne 0 \} = \{ i : \beta_i^n \ne 0 \} ) \to 1  as n → ∞,

which is equivalent to sign consistency (Zhao and Yu, 2006). The following definitions are based on Zhao and Yu (2006).

Definition 1: An estimate \hat{\beta}^n is equal in sign to the true β^n (written \hat{\beta}^n =_s β^n) if and only if sign(\hat{\beta}^n) = sign(β^n), where sign(·) takes the value 1 for positive values, -1 for negative values and 0 for zero.

Definition 2: The LASSO is strongly sign consistent if there exists λ_n = f(n), i.e., a function of n which is independent of Y_n and X_n, such that

lim_{n → ∞} P( \hat{\beta}^n(\lambda_n) =_s β^n ) = 1.

Definition 3: The LASSO is general sign consistent if

lim_{n → ∞} P( ∃ λ ≥ 0, \hat{\beta}^n(\lambda) =_s β^n ) = 1.

Strong sign consistency means that one can use a preselected λ to obtain consistent model selection. General sign consistency means that for a random realization there is a correct amount of regularization that selects the true model.
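A simple numerical way to inspect general sign consistency on a given sample is to fit the whole LASSO path and check whether any value of λ reproduces the sign vector of the true coefficients; the following illustrative R sketch does exactly that.

    ## Illustrative only: does some lambda on the LASSO path recover sign(beta)?
    library(glmnet)
    set.seed(2)
    n <- 200; k <- 10
    beta_true <- c(2, -1.5, 1, rep(0, k - 3))
    X <- matrix(rnorm(n * k), n, k)
    y <- drop(X %*% beta_true) + rnorm(n)

    fit <- glmnet(X, y, alpha = 1)                 # entire lambda path
    signs <- sign(as.matrix(fit$beta))             # k x (number of lambdas) sign matrix
    sign_correct <- apply(signs, 2, function(s) all(s == sign(beta_true)))
    any(sign_correct)                              # TRUE if some lambda is sign-correct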

Zhao and Yu (2006) show that both types of consistency are almost equivalent up to a certain condition. We now introduce some notation to define this condition. Without loss of generality, we assume that β^n = (β_1^n, ..., β_q^n, β_{q+1}^n, ..., β_k^n)^T, where β_j^n ≠ 0 for j = 1, ..., q and β_j^n = 0 for j = q + 1, ..., k. Write β_{(1)}^n = (β_1^n, ..., β_q^n)^T and β_{(2)}^n = (β_{q+1}^n, ..., β_k^n)^T. Then one can write X_n(1) and X_n(2) for the first q and the last k - q columns of X_n, respectively, and define C^n = (1/n) X_n^T X_n. Setting C_{11}^n = (1/n) X_n(1)^T X_n(1), C_{22}^n = (1/n) X_n(2)^T X_n(2), C_{12}^n = (1/n) X_n(1)^T X_n(2) and C_{21}^n = (1/n) X_n(2)^T X_n(1), C^n can be expressed in block-wise form as

C^n = \begin{pmatrix} C_{11}^n & C_{12}^n \\ C_{21}^n & C_{22}^n \end{pmatrix}.

Assuming C_{11}^n is invertible, the following Irrepresentable Conditions are defined.

Strong Irrepresentable Condition. There exists a positive constant vector η such that

| C_{21}^n (C_{11}^n)^{-1} \operatorname{sign}(β_{(1)}^n) | \le \mathbf{1} - η,

where \mathbf{1} is a (k - q) × 1 vector of ones and the inequality holds element-wise.

Weak Irrepresentable Condition.

| C_{21}^n (C_{11}^n)^{-1} \operatorname{sign}(β_{(1)}^n) | < \mathbf{1},

where the inequality holds element-wise.

Zou (2006) similarly concludes that there is a necessary condition for the LASSO's consistency in model selection.

3.2 adalasso

Noting that there may be situations in which the LASSO is not consistent in variable selection, Zou (2006) proposes the adaptive LASSO (adalasso), which applies different weights to different coefficients:

\hat{\beta}^{adalasso} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} \omega_j |\beta_j| \Big\},   (8)

where ω_j = | \hat{\beta}_j^{ridge} |^{-τ}, τ > 0.

The individual weights ω_j help to select the relevant variables. A relevant variable x_j tends to have a large ridge coefficient \hat{\beta}_j^{ridge}, resulting in a small weight ω_j assigned to the coefficient of that variable; conversely, if the variable x_j is irrelevant, its ridge coefficient tends to be small and results in a large ω_j. Thus, the adalasso imposes a greater penalty on the coefficients of the variables that appear to be irrelevant. The weights ω_j can also be obtained from OLS estimates; however, this is limited to the case where n > k + 1.

Following the notation of Zou (2006), A = {j : β_j ≠ 0} is the true set of non-zero coefficients. In turn, A_n = {j : \hat{\beta}_j^{(n)} ≠ 0} is the set of non-zero coefficients estimated by (8), where λ_n varies with the sample size n.
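Before turning to its theoretical properties, note that the two-step structure of (8) translates directly into code: a first-stage ridge fit supplies the weights, which are then passed to a weighted L1 fit through glmnet's penalty.factor argument. The sketch below (illustrative data; τ = 1 as in Section 4) is one possible implementation.

    ## Illustrative only: adaptive LASSO (equation (8)) as a two-step procedure.
    library(glmnet)
    set.seed(4)
    n <- 150; k <- 15
    beta_true <- c(1.5, -1, 0.8, rep(0, k - 3))
    X <- matrix(rnorm(n * k), n, k)
    y <- drop(X %*% beta_true) + rnorm(n)

    ## Step 1: ridge estimates (alpha = 0) supply the weights w_j = |beta_ridge_j|^(-tau)
    tau <- 1
    beta_ridge <- as.vector(coef(cv.glmnet(X, y, alpha = 0, nfolds = 10),
                                 s = "lambda.min"))[-1]
    w <- abs(beta_ridge)^(-tau)

    ## Step 2: weighted L1 fit via glmnet's penalty.factor argument
    cv_ada   <- cv.glmnet(X, y, alpha = 1, nfolds = 10, penalty.factor = w)
    beta_ada <- as.vector(coef(cv_ada, s = "lambda.min"))[-1]
    which(beta_ada != 0)   # selected covariates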

Zou (2006) shows that, with the use of adequate weights ω_j, the adalasso has the oracle properties.

Theorem 1 (Zou, 2006): Suppose λ_n / \sqrt{n} → 0 and λ_n n^{(τ-1)/2} → ∞. Then the adalasso satisfies:

1. Consistency in variable selection: lim_{n → ∞} P(A_n = A) = 1.
2. Asymptotic normality: \sqrt{n} \, ( \hat{\beta}_A^{(n)} - β_A ) \to_d N( 0, σ^2 C_{11}^{-1} ).

Thus, in addition to correctly selecting the relevant variables when the sample size increases, the adalasso estimates of the non-zero coefficients asymptotically follow the same distribution as the OLS estimators computed using only the relevant variables.

3.3 WLadaLASSO

When the adalasso is used in a time series context, each lagged variable enters as a candidate predictor and its coefficient is penalized only according to the size of its ridge (or OLS) estimate. One can then wonder whether a greater penalty on more distant lagged variables improves time series forecasting, since more recent information is usually more important. Park and Sakaori (2013) propose some alternative types of penalties for different lags. Borrowing from their ideas in a slightly different version, we propose the adalasso with weighted lags, called here WLadaLASSO (Weighted Lag adaptive LASSO), which is given by

\hat{\beta}^{WLadaLASSO} = \underset{\beta_0,\beta_1,\ldots,\beta_k}{\operatorname{argmin}} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} \omega_j |\beta_j| \Big\},   (9)

where ω_j = ( | \hat{\beta}_j^{ridge} | e^{-α l} )^{-τ}, with τ > 0, α ≥ 0, and l the lag order of the j-th candidate covariate.

In what follows, besides looking at the WLadaLASSO's forecasting skills, we also study its model selection and estimation capabilities, comparing the results to other regularization methods.

4 Simulation

In this section we analyze the WLadaLASSO's performance, comparing it to other regularization methods in a Monte Carlo simulation study. All implementations are done with the free software R. The estimation of equations (7), (8) and (9) uses the glmnet function to optimize the parameters β_j and λ. We use 10-fold cross-validation, and the parameter τ is set to 1. In the WLadaLASSO, for each α belonging to the grid {0, 0.5, 1, ..., 10}, 10-fold cross-validation is performed to find the optimal λ by minimizing the cross-validation error. The chosen value of α is the one that produces the smallest cross-validation error among all these 10-fold procedures.
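The following R sketch outlines one possible implementation of (9): build the lagged candidate matrix, compute first-stage ridge estimates, and search the α grid {0, 0.5, ..., 10} by 10-fold cross-validation as just described. The data, a single covariate plus the lagged dependent variable, are purely illustrative, and the exponential lag weighting e^{-αl} follows the definition of the weights given above.

    ## Illustrative only: WLadaLASSO (equation (9)) on a lagged design matrix.
    library(glmnet)
    set.seed(3)
    T_obs <- 200; L <- 10
    x <- as.numeric(arima.sim(model = list(ar = 0.6), n = T_obs))
    y <- as.numeric(arima.sim(model = list(ar = 0.8), n = T_obs))

    ## Candidate matrix Z: lags 1..L of y and of x; lag_order[j] is the lag of column j
    Ey <- embed(y, L + 1); Ex <- embed(x, L + 1)
    y_dep <- Ey[, 1]
    Z <- cbind(Ey[, -1], Ex[, -1])
    lag_order <- rep(1:L, times = 2)

    ## First-stage ridge estimates
    tau <- 1
    beta_ridge <- as.vector(coef(cv.glmnet(Z, y_dep, alpha = 0, nfolds = 10),
                                 s = "lambda.min"))[-1]

    ## Grid search over alpha; keep the combination with the smallest CV error
    alphas <- seq(0, 10, by = 0.5)
    cv_errors <- sapply(alphas, function(a) {
      w <- (abs(beta_ridge) * exp(-a * lag_order))^(-tau)   # heavier penalty on longer lags
      min(cv.glmnet(Z, y_dep, alpha = 1, nfolds = 10, penalty.factor = w)$cvm)
    })
    a_star <- alphas[which.min(cv_errors)]
    w_star <- (abs(beta_ridge) * exp(-a_star * lag_order))^(-tau)
    fit_wl <- cv.glmnet(Z, y_dep, alpha = 1, nfolds = 10, penalty.factor = w_star)
    beta_wl <- coef(fit_wl, s = "lambda.min")

Setting α = 0 makes the lag factor equal to one, so the grid search nests the adalasso weights as a special case.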

Through Monte Carlo simulations with 1,000 replications, we simulated 10 independent time series that follow an AR(1), x_{i,t} = φ x_{i,t-1} + u_{i,t}, where u_{i,t} ~ N(0,1), i = 1, ..., 10. The data generating process (10) is a sparse linear equation in which y_t depends on its own first lag, with coefficient 0.8, and on a small set of lagged covariates: the first and second lags of x_1, x_2 and x_3 (the coefficients on x_{2,t-1} and x_{2,t-2} having magnitudes 0.5 and 0.2), the first lag of x_4, the first lag of x_5 (coefficient of magnitude 0.3) and the first lag of x_6, plus an innovation ε_t ~ N(0,1), for t = 1, 2, ..., T. The LASSO, adalasso and WLadaLASSO methods were employed to estimate this model with 10 lags of y_t and 10 lags of x_{j,t}, j = 1, ..., 10, as candidates, resulting in 110 candidate predictors.

In order to compare the coefficient estimates to their true values, we used the Mean Squared Error (MSE) and the Mean Absolute Error (MAE) of the estimates, which are respectively given by

MSE = \frac{1}{k} \sum_{j=1}^{k} ( \hat{\beta}_j - \beta_j )^2   (11)

and

MAE = \frac{1}{k} \sum_{j=1}^{k} | \hat{\beta}_j - \beta_j | .   (12)

We removed the last 10 observations of the simulated series and then performed one-step-ahead out-of-sample forecasts for the removed observations. For each t, the forecast of y_t was based on the entire set of information available at date t - 1.

Table 1, inspired by the working paper of Medeiros and Mendes (2012), shows various statistics related to variable selection, divided into panels, for different sample sizes. The first panel shows the fraction of replications in which the model was correctly selected, that is, all relevant variables were included and all irrelevant variables were excluded from the final model; the second panel shows the fraction of replications where all relevant covariates were included; the third shows the fraction of relevant covariates included; the fourth shows the fraction of irrelevant regressors excluded; and the last shows the average number of covariates included.

The results in Table 1 show that the WLadaLASSO had a slightly better performance on the exclusion of irrelevant covariates than the LASSO and the adalasso. On the identification of relevant covariates for the cases φ = 0.3 and φ = 0.6, the three methods had reasonably similar performance for sample sizes 500 and 2000, but the WLadaLASSO was clearly superior for sample size 50. For φ = 0.9, this superiority is even greater for the smallest sample size.

Table 2 shows the error measures of the set of estimated parameters, calculated by (11) and (12). The WLadaLASSO stood out for all sample sizes, especially for the smallest one. Figure 1 shows the smoothed histogram (using a Gaussian kernel) of the 1,000 estimated values of β_1, the coefficient of y_{t-1}. The plots show that the WLadaLASSO estimates were the best, mainly in the cases of highly correlated covariates and/or small sample sizes.

In our simulations we also observed the MSE and MAE of the difference between the OLS and WLadaLASSO estimates when the former is estimated using only the relevant variables. Both error measures tend to zero as the sample size increases, in line with one of the oracle properties (Fan and Li, 2001).
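The error measures (11) and (12) and the one-step-ahead forecasting step can be coded with short helpers such as the following (beta_hat, beta_true, fit_wl and z_new are illustrative names, not objects from the study):

    ## Illustrative helpers for the estimation-error measures (11) and (12);
    ## beta_hat and beta_true are numeric vectors of the same length k.
    mse_est <- function(beta_hat, beta_true) mean((beta_hat - beta_true)^2)
    mae_est <- function(beta_hat, beta_true) mean(abs(beta_hat - beta_true))

    mse_est(c(0.7, 0.1, 0.0), c(0.8, 0.0, 0.0))   # toy usage

    ## A one-step-ahead forecast from a fitted cv.glmnet object, using only
    ## information available at date t - 1 (z_new is the newest row of candidates):
    ## y_hat <- predict(fit_wl, newx = matrix(z_new, nrow = 1), s = "lambda.min")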

The results of the one-step-ahead forecasts are reported in Table 3. The WLadaLASSO had the best predictive performance in almost all cases, and stood out for the cases with T = 50. It was strikingly superior when there was a higher linear dependence between covariates. Figure 2 illustrates the distribution of the prediction errors.

5 Empirical Analysis

Stock return forecasting is of great interest to academics and financial market investors, and many economic variables have been proposed as potential predictors. Goyal and Welch (2008) show that an extensive list of potential predictors used in the literature produces unstable out-of-sample forecasts when compared to the simple model based on the historical average return. In contrast, Campbell and Thompson (2008) show that many regressions can provide better predictions than the historical average when restrictions are imposed on the signs of the coefficients. Although this superiority over the historical average is generally small and statistically insignificant, it can be economically significant. Among the articles that corroborate the results of Campbell and Thompson (2008), we can cite Rapach et al. (2010), Ferreira and Santa-Clara (2011) and Hillebrand et al. (2012). Other recent works have analyzed whether financial time series can be predicted by a list of covariates; see, for example, Issler et al. (2014), Lee et al. (2014) and Hsiao and Wan (2014).

Goyal and Welch (2008) estimated linear regressions via OLS in which the risk premium at time t is explained by a predictor variable at time t - 1; furthermore, they estimated linear regressions including all candidate predictors. Here we have applied methods with an L1-norm penalty on the regression coefficients, with multiple predictors as candidates, delegating to these methods the choice of predictors for the risk premium, which is the total rate of return on the stock market minus the prevailing short-term interest rate.

We used 14 variables from Goyal and Welch (2008): dividend-price ratio (log); dividend yield (log); earnings-price ratio (log); dividend-payout ratio (log); stock variance; book-to-market ratio; net equity expansion; Treasury bill rate; long-term yield; long-term return; term spread; default yield spread; default return spread; and inflation. All annual series used as predictors begin in 1927. We worked with differenced and centered series, estimating the models without an intercept. For the penalty methods, the number of lags was optimized via a 10-fold cross-validation approach, with a maximal lag order of five.

In order to compare the one-step-ahead predictions of the h out-of-sample observations, we use the R^2_{OS} (out-of-sample R²) statistic, as suggested in Campbell and Thompson (2008), which is given by

R^2_{OS} = 1 - \frac{ \sum_{t=T+1}^{T+h} ( r_t - \hat{r}_t )^2 }{ \sum_{t=T+1}^{T+h} ( r_t - \bar{r}_t )^2 },

where \hat{r}_t, t = T+1, ..., T+h, are the predicted values for the h out-of-sample observations and \bar{r}_t is the historical mean computed with data up to t - 1.
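A minimal sketch of the evaluation step, with illustrative vectors in place of the actual return forecasts: the out-of-sample R² defined above and the Diebold-Mariano comparison, discussed next, via dm.test from the forecast package (which, to our knowledge, incorporates the small-sample adjustment of Harvey et al., 1997).

    ## Illustrative only: out-of-sample R^2 and a Diebold-Mariano comparison.
    library(forecast)

    r2_os <- function(r_actual, r_model, r_histmean) {
      1 - sum((r_actual - r_model)^2) / sum((r_actual - r_histmean)^2)
    }

    set.seed(5)
    r_actual   <- rnorm(30, 0.05, 0.15)                # toy realized returns
    r_histmean <- rep(mean(r_actual[1:10]), 30)        # crude stand-in for the recursive mean
    r_model    <- r_actual + rnorm(30, 0, 0.10)        # stand-in model forecasts
    r2_os(r_actual, r_model, r_histmean)

    ## With alternative = "greater", the alternative hypothesis is that the second
    ## forecast is more accurate than the first (see ?dm.test for the convention).
    dm.test(r_actual - r_histmean, r_actual - r_model,
            alternative = "greater", h = 1, power = 2)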

P-values of the Modified Diebold-Mariano (MDM) test (Harvey et al., 1997) were computed in order to compare the predictions via the historical mean to the other methods. The one-tailed Diebold-Mariano test has the null hypothesis of no difference in the accuracy of two competing forecasts. When the null hypothesis is rejected, we conclude that the method has better predictive ability than the historical mean.

Table 4 shows the superiority of the predictions made by the penalization methods compared to the historical mean in three different periods, which begin in 1955, 1970 and 1985. We employed those methods in two ways: by using only one lag (l = 1) of all individual predictors, as in Goyal and Welch (2008), and by using the optimal number of lags chosen via 10-fold cross-validation. In both cases the penalization methods achieved better predictions than the historical mean, with the WLadaLASSO obtaining the best results.

6 Conclusion

This study aimed to investigate how a penalty on the set of coefficients can contribute to the performance of time series forecasting. Methods that penalize the coefficients are extremely important in reducing the dimensionality of economic applications in which the number of time series is high and there are few observations. The LASSO and adalasso methods arose in the context of linear regression but are increasingly present in time series analysis. Observing that more recent information tends to contribute more to time series forecasting, and inspired by Park and Sakaori (2013), this article proposes the WLadaLASSO, which penalizes each lagged variable differently.

The simulation study showed that the WLadaLASSO outperforms the other penalization methods in many respects, namely covariate selection, parameter estimation and out-of-sample forecasting, especially for small sample sizes and highly correlated covariates. In addition, the application to U.S. financial data shows that the risk premium predictions obtained by the WLadaLASSO were superior to those of the competing methods and were significantly better than those obtained by the historical mean model.

References

Audrino F, Camponovo L. 2013. Oracle properties and finite sample inference of the adaptive lasso for time series regression models. Economics Working Paper Series 1327, University of St. Gallen, School of Economics and Political Science.

Berk KN. 1978. Comparing subset regression procedures. Technometrics 20(1): 1-6.

Breiman L. 1995. Better subset regression using the non-negative garrote. Technometrics 37(4).

Campbell JY, Thompson SB. 2008. Predicting excess stock returns out of sample: Can anything beat the historical average? Review of Financial Studies 21(4).

De Mol C, Giannone D, Reichlin L. 2008. Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics 146(2).

Fan J, Guo S, Hao N. 2012. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74(1).

Fan J, Li R. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456).

Fan J, Lv J. 2008. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(5).

Fan J, Lv J. 2010. A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20(1).

Ferreira MA, Santa-Clara P. 2011. Forecasting stock market returns: The sum of the parts is more than the whole. Journal of Financial Economics 100(3).

Goyal A, Welch I. 2008. A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies 21(4).

Harvey D, Leybourne S, Newbold P. 1997. Testing the equality of prediction mean squared errors. International Journal of Forecasting 13(2).

Hastie T, Tibshirani R, Friedman JH. 2001. The Elements of Statistical Learning. Springer, New York.

Hillebrand E, Lee TH, Medeiros MC. 2012. Let's do it again: Bagging equity premium predictors. CREATES Research Papers, School of Economics and Management, University of Aarhus.

Hsiao C, Wan SK. 2014. Is there an optimal forecast combination? Journal of Econometrics 178.

Hua Y. 2011. Macroeconomic Forecasting using Large Vector Auto Regressive Model. Master's thesis.

Issler JV, Rodrigues C, Burjack R. 2014. Using common features to understand the behavior of metal-commodity prices and forecast them at different horizons. Journal of International Money and Finance 42.

Lee TH, Tu Y, Ullah A. 2014. Forecasting equity premium: Global historical average versus local historical average and constraints. Journal of Business & Economic Statistics.

Li J. 2012. Monetary policy analysis based on lasso-assisted vector autoregression (LAVAR). Working paper.

Medeiros MC, Mendes EF. 2012. Estimating high-dimensional time series models. CREATES Research Papers, School of Economics and Management, University of Aarhus.

Park H, Sakaori F. 2013. Lag weighted lasso for time series model. Computational Statistics 28(2).

Rapach DE, Strauss JK, Zhou G. 2010. Out-of-sample equity premium prediction: Combination forecasts and links to the real economy. Review of Financial Studies 23(2).

Song S, Bickel PJ. 2011. Large vector auto regressions. arXiv preprint.

Tibshirani R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) 58.

Tibshirani RJ. 2013. The lasso problem and uniqueness. Electronic Journal of Statistics 7.

Zhao P, Yu B. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7.

Zou H. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476).

Table 1: Descriptive statistics of model selection. For φ ∈ {0.3, 0.6, 0.9} and sample sizes T = 50, 500 and 2000, the panels report, for the LASSO, the adalasso and the WLadaLASSO: the fraction of replications in which all variables were correctly identified (true model included); the fraction of relevant variables included; the fraction of irrelevant variables excluded; and the average number of included variables.

Table 2: Descriptive statistics of parameter estimates. For φ ∈ {0.3, 0.6, 0.9} and each sample size T, the panels report the Mean Squared Error and the Mean Absolute Error of the estimated coefficients for the LASSO, the adalasso and the WLadaLASSO.

Table 3: Descriptive statistics of forecasts. For φ ∈ {0.3, 0.6, 0.9} and each sample size T, the panels report the mean and median of the forecast MSEs and MAEs for the LASSO, the adalasso and the WLadaLASSO.

Figure 1: Observed density function of β̂_1 (estimator for β_1 = 0.8), in panels for φ ∈ {0.3, 0.6, 0.9} and T ∈ {50, 500, 2000}. LASSO (continuous line), adalasso (dashed line) and WLadaLASSO (dotted line).

Figure 2: Distribution of forecast errors for the LASSO, the adalasso and the WLadaLASSO, in panels for φ ∈ {0.3, 0.6, 0.9} and T ∈ {50, 500, 2000}.

Table 4: Equity premium forecasting results. For forecast periods beginning in 1955, 1970 and 1985, the table reports R²_OS and the MDM p-value for each individual predictor (dividend-price ratio, dividend yield, earnings-price ratio, dividend-payout ratio, stock variance, book-to-market, net equity expansion, T-bill rate, long-term yield, long-term return, term spread, default yield spread, default return spread, inflation), for the regression with all regressors, for the penalization methods with one lag (l = 1): LASSO and adalasso, and for the penalization methods with the lag order chosen by cross-validation (free l): LASSO, adalasso and WLadaLASSO.


More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Comparing Nested Predictive Regression Models with Persistent Predictors

Comparing Nested Predictive Regression Models with Persistent Predictors Comparing Nested Predictive Regression Models with Persistent Predictors Yan Ge y and ae-hwy Lee z November 29, 24 Abstract his paper is an extension of Clark and McCracken (CM 2, 25, 29) and Clark and

More information

The Role of "Leads" in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji

The Role of Leads in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji he Role of "Leads" in the Dynamic itle of Cointegrating Regression Models Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji Citation Issue 2006-12 Date ype echnical Report ext Version publisher URL http://hdl.handle.net/10086/13599

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

In Search of Desirable Compounds

In Search of Desirable Compounds In Search of Desirable Compounds Adrijo Chakraborty University of Georgia Email: adrijoc@uga.edu Abhyuday Mandal University of Georgia Email: amandal@stat.uga.edu Kjell Johnson Arbor Analytics, LLC Email:

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

Functional Coefficient Models for Nonstationary Time Series Data

Functional Coefficient Models for Nonstationary Time Series Data Functional Coefficient Models for Nonstationary Time Series Data Zongwu Cai Department of Mathematics & Statistics and Department of Economics, University of North Carolina at Charlotte, USA Wang Yanan

More information

Estimating Global Bank Network Connectedness

Estimating Global Bank Network Connectedness Estimating Global Bank Network Connectedness Mert Demirer (MIT) Francis X. Diebold (Penn) Laura Liu (Penn) Kamil Yılmaz (Koç) September 22, 2016 1 / 27 Financial and Macroeconomic Connectedness Market

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Estimating and Accounting for the Output Gap with Large Bayesian Vector Autoregressions

Estimating and Accounting for the Output Gap with Large Bayesian Vector Autoregressions Estimating and Accounting for the Output Gap with Large Bayesian Vector Autoregressions James Morley 1 Benjamin Wong 2 1 University of Sydney 2 Reserve Bank of New Zealand The view do not necessarily represent

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression

Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression Peter Exterkate a, Patrick J.F. Groenen b Christiaan Heij b Dick van Dijk b a CREATES, Aarhus University, Denmark b Econometric

More information

Testing an Autoregressive Structure in Binary Time Series Models

Testing an Autoregressive Structure in Binary Time Series Models ömmföäflsäafaäsflassflassflas ffffffffffffffffffffffffffffffffffff Discussion Papers Testing an Autoregressive Structure in Binary Time Series Models Henri Nyberg University of Helsinki and HECER Discussion

More information

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy

More information

Lecture 14: Shrinkage

Lecture 14: Shrinkage Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the

More information

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Prof. Massimo Guidolin 019 Financial Econometrics Winter/Spring 018 Overview ARCH models and their limitations Generalized ARCH models

More information

Monitoring Forecasting Performance

Monitoring Forecasting Performance Monitoring Forecasting Performance Identifying when and why return prediction models work Allan Timmermann and Yinchu Zhu University of California, San Diego June 21, 2015 Outline Testing for time-varying

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Inflation Revisited: New Evidence from Modified Unit Root Tests

Inflation Revisited: New Evidence from Modified Unit Root Tests 1 Inflation Revisited: New Evidence from Modified Unit Root Tests Walter Enders and Yu Liu * University of Alabama in Tuscaloosa and University of Texas at El Paso Abstract: We propose a simple modification

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

Panel Threshold Regression Models with Endogenous Threshold Variables

Panel Threshold Regression Models with Endogenous Threshold Variables Panel Threshold Regression Models with Endogenous Threshold Variables Chien-Ho Wang National Taipei University Eric S. Lin National Tsing Hua University This Version: June 29, 2010 Abstract This paper

More information

Penalization method for sparse time series model

Penalization method for sparse time series model Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS020) p.6280 Penalization method for sparse time series model Heewon, Park Chuo University, Department of Mathemetics

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information