LASSO-Type Penalties for Covariate Selection and Forecasting in Time Series


Journal of Forecasting, J. Forecast. 35 (2016). Published online 21 February 2016 in Wiley Online Library (wileyonlinelibrary.com), DOI: /for.2403

LASSO-Type Penalties for Covariate Selection and Forecasting in Time Series

EVANDRO KONZEN 1 AND FLAVIO A. ZIEGELMANN 2
1 Graduate Program in Economics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil; and School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, United Kingdom
2 Department of Statistics and Graduate Programs in Economics and Management, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil

ABSTRACT

This paper studies some forms of LASSO-type penalties in time series, both to reduce the dimensionality of the parameter space and to improve out-of-sample forecasting performance. In particular, we propose a method that we call WLadaLASSO (weighted lag adaptive LASSO), which not only assigns a different weight to each coefficient but also further penalizes the coefficients of higher-lagged covariates. In our Monte Carlo implementation, the WLadaLASSO is superior to both the LASSO and the adaLASSO in terms of covariate selection, parameter estimation precision and forecasting, especially for a larger number of candidate lags and a stronger linear dependence between predictors. Empirical studies illustrate our approach for US risk premium and US inflation forecasting with good results. Copyright 2016 John Wiley & Sons, Ltd.

KEY WORDS: time series; LASSO; adaLASSO; variable selection; forecasting

INTRODUCTION

High-dimensional models have become increasingly present in the literature. It is known that the inclusion of a large number of economic and financial variables can contribute substantial gains to time series forecasting. As Song and Bickel (2011) point out, a challenging problem is to determine which variables and lags are relevant, especially when serial correlation, a high-dimensional dependence structure among variables and a small sample size (relative to the dimensionality) occur together. As Fan and Lv (2010) state, statistical accuracy, interpretability of the model and computational complexity are three important pillars of any statistical procedure. Typically, the number of observations n is much greater than the number of variables or parameters p. However, when the dimensionality p is large compared to the sample size n, traditional methods face several challenges: among them, how to build interpretable models that remain estimable; how to make the statistical procedures robust and computationally efficient; and how to obtain procedures that are more efficient in terms of statistical inference. Moreover, in a high-dimensional context, when the number of covariates p is large compared to the sample size n, traditional models can suffer from spurious correlation between the covariates, which can be serious even when the covariates are independent and identically distributed, as shown in Fan and Lv (2008) and Fan et al. (2012). One way to address the problems caused by high dimensionality is a sparsity assumption on the p-dimensional parameter vector, forcing many of its components to be exactly zero. Although it generally produces biased estimates, the sparsity assumption helps one to identify the important covariates, thus yielding a more parsimonious model and reducing both its complexity and the computational cost of estimating it.
As Medeiros and Mendes (2012) comment, factor models provide a good alternative when many variables are important in the model, a situation the authors refer to as a dense model structure. However, when the coefficient vector is indeed sparse, methods that assume sparsity gain importance. The least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996) was proposed in a linear regression context. It imposes a penalty on the L1 norm of the coefficient vector. Owing to the nature of this penalty, the LASSO forces some coefficients to be exactly zero, making it useful for selecting covariates and reducing the dimensionality of the parameter space. This methodology is an example of the regularization techniques characterized by Breiman (1995), which consider an error function of the form

E = (error on data) + \lambda (model complexity),

where the sum of squared residuals can, for instance, play the role of the 'error on data' term.

Correspondence to: Flavio A. Ziegelmann, Department of Statistics and Graduate Programs in Economics and Management, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil. E-mail: flavioaz@mat.ufrgs.br

The second term penalizes models whose complexity, and hence whose estimator variance, is high, with \lambda representing how severe the penalty is. When one minimizes this error function rather than just the error on the data, complex models are penalized and estimator variances are reduced. If \lambda is too large, though, only very simple models are obtained and a large bias can be introduced. The LASSO is one of the best-known regularization techniques, succeeding mainly in cases where many of the coefficients to be estimated are null. Thus the LASSO also becomes useful for the selection of covariates.

Zou (2006) investigates the oracle properties, mentioned by Fan and Li (2001), of the original LASSO proposed by Tibshirani (1996). Zou (2006) shows that there are cases in which the LASSO is not consistent in variable selection and hence proposes the adaLASSO (adaptive LASSO), in which the penalty carries a different weight for each coefficient, allowing this version to enjoy the oracle properties. In a time series context, LASSO-type penalties are employed in De Mol et al. (2008), Hua (2011) and Li (2012). Medeiros and Mendes (2012) show that the adaLASSO consistently chooses the relevant variables as the number of observations increases (model selection consistency), even when the errors are non-Gaussian and conditionally heteroskedastic. In addition, Audrino and Camponovo (2013) present theoretical and empirical results on the finite-sample and asymptotic properties of the adaLASSO in time series regression models.

Our present work borrows ideas from Park and Sakaori (2013) and proposes a variation which we call WLadaLASSO (weighted lag adaptive LASSO), a method which assigns a different weight to each coefficient and also further penalizes the coefficients of higher-lagged covariates. A similar idea is employed in the Minnesota prior used in Bayesian VARs. The results show the superiority of the WLadaLASSO over the LASSO and the adaLASSO, especially for a stronger linear dependence between predictors and a greater number of candidate lags. A first application forecasts the risk premium using the same data as Goyal and Welch (2008). Contrary to what that article reports, we find that the predictors used in the literature can help to predict the risk premium when LASSO-type methods are applied. A second application forecasts US inflation employing several explanatory covariates, where the WLadaLASSO produces better forecasts than a benchmark model.

The rest of this paper is organized as follows. The next section introduces the theme of model selection. In the third section, LASSO-type penalties are described in detail. The fourth section carries out a comprehensive Monte Carlo simulation study. The fifth section presents our empirical applications to risk premium and inflation forecasting. Finally, the sixth section concludes with our main remarks.

PRELIMINARIES ON MODEL SELECTION

One of the main goals in linear regression analysis is to estimate the coefficients of the following Gaussian linear model:

y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i, \quad i = 1, \ldots, n,   (1)

or, in vector form,

y_i = \beta_0 + X_i^T \beta + \varepsilon_i,   (2)

where y_i \in \mathbb{R} is the response variable, X_i = (x_{1i}, \ldots, x_{ki})^T \in \mathbb{R}^k is the set of predictors, \varepsilon_i \sim N(0, \sigma^2), and (\beta_0, \beta_1, \ldots, \beta_k)^T is the set of parameters.
One of the most popular methods for estimating the unknown parameters of equation (1), ordinary least squares (OLS), is based on minimizing the sum of squared residuals (SSR), i.e. solving the following minimization problem:

\hat{\beta} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2.   (3)

However, the OLS method suffers from some problems when there are too many predictors in the model. Firstly, its estimates often have large variance, which reduces the accuracy of the forecasts. Secondly, on its own it cannot determine a smaller set of predictors with the strongest effects when one aims to find out which predictors best explain the variability of the response variable. Finally, by construction, the OLS method cannot be implemented when the number of parameters exceeds the number of observations.

A natural procedure for choosing which predictors enter and which do not enter model (2) is to compute all 2^k possible regression models, investigating all possible combinations of predictors. However, this can require large computational resources. Thus several procedures have been proposed, such as best subset selection, forward selection and backward elimination, and forward-stagewise regression, as pointed out by Hastie et al. (2001).

However, these alternatives also have their own limitations when the model has many candidate covariates. Best subset selection is feasible only for k not exceeding roughly 30 or 40, as explained by Hastie et al. (2001). Meanwhile, forward selection and backward elimination may not select the best subset of variables in some situations, as stated by Berk (1978). Forward-stagewise regression, as an algorithm that can require many steps, has a large computational cost and may be impractical for problems of high dimensionality.

Ridge regression, in turn, shrinks the set of coefficients by imposing a penalty on their sum of squares:

\hat{\beta}^{ridge} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} \beta_j^2 \le t,   (4)

where the parameter t \ge 0 controls the penalty. An equivalent expression is given by

\hat{\beta}^{ridge} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \right\},   (5)

where the parameter \lambda \ge 0 is a function of the parameter t in equation (4) and controls how severe the penalty is. The larger \lambda is, the larger the penalty gets. When \lambda = 0, the vector \hat{\beta}^{ridge} is equal to the coefficient vector obtained by OLS. Ridge regression yields non-zero estimates for all coefficients, so it is not a method of variable selection. Noting that the choice of variables is important for the interpretation of the model, Breiman (1995) proposes the Garrote method, which penalizes the regression coefficients so that some of them are forced to zero. The disadvantage of the Garrote method is that its solution depends on the sign and magnitude of the OLS estimates, which perform poorly in the presence of high correlation between predictors.

LASSO-TYPE PENALTIES

As in ridge regression, described by equation (4), where a penalty is imposed on the sum of squares of the coefficients, there are alternative methods that impose penalties on the sum of the absolute values of the coefficients. Some of these methods are described in this section.

LASSO

The least absolute shrinkage and selection operator (LASSO), proposed by Tibshirani (1996), is another method for shrinking the coefficient set. Just as the Garrote, the LASSO aims to estimate a model that produces forecasts with small variance and to determine the set of predictors that best explain the response variable. Tibshirani (1996) argues that the techniques usually employed to improve OLS estimates, such as subset selection and ridge regression, have disadvantages. Subset selection models are easily interpreted, but the process of choosing variables has great variability since it is a discrete process. Ridge regression, in turn, has less variability and shrinks the regression coefficients, but it retains all predictors in the model.

As in typical regression modeling, represented by equations (1) and (2), it is supposed that the y_i are conditionally independent given the x_{ki}. It is assumed that the x_{ki} are standardized such that \sum_i x_{ki} = 0 and \sum_i x_{ki}^2 / n = 1. LASSO estimates are obtained by minimizing the sum of squared residuals subject to a penalty on the L1 norm of the coefficients:

\hat{\beta}^{LASSO} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} |\beta_j| \le t,   (6)

where the tuning parameter t \ge 0 controls the penalty. For all t, the solution for \beta_0 is \hat{\beta}_0 = \bar{y}. Thus one may assume without loss of generality that \bar{y} = 0 and then omit \beta_0. The tuning parameter t \ge 0 in equation (6) controls how much penalty is applied to the set of coefficients.
Let \hat{\beta}_j^0, 1 \le j \le k, represent the OLS coefficients and let t_0 = \sum_j |\hat{\beta}_j^0|. Values t < t_0 will shrink the coefficients toward zero, and some coefficients may be exactly equal to zero. If t \ge t_0, the LASSO estimates coincide with the OLS estimates.
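As an illustration of this shrinkage, here is a minimal R sketch of a LASSO fit using the glmnet package (the package the simulation section below reports using); the toy data and the two penalty values shown are our own illustrative choices, not values from the paper.

    library(glmnet)

    set.seed(1)
    n <- 100; k <- 8
    X <- matrix(rnorm(n * k), n, k)              # candidate predictors
    beta <- c(1.5, -1, 0.5, rep(0, k - 3))       # sparse true coefficient vector
    y <- drop(X %*% beta + rnorm(n))

    # Lagrangian form of the LASSO: alpha = 1 selects the pure L1 penalty
    fit <- glmnet(X, y, alpha = 1)

    # Coefficients along the path: a larger lambda forces more coefficients to zero
    coef(fit, s = 0.50)   # heavier shrinkage, sparser solution
    coef(fit, s = 0.01)   # light shrinkage, close to the OLS solution

The path over \lambda mirrors the role of t in equation (6): tightening the constraint (a larger \lambda) zeroes out more coefficients.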

Using the Lagrangian, equation (6) has the equivalent expression

\hat{\beta}^{LASSO} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \right\},   (7)

where the parameter \lambda \ge 0 is a function of the parameter t. The larger \lambda, the greater the penalty on the coefficients; when \lambda = 0, the LASSO estimates are equal to the OLS estimates. The value of \lambda is traditionally chosen via cross-validation in a cross-sectional framework. However, this strategy is more complicated in a time series set-up. Therefore, we will employ the Bayesian information criterion (BIC) to choose \lambda. Tibshirani (2013) gives sufficient conditions for the uniqueness of the LASSO solution using the Karush-Kuhn-Tucker conditions.

The LASSO has less variability than the methods of subset selection. Moreover, it shrinks some coefficients and forces others to zero, keeping the good features of both subset selection and ridge regression. Furthermore, the LASSO performs variable selection and coefficient estimation simultaneously.

LASSO's consistency in variable selection

In order to use the LASSO as a selection criterion, its sparse solution should represent the true model well. Hence one of the desirable properties of such a criterion is consistency, i.e. it identifies the true model as n \to \infty. To study the consistency of LASSO selection, Zhao and Yu (2006) consider two aspects: (i) whether there is a deterministic amount of regularization that provides consistency in selection; and (ii) whether for each sample there is a correct amount of regularization that selects the true model. Their results show that a condition they name the irrepresentable condition is almost necessary and sufficient for both kinds of consistency. Their results hold for linear models with either fixed k or k growing with n.

Let us consider the linear regression model

Y_n = X_n \beta_n + \varepsilon_n,

where \varepsilon_n = (\varepsilon_1, \ldots, \varepsilon_n)^T is a vector of i.i.d. random variables with mean 0 and variance \sigma^2, Y_n is an n \times 1 response vector, X_n = (X_1^n, \ldots, X_k^n) is the n \times k matrix of predictors, with X_i^n = (x_{i1}, \ldots, x_{in})^T for i = 1, \ldots, k, and \beta_n is the k \times 1 coefficient vector. Unlike the traditional setting where k is fixed, the data and the model parameters are indexed by n to allow them to vary with n. LASSO estimates \hat{\beta}_n(\lambda) = (\hat{\beta}_1^n, \ldots, \hat{\beta}_k^n)^T are defined by

\hat{\beta}_n(\lambda) = \arg\min_{\beta_n} \| Y_n - X_n \beta_n \|_2^2 + \lambda \| \beta_n \|_1,

where \| \cdot \|_2^2 denotes the squared L2 norm of a vector, i.e. the sum of the squares of the vector components, and \| \cdot \|_1 denotes the L1 norm of a vector, i.e. the sum of the absolute values of the vector components. There is consistency in model selection when

P\left( \{ i : \hat{\beta}_i^n \neq 0 \} = \{ i : \beta_i^n \neq 0 \} \right) \to 1 \quad \text{as } n \to \infty,

which is equivalent to sign consistency (Zhao and Yu, 2006). The following definitions are based on Zhao and Yu (2006).

Definition 1. An estimate \hat{\beta}_n is equal in sign to the true \beta_n (written \hat{\beta}_n =_s \beta_n) if and only if sign(\hat{\beta}_n) = sign(\beta_n), where sign(\cdot) takes the value 1 for positive values, -1 for negative values and 0 for zero.

Definition 2. The LASSO is strongly sign consistent if there exists \lambda_n = f(n), i.e. a function of n which is independent of Y_n and X_n, such that

\lim_{n \to \infty} P\left( \hat{\beta}_n(\lambda_n) =_s \beta_n \right) = 1.

Definition 3. The LASSO is generally sign consistent if

\lim_{n \to \infty} P\left( \exists \lambda \ge 0, \ \hat{\beta}_n(\lambda) =_s \beta_n \right) = 1.

Strong sign consistency means that one can use a preselected \lambda to obtain consistent model selection. General sign consistency means that, for a random realization, there is a correct amount of regularization that selects the true model. Zhao and Yu (2006) show that both types of consistency are almost equivalent up to a certain condition. We now introduce some notation to define this condition.

Without loss of generality, we assume that \beta_n = (\beta_1^n, \ldots, \beta_q^n, \beta_{q+1}^n, \ldots, \beta_k^n)^T, where \beta_j^n \neq 0 for j = 1, \ldots, q and \beta_j^n = 0 for j = q+1, \ldots, k. Write \beta_n(1) = (\beta_1^n, \ldots, \beta_q^n)^T and \beta_n(2) = (\beta_{q+1}^n, \ldots, \beta_k^n)^T. Then one can write X_n(1) and X_n(2) as the first q and the last k - q columns of X_n, respectively, and define C_n = \frac{1}{n} X_n^T X_n. Setting C_{11}^n = \frac{1}{n} X_n(1)^T X_n(1), C_{22}^n = \frac{1}{n} X_n(2)^T X_n(2), C_{12}^n = \frac{1}{n} X_n(1)^T X_n(2) and C_{21}^n = \frac{1}{n} X_n(2)^T X_n(1), C_n can be expressed in block-wise form as

C_n = \begin{pmatrix} C_{11}^n & C_{12}^n \\ C_{21}^n & C_{22}^n \end{pmatrix}.

Assuming C_{11}^n is invertible, the following irrepresentable conditions are defined.

Strong irrepresentable condition. There exists a positive constant vector \eta such that

\left| C_{21}^n (C_{11}^n)^{-1} \, \mathrm{sign}\big(\beta_n(1)\big) \right| \le \mathbf{1} - \eta,

where \mathbf{1} is a (k - q) \times 1 vector of ones and the inequality holds element-wise.

Weak irrepresentable condition.

\left| C_{21}^n (C_{11}^n)^{-1} \, \mathrm{sign}\big(\beta_n(1)\big) \right| < \mathbf{1},

where the inequality holds element-wise.

Zou (2006) similarly concludes that there is a necessary condition for the LASSO's consistency in model selection.

adaLASSO

Noting that there may be situations in which the LASSO is not consistent in variable selection, Zou (2006) proposes the adaptive LASSO (adaLASSO), which assigns different weights to different coefficients:

\hat{\beta}^{adaLASSO} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 + \lambda \sum_{j=1}^{k} \omega_j |\beta_j| \right\},   (8)

where \omega_j = |\hat{\beta}_j^{ridge}|^{-\tau}, \tau > 0. The individual weights \omega_j help to select the relevant variables. A relevant variable x_j tends to have a large coefficient \hat{\beta}_j^{ridge}, resulting in a small weight \omega_j assigned to the coefficient of that variable; conversely, if the variable x_j is irrelevant, the ridge coefficient \hat{\beta}_j^{ridge} tends to be small and results in a large \omega_j. Thus the adaLASSO imposes a greater penalty on the coefficients of variables that appear to be irrelevant. The weights \omega_j can also be obtained from OLS estimates; however, this is limited to the case where n > k + 1.

Following Zou's (2006) notation, A = \{ j : \beta_j \neq 0 \} is the true set of non-zero coefficients. In turn, A_n = \{ j : \hat{\beta}_j^{(n)} \neq 0 \} is the set of non-zero coefficients estimated by equation (8), where \lambda_n varies with the sample size n. Zou (2006) shows that with adequate weights \omega_j the adaLASSO has the oracle properties.

Theorem 1 (Zou, 2006). Suppose \lambda_n / \sqrt{n} \to 0 and \lambda_n n^{(\tau - 1)/2} \to \infty. Then the adaLASSO satisfies:

1. Consistency in variable selection: \lim_{n \to \infty} P(A_n = A) = 1.
2. Asymptotic normality: \sqrt{n} \left( \hat{\beta}_A^{(n)} - \beta_A \right) \to_d N\left( 0, \sigma^2 C_{11}^{-1} \right).

Therefore, in addition to correctly selecting the relevant variables as the sample size increases, the adaLASSO produces estimates of the non-zero coefficients that asymptotically follow the same distribution as the OLS estimators computed using only the relevant variables.
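A minimal R sketch of the adaLASSO of equation (8), building the weights \omega_j = |\hat{\beta}_j^{ridge}|^{-\tau} from a first-stage ridge fit and passing them to glmnet through its penalty.factor argument. Selecting the ridge penalty by cross-validation here is our simplification for brevity (the paper selects tuning parameters by BIC), and the data are again a toy example.

    library(glmnet)

    set.seed(2)
    n <- 200; k <- 10
    X <- matrix(rnorm(n * k), n, k)
    beta <- c(2, -1.5, 1, rep(0, k - 3))
    y <- drop(X %*% beta + rnorm(n))

    # Step 1: ridge fit (alpha = 0) to build the adaptive weights.
    # cv.glmnet is used only to keep the sketch short; the paper uses BIC instead.
    ridge_cv   <- cv.glmnet(X, y, alpha = 0)
    beta_ridge <- as.numeric(coef(ridge_cv, s = "lambda.min"))[-1]   # drop the intercept

    tau <- 1
    w   <- abs(beta_ridge)^(-tau)     # omega_j = |beta_ridge_j|^(-tau)

    # Step 2: LASSO with coefficient-specific penalty weights (equation (8))
    adalasso <- glmnet(X, y, alpha = 1, penalty.factor = w)
    coef(adalasso, s = 0.1)           # weakly related covariates are penalized more heavily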

WLadaLASSO

In economic time series, vector autoregression (VAR) models became popular with the seminal work of Sims (1980). However, these models often suffer from over-parametrization, which can create multicollinearity and a loss of degrees of freedom, resulting in inefficient estimates and consequently large out-of-sample forecasting errors. For that reason, further developments have imposed restrictions on the parameters via Bayesian priors, now well known as Minnesota priors. In particular, Litterman (1979) imposes restrictions on the coefficients across different lag lengths, assuming that the coefficients of longer lags are likely to be closer to zero than the coefficients of shorter lags. Other strategies literally exclude lags with statistically insignificant coefficients (Dua and Ray, 1995; Dua et al., 1999).

When the adaLASSO is applied in a time series context, each lagged variable enters as a candidate predictor and its coefficient is penalized only according to the size of its ridge (or OLS) estimate. One can then wonder whether a greater penalty on more distant lagged variables improves time series forecasting, since more recent information is usually more important. Park and Sakaori (2013) propose some alternative types of penalties for different lags. Borrowing from their ideas, in a slightly different version, we propose the adaLASSO with weighted lags, called here WLadaLASSO (weighted lag adaptive LASSO), which is given by

\hat{\beta}^{WLadaLASSO} = \arg\min_{\beta_0, \beta_1, \ldots, \beta_k} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2 + \lambda \sum_{j=1}^{k} \omega_j |\beta_j| \right\},   (9)

where \omega_j = \left| \hat{\beta}_j^{ridge} \, e^{-\alpha l} \right|^{-\tau}, with \tau > 0, \alpha \ge 0 and l representing the lag order of covariate j.

In what follows, besides looking at the WLadaLASSO's forecasting skills, we will also study its model selection and estimation capabilities, comparing the results to other regularization methods.

SIMULATION

In this section we analyze the performance of the WLadaLASSO, comparing it to other regularization methods in a Monte Carlo simulation study. All implementations are done with the free software R. The estimation of equations (7), (8) and (9) uses the function glmnet to optimize the parameters \beta_j, with \lambda chosen through BIC. The parameter \tau is set to 1. In the WLadaLASSO, for each \alpha belonging to the set \{0, 0.5, 1, \ldots, 10\}, the optimal \lambda is the one that produces the fit with the smallest BIC value. The chosen value of \alpha is then the one that produces the smallest BIC value among all obtained BIC values.

Through Monte Carlo simulations with 1000 replications, we simulate 10 independent time series that follow an AR(1) process, x_{i,t} = \rho x_{i,t-1} + u_{i,t}, where u_{i,t} \sim N(0, 1), i = 1, \ldots, 10, and \rho controls the degree of linear dependence between the candidate covariates. The following data-generating process is considered:

y_t = 0.8 y_{t-1} + 0.6 x_{1,t-1} + 0.3 x_{1,t-2} - 0.5 x_{2,t-1} - 0.2 x_{2,t-2} + 0.4 x_{3,t-1} + 0.3 x_{3,t-2} + 0.4 x_{4,t-1} - 0.3 x_{5,t-1} + 0.2 x_{6,t-1} + \varepsilon_t, \quad t = 1, 2, \ldots, T,   (10)

where \varepsilon_t has two specifications: N(0, 1) and t(5). The LASSO, adaLASSO and WLadaLASSO methods are employed to estimate equation (10) with L lags of y_t and L lags of x_{j,t}, j = 1, \ldots, 10, as candidates, resulting in 11L candidate predictors. We analyze situations where L = 5, 10 and 20; a sketch of one replication of this design is given below.
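To make the design concrete, the following R sketch simulates one replication of the DGP in equation (10), builds the 11L matrix of lagged candidates and forms the WLadaLASSO weights \omega_j = |\hat{\beta}_j^{ridge} e^{-\alpha l}|^{-\tau} for a single (\alpha, \tau) pair. The burn-in length, the fixed \alpha and the use of cv.glmnet for the first-stage ridge fit are our illustrative simplifications of the BIC-based search described above.

    library(glmnet)

    set.seed(3)
    T_ <- 500; L <- 5; rho <- 0.6; burn <- 100
    N  <- T_ + burn + L

    # Ten independent AR(1) covariates with persistence rho
    x <- sapply(1:10, function(i) as.numeric(arima.sim(list(ar = rho), n = N)))

    # Data-generating process of equation (10)
    y <- numeric(N); eps <- rnorm(N)
    for (t in 3:N) {
      y[t] <- 0.8 * y[t - 1] +
        0.6 * x[t - 1, 1] + 0.3 * x[t - 2, 1] -
        0.5 * x[t - 1, 2] - 0.2 * x[t - 2, 2] +
        0.4 * x[t - 1, 3] + 0.3 * x[t - 2, 3] +
        0.4 * x[t - 1, 4] - 0.3 * x[t - 1, 5] + 0.2 * x[t - 1, 6] + eps[t]
    }

    # Candidate matrix: L lags of y and of each x_j (11 * L columns), aligned with y_t
    Z    <- cbind(y, x)                        # columns: y, x1, ..., x10
    lagm <- embed(Z, L + 1)                    # each row holds (Z_t, Z_{t-1}, ..., Z_{t-L})
    yt   <- lagm[, 1]                          # response y_t
    Xc   <- lagm[, -(1:ncol(Z))]               # drop the time-t block, keep lags 1..L
    lago <- rep(1:L, each = ncol(Z))           # lag order of each candidate column

    keep <- (nrow(lagm) - T_ + 1):nrow(lagm)   # discard the burn-in period
    yt <- yt[keep]; Xc <- Xc[keep, ]

    # WLadaLASSO weights: omega_j = |beta_ridge_j * exp(-alpha * l_j)|^(-tau)
    beta_ridge <- as.numeric(coef(cv.glmnet(Xc, yt, alpha = 0), s = "lambda.min"))[-1]
    alpha_lag  <- 1; tau <- 1
    w <- abs(beta_ridge * exp(-alpha_lag * lago))^(-tau)

    wlada <- glmnet(Xc, yt, alpha = 1, penalty.factor = w)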
In order to compare the coefficient estimates with their true values, we use the mean squared error (MSE) and the mean absolute error (MAE) of the estimates, which are respectively given by

MSE = \frac{1}{1000\,k} \sum \left( \hat{\beta}_j - \beta_j \right)^2   (11)

and

MAE = \frac{1}{1000\,k} \sum \left| \hat{\beta}_j - \beta_j \right|,   (12)

where the sums run over the k candidate coefficients in each of the 1000 replications. We remove the last 10 observations of each simulated series and then compute one-step-ahead out-of-sample forecasts for the removed observations. For each t, the forecast of y_t is based on the entire information set available at date t - 1; a sketch of this BIC-based selection and forecasting step follows.
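Since \lambda is chosen by BIC rather than by cross-validation, the following sketch scores a glmnet path by BIC, using the number of non-zero coefficients as the degrees of freedom, and then produces a one-step-ahead forecast. The Gaussian BIC formula written in the helper is a standard approximation and our own assumption; the text only states that \lambda is chosen by BIC.

    library(glmnet)

    # BIC along a glmnet path, using the non-zero count as the degrees of freedom
    # (an assumption on our part; the paper does not spell out its BIC formula)
    bic_path <- function(fit, X, y) {
      n    <- length(y)
      pred <- predict(fit, newx = X)          # n x nlambda matrix of fitted values
      rss  <- colSums((y - pred)^2)
      n * log(rss / n) + fit$df * log(n)
    }

    set.seed(4)
    n <- 300; k <- 20
    X <- matrix(rnorm(n * k), n, k)
    beta <- c(1, -0.8, 0.5, rep(0, k - 3))
    y <- drop(X %*% beta + rnorm(n))

    # Estimate on the information available up to date t - 1
    fit        <- glmnet(X[-n, ], y[-n], alpha = 1)
    lambda_bic <- fit$lambda[which.min(bic_path(fit, X[-n, ], y[-n]))]

    # One-step-ahead forecast of the held-out observation
    yhat <- predict(fit, newx = X[n, , drop = FALSE], s = lambda_bic)
    c(forecast = as.numeric(yhat), actual = y[n])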

Table I. Descriptive statistics of model selection (\rho = 0.3): for LASSO, adaLASSO and WLadaLASSO, panels report the fraction of variables correctly identified (FVCI), the fraction of replications in which the true model is included (TMI), the fraction of relevant variables included (FRVI), the fraction of irrelevant variables excluded (FIVE) and the number of included variables (NIV), under N(0, 1) and t(5) errors, with 5, 10 and 20 candidate lags and different sample sizes T.

Table II. Descriptive statistics of model selection (\rho = 0.6). Panels: FVCI, TMI, FRVI, FIVE and NIV for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table III. Descriptive statistics of model selection (\rho = 0.9). Panels: FVCI, TMI, FRVI, FIVE and NIV for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Tables I-III, inspired by the working paper of Medeiros and Mendes (2012), show various statistics related to variable selection, divided into panels, for different sample sizes, different numbers of candidate lags and two different distributions of the errors u_{i,t} and \varepsilon_t. Each of these tables reports the results for a different level of linear dependence between covariates (\rho = 0.3, 0.6 and 0.9). The first panel shows the average fraction (across the replications) of variables correctly identified (FVCI): relevant variables identified as relevant and irrelevant ones identified as irrelevant; the second panel shows the fraction of replications in which the true model is included (TMI), i.e. all relevant covariates are included; the third shows the average fraction of relevant variables included (FRVI); the fourth shows the average fraction of irrelevant variables excluded (FIVE); and the last shows the average number of included variables (NIV).

In what follows, we discuss only the results for the cases in which the errors have an N(0, 1) distribution. These results are quite similar to those obtained when the errors follow a t(5) distribution. The main differences in model selection performance among the three types of penalties can be seen in the FRVI panels of Tables I-III. For the largest sample size (T = 2000), all methods have similar performance, with the exception of the case \rho = 0.9, where the WLadaLASSO is better than the LASSO and slightly worse than the adaLASSO. For the smallest sample size (T = 50), we notice a superiority of the WLadaLASSO in FRVI, which becomes more evident for a higher number of candidate lags and a stronger linear dependence between covariates. Finally, the moderate sample size (T = 500) shows intermediate results between the other sample sizes.

Tables IV-VI show the error measures of the set of estimated parameters, calculated via equations (11) and (12). For T = 2000, the WLadaLASSO maintains at least the same quality of parameter estimates obtained by the adaLASSO and always does better than the LASSO. Our proposed penalty also shows slightly better results when T = 500. The drawback of the WLadaLASSO appears for T = 50, when it is outperformed by the LASSO. In our simulations we also observe the MSE and MAE of the difference between the OLS and WLadaLASSO estimates when the former is computed using only the relevant variables. Both error measures tend to zero as the sample size increases, in line with one of the oracle properties (Fan and Li, 2001).

The results of the one-step-ahead forecasts are reported in Tables VII-IX. The adaLASSO and WLadaLASSO produce predictions with similar performance for T = 2000, both outperforming the LASSO. However, the WLadaLASSO particularly stands out when there are 10 or 20 candidate lags and \rho = 0.9. For the smallest sample size, the WLadaLASSO becomes strikingly superior the greater the number of candidate lags and the stronger the linear dependence between covariates. The disadvantage of our proposed penalty occurs in the specific situation with T = 50 and only 5 candidate lags. Figures 1-3 illustrate the distribution of the obtained prediction errors for the cases in which the errors have an N(0, 1) distribution.

EMPIRICAL ANALYSIS

Risk premium forecasting

Forecasting stock returns is of great interest to academics and financial market investors, and many economic variables have been proposed as potential predictors.
Goyal and Welch (2008) show that an extensive list of potential predictors used in the literature is unstable in producing out-of-sample forecasts when compared to the simple model based on the historical average return. In contrast, Campbell and Thompson (2008) show that many regressions can provide better predictions than the historical average when restrictions are imposed on the signs of the coefficients. Although this superiority over the historical average is generally small and statistically insignificant, it can be economically significant. Among the articles that corroborate the results of Campbell and Thompson (2008), we can cite Rapach et al. (2010), Ferreira and Santa-Clara (2011) and Hillebrand et al. (2012). Other recent works analyze whether financial time series may be predicted by a list of covariates; see, for example, Issler et al. (2014), Lee et al. (2015) and Hsiao and Wan (2014).

Goyal and Welch (2008) estimate linear regressions via OLS in which the risk premium at time t is explained by a predictor variable at time t - 1. Furthermore, they estimate linear regressions including all candidate predictors. Here we apply L1-norm penalty methods to the regression coefficients with multiple predictors as candidates, delegating to these methods the choice of predictors for the risk premium, which is the total rate of return on the stock market minus the prevailing short-term interest rate. We use 14 variables from Goyal and Welch (2008): dividend-price ratio (log); dividend yield (log); earnings-price ratio (log); dividend payout ratio (log); stock variance; book-to-market ratio; net equity expansion; Treasury bill rate; long-term yield; long-term return; term spread; default yield spread; default return spread; and inflation. All annual series used as predictors begin in 1927. We work with differenced and centered series, estimating the models without an intercept. For the penalty methods, the number of lags is optimized through BIC, with a maximal lag order of five; a sketch of the data preparation is given below.
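A sketch of how the lagged candidate set for these regressions can be assembled in R: the series are differenced and centered, up to five lags of each predictor enter as candidates, and the penalized fit is run without an intercept. The placeholder data and object names are ours; only the transformations and the maximal lag order come from the text.

    library(glmnet)

    set.seed(5)
    # Placeholder annual data: a risk premium series and 14 candidate predictors
    n_years    <- 80
    rp         <- rnorm(n_years)
    predictors <- matrix(rnorm(n_years * 14), n_years, 14)

    # Difference and centre the series, as described in the text
    d_rp <- diff(rp); d_rp <- d_rp - mean(d_rp)
    d_X  <- apply(predictors, 2, function(z) { dz <- diff(z); dz - mean(dz) })

    # Up to five lags of every differenced predictor enter as candidates
    L    <- 5
    lagX <- embed(d_X, L + 1)[, -(1:ncol(d_X))]   # predictors at t-1, ..., t-5
    y    <- d_rp[(L + 1):length(d_rp)]            # risk premium at time t

    # Penalized fit without an intercept (the series are already centred);
    # the adaptive and lag-weighted penalties of equations (8)-(9) can then
    # be built exactly as in the earlier sketches.
    fit <- glmnet(lagX, y, alpha = 1, intercept = FALSE)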

Table IV. Descriptive statistics of parameter estimates (\rho = 0.3): MSE and MAE panels for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table V. Descriptive statistics of parameter estimates (\rho = 0.6): MSE and MAE panels for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table VI. Descriptive statistics of parameter estimates (\rho = 0.9): MSE and MAE panels for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table VII. Descriptive statistics of forecasts (\rho = 0.3): mean and median of MSEs and of MAEs for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table VIII. Descriptive statistics of forecasts (\rho = 0.6): mean and median of MSEs and of MAEs for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Table IX. Descriptive statistics of forecasts (\rho = 0.9): mean and median of MSEs and of MAEs for LASSO, adaLASSO and WLadaLASSO; errors N(0, 1) and t(5); 5, 10 and 20 candidate lags; various sample sizes T.

Figure 1. Distribution of forecast errors (\rho = 0.3), by number of candidate lags (5, 10, 20) and sample size T.

Figure 2. Distribution of forecast errors (\rho = 0.6), by number of candidate lags (5, 10, 20) and sample size T.

Figure 3. Distribution of forecast errors (\rho = 0.9), by number of candidate lags (5, 10, 20) and sample size T, for the LASSO, adaLASSO and WLadaLASSO.

In order to compare the one-step-ahead predictions for the h out-of-sample observations, we use the out-of-sample R^2 statistic (R^2_{OS}), as suggested in Campbell and Thompson (2008), which is given by

R^2_{OS} = 1 - \frac{ \sum_{t=T+1}^{T+h} (r_t - \hat{r}_t)^2 }{ \sum_{t=T+1}^{T+h} (r_t - \bar{r}_t)^2 },

where \hat{r}_t, t = T+1, \ldots, T+h, are the predicted values for the h out-of-sample observations and \bar{r}_t is the historical mean up to date t - 1. p-values of the modified Diebold-Mariano (MDM) test (Harvey et al., 1997) are computed in order to compare the predictions based on the historical mean with those of the other methods. The one-tailed Diebold-Mariano test has the null hypothesis of no difference in the accuracy of the two competing forecasts. When the null hypothesis is rejected, we conclude that the method has better predictive ability than the historical mean.

Table X shows the superiority of the predictions made by the penalization methods over the historical mean in three different periods, whose forecasts begin in 1955, 1970 and 1985. We employ those methods in two ways: using only one lag (l = 1) of all individual predictors (as in Goyal and Welch, 2008) and using the optimal number of lags chosen via BIC. In both cases the penalization methods achieve better predictions than the historical mean, with the WLadaLASSO having the best results.

Table X. Equity premium forecasting results: R^2_{OS} and MDM p-values for forecasts beginning in 1955, 1970 and 1985, for the individual predictors (dividend price ratio, dividend yield, earnings-price ratio, dividend payout ratio, stock variance, book-to-market, net equity expansion, T-bill rate, long-term yield, long-term return, term spread, default yield spread, default return spread and inflation), for the regression with all regressors, and for the penalization methods with l = 1 (LASSO, adaLASSO) and with free l (LASSO, adaLASSO, WLadaLASSO).
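A sketch of the out-of-sample comparison behind Table X: computing R^2_{OS} against an expanding historical mean and a one-sided Diebold-Mariano-type p-value. We use dm.test from the forecast package as a stand-in for the MDM test of Harvey et al. (1997); treating it as equivalent, as well as the placeholder forecasts, are our own assumptions.

    library(forecast)   # provides dm.test

    # Out-of-sample R^2 of Campbell and Thompson (2008)
    r2_os <- function(actual, pred, bench) {
      1 - sum((actual - pred)^2) / sum((actual - bench)^2)
    }

    set.seed(6)
    # Placeholder series: realized returns plus two competing one-step-ahead forecast paths
    r     <- rnorm(60, mean = 0.05, sd = 0.15)
    t_oos <- 31:60                                          # out-of-sample evaluation window
    bench <- sapply(t_oos, function(t) mean(r[1:(t - 1)]))  # expanding historical mean
    model <- bench + rnorm(length(t_oos), sd = 0.05)        # placeholder model forecasts

    r2_os(r[t_oos], model, bench)

    # One-sided comparison of the two forecast error series
    # (see ?dm.test for the orientation of the alternative hypothesis)
    dm.test(r[t_oos] - bench, r[t_oos] - model, alternative = "greater", h = 1, power = 2)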

Inflation forecasting

Chan (2013) carries out a brief literature review of inflation forecasting and points out that both persistence and volatility in the inflation process change considerably over time. He lists studies which find that models with stochastic volatility provide substantially better forecasts than those obtained from constant-error-variance models (e.g. Clark and Doh, 2011; Chan et al., 2012). He then introduces a new class of models with both stochastic volatility and moving average errors and illustrates an empirical application to forecasting US quarterly CPI inflation, in which his approach provides better in-sample fit and out-of-sample forecast performance than the standard variants with only stochastic volatility. The unobserved components model with MA(1) stochastic volatility (UC-MA) has the best forecast performance among the competing models, especially at short forecast horizons.

We use US inflation observed at a quarterly frequency from 1960:Q1 to 2011:Q2, measured as the quarterly log change in the personal consumption expenditures (PCE) deflator. The potential predictors are the same as those used by Groen et al. (2013). They correspond to lagged values of inflation, a host of real activity data, term structure data, nominal data and surveys, resulting in 16 covariates. We work with differenced and centered series, estimating the models without an intercept. For the penalty methods, the number of covariate lags is optimized via BIC, with a maximal lag order of 20. p-values of the MDM test (for the MSE and MAE measures, respectively) are computed in order to compare the one-step-ahead predictions of the competing models with those of the UC-MA model. We analyze two periods of 20 observations each to evaluate the predictions. Table XI shows that both the adaLASSO and the WLadaLASSO provide better forecasting performance than the UC-MA and the LASSO for the two sample periods.

Table XI. Inflation forecasting results: MSE and MAE with MDM p-values for the UC-MA, LASSO, adaLASSO and WLadaLASSO models over the penultimate and last evaluation periods.

CONCLUSION

This study investigates how penalizing the coefficient set can contribute to the performance of time series forecasting. Methods that penalize the coefficients are extremely important in reducing the dimensionality of economic time series problems in which the number of series is high and there are few observations. The LASSO and adaLASSO methods arose in the context of linear regression but are increasingly present in time series analysis. Observing that more recent information tends to contribute more to time series forecasting, and inspired by Park and Sakaori (2013), this article proposes the WLadaLASSO, which penalizes each lagged variable differently. The simulation study shows that, especially for a stronger linear dependence between predictors and a higher number of candidate lags, the WLadaLASSO outperforms the other penalization methods in several respects: covariate selection, parameter estimation precision and forecasting.


More information

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic Chapter 6 ESTIMATION OF THE LONG-RUN COVARIANCE MATRIX An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic standard errors for the OLS and linear IV estimators presented

More information

Inflation Revisited: New Evidence from Modified Unit Root Tests

Inflation Revisited: New Evidence from Modified Unit Root Tests 1 Inflation Revisited: New Evidence from Modified Unit Root Tests Walter Enders and Yu Liu * University of Alabama in Tuscaloosa and University of Texas at El Paso Abstract: We propose a simple modification

More information

Predicting bond returns using the output gap in expansions and recessions

Predicting bond returns using the output gap in expansions and recessions Erasmus university Rotterdam Erasmus school of economics Bachelor Thesis Quantitative finance Predicting bond returns using the output gap in expansions and recessions Author: Martijn Eertman Studentnumber:

More information

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014 Warwick Business School Forecasting System Summary Ana Galvao, Anthony Garratt and James Mitchell November, 21 The main objective of the Warwick Business School Forecasting System is to provide competitive

More information

Introduction to Eco n o m et rics

Introduction to Eco n o m et rics 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Introduction to Eco n o m et rics Third Edition G.S. Maddala Formerly

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Optimizing forecasts for inflation and interest rates by time-series model averaging

Optimizing forecasts for inflation and interest rates by time-series model averaging Optimizing forecasts for inflation and interest rates by time-series model averaging Presented at the ISF 2008, Nice 1 Introduction 2 The rival prediction models 3 Prediction horse race 4 Parametric bootstrap

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

The Prediction of Monthly Inflation Rate in Romania 1

The Prediction of Monthly Inflation Rate in Romania 1 Economic Insights Trends and Challenges Vol.III (LXVI) No. 2/2014 75-84 The Prediction of Monthly Inflation Rate in Romania 1 Mihaela Simionescu Institute for Economic Forecasting of the Romanian Academy,

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

A radial basis function artificial neural network test for ARCH

A radial basis function artificial neural network test for ARCH Economics Letters 69 (000) 5 3 www.elsevier.com/ locate/ econbase A radial basis function artificial neural network test for ARCH * Andrew P. Blake, George Kapetanios National Institute of Economic and

More information

This is the author s final accepted version.

This is the author s final accepted version. Bagdatoglou, G., Kontonikas, A., and Wohar, M. E. (2015) Forecasting US inflation using dynamic general-to-specific model selection. Bulletin of Economic Research, 68(2), pp. 151-167. (doi:10.1111/boer.12041)

More information

Dynamic Regression Models (Lect 15)

Dynamic Regression Models (Lect 15) Dynamic Regression Models (Lect 15) Ragnar Nymoen University of Oslo 21 March 2013 1 / 17 HGL: Ch 9; BN: Kap 10 The HGL Ch 9 is a long chapter, and the testing for autocorrelation part we have already

More information

Monitoring Forecasting Performance

Monitoring Forecasting Performance Monitoring Forecasting Performance Identifying when and why return prediction models work Allan Timmermann and Yinchu Zhu University of California, San Diego June 21, 2015 Outline Testing for time-varying

More information

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation The Size and Power of Four s for Detecting Conditional Heteroskedasticity in the Presence of Serial Correlation A. Stan Hurn Department of Economics Unversity of Melbourne Australia and A. David McDonald

More information

Variable Targeting and Reduction in High-Dimensional Vector Autoregressions

Variable Targeting and Reduction in High-Dimensional Vector Autoregressions Variable Targeting and Reduction in High-Dimensional Vector Autoregressions Tucker McElroy (U.S. Census Bureau) Frontiers in Forecasting February 21-23, 2018 1 / 22 Disclaimer This presentation is released

More information

Penalized Estimation of Panel Vector Autoregressive Models: A Lasso Approach

Penalized Estimation of Panel Vector Autoregressive Models: A Lasso Approach Penalized Estimation of Panel Vector Autoregressive Models: A Lasso Approach Annika Schnücker Freie Universität Berlin and DIW Berlin Graduate Center, Mohrenstr. 58, 10117 Berlin, Germany October 11, 2017

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions Shin-Huei Wang and Cheng Hsiao Jan 31, 2010 Abstract This paper proposes a highly consistent estimation,

More information

Input Selection for Long-Term Prediction of Time Series

Input Selection for Long-Term Prediction of Time Series Input Selection for Long-Term Prediction of Time Series Jarkko Tikka, Jaakko Hollmén, and Amaury Lendasse Helsinki University of Technology, Laboratory of Computer and Information Science, P.O. Box 54,

More information

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Comparison of Some Improved Estimators for Linear Regression Model under

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Department of Economics, UCSB UC Santa Barbara

Department of Economics, UCSB UC Santa Barbara Department of Economics, UCSB UC Santa Barbara Title: Past trend versus future expectation: test of exchange rate volatility Author: Sengupta, Jati K., University of California, Santa Barbara Sfeir, Raymond,

More information

Forecasting the unemployment rate when the forecast loss function is asymmetric. Jing Tian

Forecasting the unemployment rate when the forecast loss function is asymmetric. Jing Tian Forecasting the unemployment rate when the forecast loss function is asymmetric Jing Tian This version: 27 May 2009 Abstract This paper studies forecasts when the forecast loss function is asymmetric,

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Functional Coefficient Models for Nonstationary Time Series Data

Functional Coefficient Models for Nonstationary Time Series Data Functional Coefficient Models for Nonstationary Time Series Data Zongwu Cai Department of Mathematics & Statistics and Department of Economics, University of North Carolina at Charlotte, USA Wang Yanan

More information