Accepted author version posted online: 12 Aug 2015.

To cite this article: Ngai Hang Chan, Ching-Kang Ing, Yuanbo Li & Chun Yip Yau (2015): Threshold Estimation via Group Orthogonal Greedy Algorithm, Journal of Business & Economic Statistics.

Threshold Estimation via Group Orthogonal Greedy Algorithm

Ngai Hang Chan 1,2, Ching-Kang Ing 3,4, Yuanbo Li 2 and Chun Yip Yau 2

Southwestern University of Finance and Economics 1, Chinese University of Hong Kong 2, Academia Sinica 3 and National Taiwan University 4

Abstract

The threshold autoregressive (TAR) model is an important class of nonlinear time series models that possess many desirable features, such as asymmetric limit cycles and amplitude-dependent frequencies. Statistical inference for the TAR model, however, encounters a major difficulty in the estimation of the thresholds. This paper develops an efficient procedure to estimate the thresholds. The procedure first transforms multiple-threshold detection into a regression variable selection problem, and then employs a group orthogonal greedy algorithm to obtain the threshold estimates. Desirable theoretical results are derived to support the proposed methodology. Simulation experiments are conducted to illustrate the empirical performance of the method, and an application to U.S. GNP data is investigated.

Keywords: high-dimensional regression, information criteria, multiple-regime, multiple-threshold, nonlinear time series.

1 Introduction

The $(m+1)$-regime threshold autoregressive (TAR) model, defined by
$$ Y_t = \phi_0^{(j)} + \sum_{i=1}^{p} \phi_i^{(j)} Y_{t-i} + \sigma_j \eta_t, \quad \text{for } r_{j-1} < Y_{t-d} \le r_j, \tag{1.1} $$
where $j = 1, \ldots, m+1$ and $\eta_t \sim \mathrm{IID}(0,1)$, asserts that the autoregressive structure of the process $\{Y_t\}$ is governed by the range of values of $Y_{t-d}$. The integer-valued quantity $d$ is called the delay parameter and the parameters $-\infty = r_0 < r_1 < \cdots < r_m < r_{m+1} = \infty$ are known as the thresholds. The regime-changing autoregressive structure in (1.1) offers a simple and convenient means of interpretation. More importantly, realizations of this model capture important features observed in many empirical studies that cannot be explained by linear time series models, e.g., asymmetric limit cycles, amplitude-dependent frequencies, jump resonance and chaos, among others. As a result, the TAR model has attracted enormous attention in the literature across different fields such as finance, econometrics and hydrology since the seminal paper of Tong (1978). See, for example, the excellent surveys on TAR models by Tong (1990, 2011), and the theoretical background on inference for TAR models in Chan (1993), Li and Ling (2012) and Li et al. (2015).

In spite of the well-developed asymptotic theory of estimation, estimation of TAR models incurs a high computational cost due to the irregular nature of the threshold parameters; see, e.g., Li and Ling (2012). In particular, to evaluate loss functions such as the least-squares criterion or the likelihood function explicitly, the thresholds $r_1, \ldots, r_m$ have to be specified a priori. Hence, for an $(m+1)$-regime TAR model, locating the global minimum of any loss function requires a multi-parameter grid search over all possible values of the $m$ threshold parameters, which is computationally infeasible, if not impossible. To circumvent this difficulty, Tsay (1989) develops a transformation that connects (1.1) to a change-point model and suggests a graphical approach to visually determine the number and values of the thresholds. Coakley et al. (2003) use similar techniques to develop an estimation approach based on QR factorizations of matrices. When the number of thresholds $m$ is unknown, Gonzalo and Pitarakis (2002) suggest a sequential estimation

procedure for choosing $m$, under the assumption that all the $\sigma_j$'s are equal. Recently, Chan et al. (2015) reframe the problem in a regression variable selection context and propose a computationally efficient way to solve it by least absolute shrinkage and selection operator (LASSO) estimation.

In this paper, motivated by the connection between the TAR model and the high-dimensional regression model developed by Chan et al. (2015), we explore an alternative procedure to efficiently estimate the number and values of the thresholds. The procedure is based on a group orthogonal greedy algorithm (GOGA) for the high-dimensional regression problem. With the use of the GOGA, the well-known bias problem of the LASSO procedure (e.g., Fan and Li (2001) and Zou (2006)) can be circumvented. Simulation studies demonstrate that the GOGA procedure gives much better empirical results than the LASSO procedure. On the other hand, the asymptotic theory is non-trivial, since the design matrix of the high-dimensional regression model constructed from a TAR model does not satisfy some standard regularity conditions assumed in the literature. In particular, the design matrix has highly correlated columns, and thus the standard restricted eigenvalue condition (e.g., Bickel et al. (2009), Ing and Lai (2011)) is not applicable. Substantial arguments are needed to establish the consistency of the number and locations of the thresholds estimated by the GOGA. Further, the convergence rate of the threshold estimates is also established.

This paper is organized as follows. In Section 2, we review the connection between TAR models and the high-dimensional regression problem. In Section 3, estimation of TAR models with the GOGA is discussed. Simulation studies are given in Section 4. An application to U.S. GNP data is provided in Section 5. Technical details are deferred to the Appendix.

2 TAR Model

In this section, we introduce the connection between the TAR model and a high-dimensional regression model developed in Chan et al. (2015). For notational simplicity, suppose we observe $Y = (Y_{1-d}, \ldots, Y_n)^T$ from model (1.1). Let $(Y_{\pi(1)}, Y_{\pi(2)}, \ldots, Y_{\pi(n)})^T$ be the order statistics

of $(Y_{1-d}, Y_{2-d}, \ldots, Y_{n-d})^T$, from the smallest to the largest. For example, if $Y_5 < Y_3$ are the two smallest observations among $(Y_{1-d}, Y_{2-d}, \ldots, Y_{n-d})^T$, then $\pi(1) = 5$ and $\pi(2) = 3$. Note from (1.1) that $(Y_{\pi(1)}, Y_{\pi(2)}, \ldots, Y_{\pi(n)})^T$ governs the dependence structure of $Y_n^0 \equiv (Y_{\pi(1)+d}, \ldots, Y_{\pi(n)+d})^T$, since the time lag between them is the delay parameter $d$. Denote the error terms corresponding to $Y_n^0$ by $\varepsilon(n) = (\varepsilon_{\pi(1)+d}, \ldots, \varepsilon_{\pi(n)+d})^T$. Define the design matrix $X_n$ as
$$ X_n = \begin{pmatrix} Y_{\pi(1)}^T & 0 & 0 & \cdots & 0 \\ Y_{\pi(2)}^T & Y_{\pi(2)}^T & 0 & \cdots & 0 \\ Y_{\pi(3)}^T & Y_{\pi(3)}^T & Y_{\pi(3)}^T & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ Y_{\pi(n)}^T & Y_{\pi(n)}^T & Y_{\pi(n)}^T & \cdots & Y_{\pi(n)}^T \end{pmatrix}, \tag{2.2} $$
where $Y_{\pi(j)}^T = (1, Y_{\pi(j)+d-1}, Y_{\pi(j)+d-2}, \ldots, Y_{\pi(j)+d-p})$ and $p$ is the AR order. Note that we can assume that all the regimes follow the same AR($p$) model, since an AR($p'$) model with $p' < p$ can be regarded as an AR($p$) model with the last few coefficients equal to zero. Note also that the design matrix $X_n$ is of size $n \times n(p+1)$, i.e., the number of variables is greater than the number of observations. For simplicity, we assume that the delay parameter $d$ is known.

Consider the regression model
$$ Y_n^0 = X_n \theta(n) + \varepsilon(n), \tag{2.3} $$
where $\theta(n) \equiv (\theta_{\pi(1)}^T, \theta_{\pi(2)}^T, \ldots, \theta_{\pi(n)}^T)^T$ is the regression coefficient vector. By expanding the quantity $X_n\theta(n)$, it can be shown that (1.1) is equivalent to model (2.3) with $\theta_{\pi(1)} = \phi_1$ and
$$ \theta_{\pi(j)} = \begin{cases} \phi_{k+1} - \phi_k, & \text{if } Y_{\pi(j-1)} \le r_k < Y_{\pi(j)}, \text{ for } k = 1, \ldots, m, \\ 0, & \text{otherwise}, \end{cases} \tag{2.4} $$
where $\phi_k = (\phi_0^{(k)}, \phi_1^{(k)}, \ldots, \phi_p^{(k)})^T$ is the autoregressive parameter vector in the $k$-th segment of the TAR model and $r_1, \ldots, r_m$ are the unknown thresholds.
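As a concrete illustration of this transformation, the following sketch builds the reordered response $Y_n^0$ and the regressor rows $Y_{\pi(j)}^T$ from an observed series. It is only an illustration under stated assumptions: the function name `tar_to_regression`, the use of NumPy, and the array layout are ours and are not part of the original formulation.

```python
import numpy as np

def tar_to_regression(y, p, d):
    """Arrange a series into the reordered response and regressor rows of (2.2)-(2.3).

    y : 1-d array of observations, long enough that every target Y_t has p lags
        and the delay variable Y_{t-d} available.
    p : AR order;  d : delay parameter (assumed known).
    Returns the sorted threshold variable Y_{pi(1)}, ..., Y_{pi(n)}, the response
    Y_n^0, and the row matrix Z whose j-th group X_{n,j} equals Z with rows
    1, ..., j-1 replaced by zeros.
    """
    y = np.asarray(y, dtype=float)
    start = max(p, d)                          # first index with enough history
    t_idx = np.arange(start, len(y))           # positions of the targets Y_t
    thr_var = y[t_idx - d]                     # threshold variable Y_{t-d}
    order = np.argsort(thr_var, kind="stable") # permutation pi(1), ..., pi(n)
    t_sorted = t_idx[order]
    y0 = y[t_sorted]                           # reordered response Y_n^0
    Z = np.column_stack([np.ones(len(t_sorted))] +
                        [y[t_sorted - i] for i in range(1, p + 1)])
    return thr_var[order], y0, Z
```

Note that the full $n \times n(p+1)$ matrix $X_n$ never needs to be formed explicitly; the zero pattern of each group $X_{n,j}$ is implied by its index $j$.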

The basic idea of model (2.3) is as follows. The $j$-th block $\theta_{\pi(j)}$ of the coefficient vector $\theta(n)$ indicates a change in the autoregressive parameter when the value of the threshold variable increases from $Y_{\pi(j-1)}$ to $Y_{\pi(j)}$. Thus, if a threshold lies within the interval $[Y_{\pi(j-1)}, Y_{\pi(j)}]$, then $\theta_{\pi(j)}$ is non-zero and it follows that the corresponding observations $Y_{\pi(j-1)+d}$ and $Y_{\pi(j)+d}$ are generated from different autoregressive processes. Otherwise, if there is no threshold within the interval $[Y_{\pi(j-1)}, Y_{\pi(j)}]$, then $\theta_{\pi(j)}$ is zero and $Y_{\pi(j-1)+d}$ and $Y_{\pi(j)+d}$ are in the same regime. Since only $m$ of the vectors $\theta_{\pi(j)}$ in $\theta(n)$ are non-zero, the solution to the high-dimensional regression model (2.3) is sparse.

In Section 3, we develop a three-step procedure using the GOGA to obtain an estimate $\hat{\theta}(n)$ for (2.3). Given the estimate $\hat{\theta}(n)$, the unknown thresholds can be identified by the positions of the non-zero elements in $\hat{\theta}(n)$. In particular, the estimates of the thresholds are given by
$$ A_n = \{Y_{\pi(j-1)} : \hat{\theta}_{\pi(j)} \neq 0,\ j \ge 2\}. \tag{2.5} $$
Thus, $\hat{m} \equiv \#(A_n)$, the cardinality of the set $A_n$, is the estimated number of thresholds. Denote the elements of $A_n$ by $Y_{\pi(\hat{t}_1)}, \ldots, Y_{\pi(\hat{t}_{\hat{m}})}$, so that $Y_{\pi(\hat{t}_k)}$ is the $k$-th estimated threshold. Finally, the autoregressive parameters for the first and the $(k+1)$-th regimes can be estimated by $\hat{\phi}_1 = \hat{\theta}_{\pi(1)}$ and $\hat{\phi}_{k+1} = \sum_{j=1}^{\hat{t}_k} \hat{\theta}_{\pi(j)}$, respectively, where $k = 1, \ldots, \hat{m}$.

Since many computationally efficient procedures for high-dimensional regression have been developed in recent decades (e.g., Efron et al. (2004), Fan and Li (2001), Fan and Lv (2008), Buhlmann et al. (2009), Cho and Fryzlewicz (2012) and Ing and Lai (2011)), estimating the TAR model under the framework of (2.3) can be much more effective than the traditional approach of a full combinatorial search. Chan et al. (2015) proposed a group LASSO procedure to solve (2.3), with the computation conducted efficiently by a group LARS algorithm that approximately solves the group LASSO problem. In the next section we develop a group orthogonal greedy algorithm (GOGA) which yields improved threshold estimates.
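Before turning to the estimation algorithm, we note that once an estimate $\hat{\theta}(n)$ is available, the thresholds and regime parameters can be read off as in (2.5). The sketch below assumes `theta_hat` is an $n \times (p+1)$ array whose rows estimate $\theta_{\pi(1)}, \ldots, \theta_{\pi(n)}$ and `sorted_thr` holds $Y_{\pi(1)}, \ldots, Y_{\pi(n)}$; the way the increments are accumulated into regime parameters is one natural reading of the description above, and the function name is ours.

```python
import numpy as np

def recover_thresholds(theta_hat, sorted_thr, tol=1e-10):
    """Read thresholds and regime AR parameters off a sparse group estimate (cf. (2.5)).

    Row i of theta_hat (0-based) estimates theta_{pi(i+1)} in (2.3)-(2.4);
    sorted_thr[i] is Y_{pi(i+1)}.
    """
    change_rows = [i for i in range(1, theta_hat.shape[0])
                   if np.linalg.norm(theta_hat[i]) > tol]    # rows j >= 2 with nonzero group
    thresholds = [sorted_thr[i - 1] for i in change_rows]     # Y_{pi(j-1)} at each change
    # accumulate the increments to obtain the AR parameters of successive regimes
    phi = [theta_hat[0].copy()]                               # first regime: theta_{pi(1)}
    for i in change_rows:
        phi.append(phi[-1] + theta_hat[i])
    return thresholds, phi
```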

3 Three-step procedure: GOGA+HDIC+Trim

In this section, we introduce a three-step procedure, originally proposed by Ing and Lai (2011), to estimate the high-dimensional regression model (2.3). Some modifications are introduced for the current setting. The first step employs the group orthogonal greedy algorithm (GOGA) to build a solution path that includes one additional group of variables at each iteration. The second and third steps incorporate a high-dimensional information criterion (HDIC) to consistently select the relevant groups of variables on the solution path.

3.1 Group Orthogonal Greedy Algorithm

Let $X_{n,j} = (0, \ldots, 0, Y_{\pi(j)}, Y_{\pi(j+1)}, \ldots, Y_{\pi(n)})^T$ for $j = 1, \ldots, n$. From (2.2), the design matrix can be expressed as $X_n = (X_{n,1}, X_{n,2}, \ldots, X_{n,n})$. Replacing $Y_n^0$ by $Y_n^0 - \bar{Y}_n^0$ and $X_{n,i}$ by $X_{n,i} - \bar{X}_{n,i}$, for $i = 1, \ldots, n$, where $\bar{Y}_n^0 = n^{-1}\mathbf{1}\mathbf{1}^T Y_n^0$ and $\bar{X}_{n,i} = n^{-1}\mathbf{1}\mathbf{1}^T X_{n,i}$, we can assume that $Y_n^0$ and the $X_{n,i}$ have mean zero. Here $\mathbf{1}$ is an $n$-dimensional vector with all elements equal to 1. The GOGA is described as follows:

Group Orthogonal Greedy Algorithm (GOGA)

1. Initialize $k = 0$ with fitted value $\hat{Y}^{(0)} = 0$ and active set $\hat{J}_0 = \emptyset$.

2. Compute the current residuals $U^{(k)} := Y_n^0 - \hat{Y}^{(k)}$. Regress $U^{(k)}$ against each of $X_{n,j}$ for $j = 1, \ldots, n$ and obtain the most correlated group
$$ \hat{j}_{k+1} = \arg\min_{1 \le j \le n} \big\| U^{(k)} - X_{n,j}(X_{n,j}^T X_{n,j})^{-1} X_{n,j}^T U^{(k)} \big\|^2. \tag{3.6} $$
Include the $\hat{j}_{k+1}$-th group to update the active set as $\hat{J}_{k+1} = \{\hat{j}_1, \ldots, \hat{j}_k, \hat{j}_{k+1}\}$.

3. Transform the matrix $X_{n,\hat{j}_{k+1}}$ into the orthogonalized version $X_{n,\hat{j}_{k+1}}^{\perp}$ as follows:

(a) Denote $H_{\{j\}}^{\perp} = X_{n,j}^{\perp}(X_{n,j}^{\perp T} X_{n,j}^{\perp})^{-1} X_{n,j}^{\perp T}$. The first column of $X_{n,\hat{j}_{k+1}}^{\perp}$, denoted as $X_{n,\hat{j}_{k+1},1}^{\perp}$, is

computed by
$$ X_{n,\hat{j}_{k+1},1}^{\perp} = X_{n,\hat{j}_{k+1},1} - \sum_{i=1}^{k} H_{\{\hat{j}_i\}}^{\perp} X_{n,\hat{j}_{k+1},1}, \tag{3.7} $$
i.e., the residual of projecting the first column of $X_{n,\hat{j}_{k+1}}$ onto the space spanned by $\{X_{n,\hat{j}_i}^{\perp}\}_{i=1,\ldots,k}$.

(b) For $l = 2, \ldots, p+1$, the $l$-th column of $X_{n,\hat{j}_{k+1}}^{\perp}$, $X_{n,\hat{j}_{k+1},l}^{\perp}$, is computed by
$$ X_{n,\hat{j}_{k+1},l}^{\perp} = X_{n,\hat{j}_{k+1},l} - \sum_{i=1}^{k} H_{\{\hat{j}_i\}}^{\perp} X_{n,\hat{j}_{k+1},l} - \sum_{u=1}^{l-1} H_{\{\hat{j}_{k+1},u\}}^{\perp} X_{n,\hat{j}_{k+1},l}. \tag{3.8} $$
Thus, the $l$-th column $X_{n,\hat{j}_{k+1},l}^{\perp}$ is orthogonal to $X_{n,\hat{j}_1}^{\perp}, X_{n,\hat{j}_2}^{\perp}, \ldots, X_{n,\hat{j}_k}^{\perp}$ and to $X_{n,\hat{j}_{k+1},1}^{\perp}, \ldots, X_{n,\hat{j}_{k+1},l-1}^{\perp}$.

4. Update the fitted value $\hat{Y}^{(k+1)}$ by
$$ \hat{Y}^{(k+1)} = \hat{Y}^{(k)} + H_{\{\hat{j}_{k+1}\}}^{\perp} U^{(k)}. \tag{3.9} $$

5. Repeat Steps 2-4 until iteration $k = K_n$, where $K_n$ is a pre-specified upper bound. The resulting set of selected variables is given by $\hat{J}_{K_n} = \{\hat{j}_1, \ldots, \hat{j}_{K_n}\}$.

There are two advantages to obtaining the orthogonalized predictors $X_{n,i}^{\perp}$ sequentially. First, the difficulty of inverting high-dimensional matrices in the usual implementation of ordinary least squares can be circumvented by componentwise linear regression. Second, the algorithm is more efficient since the same group cannot be selected repeatedly. If each group contains only one variable, then the GOGA reduces to the ordinary orthogonal greedy algorithm (OGA). Note that the most correlated group in (3.6) can be rewritten as
$$ \hat{j}_{k+1} = \arg\min_{1 \le j \le n} (1 - r_j^2), \tag{3.10} $$
where $r_j$ is the correlation coefficient between $X_{n,j}$ and the current residual $U^{(k)}$. In other words, the OGA chooses the predictor that is most correlated with $U^{(k)}$ at the $k$-th step. This is similar to the LARS algorithm (Efron et al. (2004)). However, the LARS algorithm selects the best predictor that has as much correlation with the current residual as the best predictor at the next iteration does.
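To fix ideas, a minimal sketch of this first step is given below. It is not the authors' implementation: the function name, the use of NumPy, and the decision to update the fit by a joint least-squares refit on the selected groups (which yields the same fitted values as the sequential orthogonalization in Steps 3-4, at a higher computational cost) are our own choices.

```python
import numpy as np

def goga_path(y0, groups, K):
    """Group orthogonal greedy algorithm: return the groups in their selection order.

    y0     : centered response vector of length n.
    groups : list of centered n x (p+1) group matrices X_{n,1}, ..., X_{n,n}.
    K      : number of iterations K_n.
    """
    selected, fitted = [], np.zeros_like(y0)
    for _ in range(K):
        resid = y0 - fitted
        scores = []
        for j, Xj in enumerate(groups):
            if j in selected:                   # a selected group cannot be chosen again
                scores.append(np.inf)
                continue
            coef, *_ = np.linalg.lstsq(Xj, resid, rcond=None)
            scores.append(np.sum((resid - Xj @ coef) ** 2))   # criterion (3.6)
        j_best = int(np.argmin(scores))
        selected.append(j_best)
        X_sel = np.hstack([groups[j] for j in selected])      # refit on all selected groups
        coef, *_ = np.linalg.lstsq(X_sel, y0, rcond=None)
        fitted = X_sel @ coef
    return selected
```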

In the high-dimensional regression model (2.3), the design matrix in (2.2) has a very special structure: adjacent groups are highly correlated since they differ by only one entry. As a consequence, the LARS algorithm tends to select many irrelevant groups around each important group. Although a second step involving an information criterion can be used to exclude these irrelevant groups (Chan et al. (2015)), the effect is not satisfactory because it is difficult to distinguish the truly relevant groups. The results can be even worse when the number of thresholds is large. On the other hand, the greedy manner of the GOGA avoids including too many irrelevant groups in the active set if the number of iterations is not large. After a group $X_{n,j}$ is selected, the groups near $X_{n,j}$ are not easily selected, because their contributions to further reducing the current residuals are relatively small compared to the groups that are far away from $X_{n,j}$.

Ing and Lai (2011) first used the OGA to solve a general high-dimensional regression problem. They establish the sure screening property (i.e., including all relevant variables with probability approaching 1) of the OGA with a relatively small number of iterations, $K_n = O((n/\log p)^{1/2})$, where $n$ is the sample size and $p$ is the number of variables. Their extensive simulation studies indicate that the OGA has better performance than other penalization methods such as SIS-SCAD, ISIS-SCAD and the adaptive LASSO. In the theoretical results of Ing and Lai (2011), it is assumed that the regression predictors are not highly correlated with each other, which is reasonable in a practical variable selection context. Since the design matrix $X_n$ in (2.3) has the special property that adjacent groups are highly correlated, the assumption in Ing and Lai (2011) is not guaranteed, however. Indeed, as adjacent columns in $X_n$ differ by only one entry, they cannot be distinguished asymptotically, and thus the usual consistency of model selection does not hold. Therefore, the theoretical properties of the solution of (2.3) are different from those in Ing and Lai (2011) and should incorporate information from TAR models. To establish the asymptotic theory, we impose the following assumptions, which are similar to Chan and Tsay (1998) and Chan et al. (2015):

Assumptions.

A1: $\{\eta_t\}$ has a bounded, continuous and positive density function and $E|\eta_1|^{2+\iota} < \infty$ for some $\iota > 0$.

A2: $\{Y_t\}$ is an $\alpha$-mixing stationary process with a geometrically decaying mixing rate and $E|Y_t|^{2+\iota} < \infty$.

A3: $\min_{2 \le i \le m_0+1} \|\phi_i^0 - \phi_{i-1}^0\| > \nu$ for some $\nu > 0$, where $\phi_i^0$ is the true AR parameter vector in the $i$-th regime, $i = 1, \ldots, m_0+1$. Also, there exists $Z_i = (1, z_{p-1,i}, \ldots, z_{0,i})^T$, where $z_{p-d,i} = r_{i-1}$, such that $(\phi_{i-1}^0 - \phi_i^0)^T Z_i \neq 0$ for $i = 2, \ldots, m_0+1$.

A4: There exist constants $l, u$ such that $r_i \in [l, u]$ for $i = 1, 2, \ldots, m_0$, where $m_0$ is unknown and not fixed. Furthermore, for some $\gamma_n$ satisfying $\gamma_n/n \to 0$, $m_0\gamma_n/n \to 0$ and $m_0 n \gamma_n^{-1} (\gamma_n)^{-\iota/2} (\log n)^{2(2+\iota)} \to 0$, it holds that $\min_{2 \le i \le m_0} |r_i - r_{i-1}|/(\gamma_n/n) \to \infty$.

The following theorem ensures that the variables selected by the GOGA are close to the true ones, so that consistency of the thresholds $\hat{r}_1, \ldots, \hat{r}_m$ can be obtained.

Theorem 3.1. Suppose that Assumptions A1-A4 hold and let $K_n = O((n/\log n)^{1/2})$. Suppose that the groups $\hat{J}_{K_n} = \{\hat{j}_1, \hat{j}_2, \ldots, \hat{j}_{K_n}\}$ are selected by the GOGA at the end of $K_n$ iterations. Let $J = \{j_1^0, j_2^0, \ldots, j_{m_0}^0\}$ be the index set of the true non-zero elements in $\theta(n)$. Then, as $n \to \infty$,
$$ P\{d_H(\hat{J}_{K_n}, J) \le \gamma_n\} \to 1, \tag{3.11} $$
where $\gamma_n = \log(n)$ and $d_H(A, B)$ is the Hausdorff distance between two sets $A$ and $B$, given by $d_H(A, B) = \max_{b \in B} \min_{a \in A} |b - a|$, with $d_H(A, \emptyset) = d_H(\emptyset, B) = \infty$ if one of the sets is empty. It follows that
$$ P\Big\{d_H(\hat{\mathbf{r}}, \mathbf{r}^0) \le \frac{\gamma_n}{n}\Big\} \to 1, \tag{3.12} $$
where $\hat{\mathbf{r}}$ and $\mathbf{r}^0$ are the sets of estimated and true thresholds, respectively.

3.2 High-dimensional Information Criterion

Note that Theorem 3.1 only ensures that each true threshold can be identified by an estimated threshold up to a $\gamma_n/n$ neighborhood. It is possible that the set $\hat{\mathbf{r}}$ is larger than $\mathbf{r}^0$; thus the consistency of $\hat{m} = \#(\hat{\mathbf{r}})$ for the number of thresholds is not guaranteed. One natural remedy is to select the best subset of $\hat{\mathbf{r}}$ according to some criterion. Based on Ing and Lai (2011), we prescribe the

second and the third steps of the threshold estimation procedure in this subsection. In particular, the second step selects the model along the solution path of the GOGA based on a high-dimensional information criterion (HDIC). The third step further trims the model to eliminate irrelevant variables by the HDIC.

Specifically, for a non-empty subset $J$ of $\{1, \ldots, n\}$, let $\hat{\sigma}_J^2 = n^{-1}\|Y_n^0 - \hat{Y}_{n;J}^0\|^2$, where $\hat{Y}_{n;J}^0$ is the fitted value obtained by projecting $Y_n^0$ onto the space spanned by $\{X_{n,j}\}$, $j \in J$. Define the high-dimensional information criterion (HDIC) by
$$ \mathrm{HDIC}(J) = n \log \hat{\sigma}_J^2 + \#(J)\, \log n\, (\log n - \log\log n), \tag{3.13} $$
$$ \hat{k}_n = \arg\min_{1 \le k \le K_n} \mathrm{HDIC}(\hat{J}_k), \tag{3.14} $$
in which $\hat{J}_k = \{\hat{j}_1, \ldots, \hat{j}_k\}$ contains the first $k$ groups along the GOGA solution path. As in other information criteria, in (3.13), $n \log \hat{\sigma}_J^2$ is the lack-of-fit term and $\#(J)\log n(\log n - \log\log n)$ is the penalty term for model complexity. The HDIC reduces to the usual BIC without the factor $\log n - \log\log n$. Thus, the factor $\log n - \log\log n$ can be interpreted as an additional adjustment for the high-dimensional problem. Note also that the penalty term of the HDIC in Ing and Lai (2011) is $\log n\,\log(n(p+1))$, where $n(p+1)$ is the total number of columns in $X_n$. It is different from the term $\log n(\log n - \log\log n)$ in (3.13) because groups of variables are selected here. In the derivation of the HDIC in Section 4.2 of Ing and Lai (2011), the penalization factor $\log p - \log\log p$ is established, where $p$ is the number of variables. Since the $\log\log p$ term is relatively small, only the $\log p$ term was used in defining their HDIC. However, extensive simulation evidence suggests that keeping the $\log\log p$ term yields good finite-sample performance in the threshold estimation context. Therefore, as there are $n$ groups of variables, we have $p = n$ and the term $\log n - \log\log n$ in our definition of the HDIC. In terms of computational complexity, the second step involves relatively little cost because $\hat{\sigma}_J^2$ can be readily computed in the $k$-th GOGA iteration. Therefore, the HDIC along the GOGA solution path can be obtained efficiently at each step.
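The following sketch evaluates (3.13)-(3.14) along a given GOGA path. It is a schematic illustration only: the function name `hdic_select`, the NumPy least-squares refits, and the interpretation of $\#(J)$ as the number of selected groups are our assumptions. The trimming step described next can reuse the same HDIC computation, removing one selected group at a time.

```python
import numpy as np

def hdic_select(y0, groups, path):
    """Choose the model size along the GOGA path by the HDIC in (3.13)-(3.14).

    y0    : centered response (length n);  groups : list of group matrices;
    path  : group indices in the order chosen by GOGA.
    Returns k_hat and the list of HDIC values for k = 1, ..., len(path).
    """
    n = len(y0)
    penalty = np.log(n) * (np.log(n) - np.log(np.log(n)))   # log n (log n - log log n)
    hdic = []
    for k in range(1, len(path) + 1):
        X = np.hstack([groups[j] for j in path[:k]])
        coef, *_ = np.linalg.lstsq(X, y0, rcond=None)
        sigma2 = np.mean((y0 - X @ coef) ** 2)               # sigma_hat^2 of the fitted model
        hdic.append(n * np.log(sigma2) + k * penalty)        # #(J) taken as number of groups
    k_hat = int(np.argmin(hdic)) + 1
    return k_hat, hdic
```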

Finally, the third step of the threshold estimation procedure trims the solution set $\hat{J}_{\hat{k}_n}$ obtained in the second step to eliminate irrelevant groups as follows. Define
$$ \hat{N}_n = \{\hat{j}_l : \mathrm{HDIC}(\hat{J}_{\hat{k}_n} \setminus \{\hat{j}_l\}) > \mathrm{HDIC}(\hat{J}_{\hat{k}_n}),\ 1 \le l \le \hat{k}_n\} \quad \text{if } \hat{k}_n > 1, \tag{3.15} $$
and $\hat{N}_n = \{\hat{j}_1\}$ if $\hat{k}_n = 1$. The reason for conducting the third step is that the second-step procedure only chooses the best subset of variables along the GOGA solution path. While it guarantees that all relevant variables are selected, irrelevant groups may still be present. Therefore, the third step is required to eliminate the irrelevant variables by the HDIC again. Since $\mathrm{HDIC}(\hat{J}_{\hat{k}_n})$ is already obtained in the preceding step, obtaining $\hat{N}_n$ only requires the computation of $\hat{k}_n - 1$ ordinary least squares regressions. Since $\hat{k}_n$ is usually not large, the third step can be computed quickly.

The following theorem shows that the three-step procedure, GOGA+HDIC+Trim, has the consistency property, i.e., with probability approaching 1, the number of thresholds is correctly estimated and all the true thresholds can be identified up to a neighborhood shrinking to zero. The proof follows from arguments similar to those in Chan et al. (2015) and is thus omitted.

Theorem 3.2. Assume that Assumptions A1-A4 hold. For the index set $\hat{N}_n = \{\hat{j}_1, \hat{j}_2, \ldots, \hat{j}_{\#(\hat{N}_n)}\}$ that satisfies (3.15), there exist a constant $B > 0$ and $\gamma_n = O(\log n)$ such that, as $n \to \infty$,
$$ P\{\#(\hat{N}_n) = m_0\} \to 1 \quad \text{and} \quad P\Big\{\max_{1 \le i \le m_0} |\hat{j}_i - j_i^0| \le B\gamma_n\Big\} \to 1. $$
Equivalently, we have
$$ P\Big\{\max_{1 \le i \le m_0} |\hat{r}_i - r_i^0| \le \frac{B\gamma_n}{n}\Big\} \to 1. $$

4 Simulation Studies

In this section, we explore the finite-sample performance of GOGA+HDIC+Trim. We also consider the two-step procedure proposed by Chan et al. (2015), who use the LASSO to build the solution path. We first compare the performances of the two methods in several simple threshold models

and then demonstrate the advantage of GOGA+HDIC+Trim in more complicated cases containing a larger number of thresholds. In applying GOGA+HDIC+Trim, $K_n = n/\log n$ is employed in the first step.

Example 1

In this example, 1000 realizations are simulated from the model
$$ Y_t = \begin{cases} Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t, & \text{if } y_{t-1} \le -1.5, \\ 1.9 Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } -1.5 < y_{t-1} \le 1.5, \\ Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } 1.5 < y_{t-1}. \end{cases} \tag{4.1} $$
Following the transformation in Section 2, we obtain the response $Y_n^0 \equiv \{y_1^0, y_2^0, \ldots, y_n^0\}$. The original observations $\{Y_t\}$ and the transformed observations $\{y_t^0\}$ are plotted in Figure 1 for a sample of size $n = 600$. Note that the right plot shows a clear regime-switching structure, which suggests that the model may contain two thresholds. We apply GOGA+HDIC+Trim and the LASSO procedure to estimate the thresholds. The results are summarized in Table 1. The percentage (%) of correct estimation of the number of regimes, and the bias and standard error of the threshold estimates, are reported. The bias and standard error are computed for the cases where the estimated number of thresholds equals the true value. From Table 1, the GOGA+HDIC+Trim method has 100% accuracy in detecting the two thresholds and has lower bias and standard errors for the first threshold estimate at all three sample sizes. For the second threshold estimate, GOGA+HDIC+Trim has about three times smaller bias. This is consistent with the finding that LASSO estimation tends to have a larger bias in the regression coefficients.

Example 2

In this example, 1000 realizations are generated from
$$ Y_t = \begin{cases} Y_{t-1} + \epsilon_t, & \text{if } y_{t-1} \le 1, \\ Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } 1 < y_{t-1} \le 2.5, \\ Y_{t-1} - 0.6 Y_{t-2} + \epsilon_t, & \text{if } 2.5 < y_{t-1}. \end{cases} \tag{4.2} $$
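Realizations from models such as (4.1) and (4.2) can be generated by iterating the defining recursion (1.1). The sketch below is a generic simulator; the regime coefficients in the usage line are illustrative placeholders and do not reproduce the exact specifications above, and the function name is ours.

```python
import numpy as np

def simulate_tar(n, regimes, thresholds, d=1, sigma=1.0, burn=200, seed=0):
    """Simulate a multi-regime TAR series of the form (1.1).

    regimes    : list of coefficient tuples (intercept, phi_1, ..., phi_p), one per regime.
    thresholds : increasing threshold values r_1 < ... < r_m separating the regimes.
    The regime of Y_t is determined by the interval into which Y_{t-d} falls.
    """
    rng = np.random.default_rng(seed)
    p = len(regimes[0]) - 1
    y = np.zeros(n + burn)
    for t in range(max(p, d), len(y)):
        j = int(np.searchsorted(thresholds, y[t - d], side="left"))  # regime index
        coef = np.asarray(regimes[j], dtype=float)
        lags = y[t - np.arange(1, p + 1)]                            # Y_{t-1}, ..., Y_{t-p}
        y[t] = coef[0] + coef[1:] @ lags + sigma * rng.standard_normal()
    return y[burn:]

# illustrative three-regime call (placeholder coefficients, not model (4.1)):
y = simulate_tar(600, regimes=[(0.0, 0.5, -0.2), (0.0, -0.4, 0.3), (0.0, 0.3, -0.1)],
                 thresholds=[-1.5, 1.5])
```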

Similar to Figure 1, the transformed variable $Y_n^0$ in Figure 2 also suggests that three regimes exist in the data. Table 2 reports the estimation results of GOGA+HDIC+Trim and the LASSO procedure. In this case, GOGA+HDIC+Trim has a much better performance than the LASSO procedure in terms of both the number and the accuracy of the threshold estimates. In particular, the percentage of correct identification is 100% for GOGA+HDIC+Trim but only about 90% for the LASSO. Also, GOGA+HDIC+Trim attains lower bias and standard errors for both thresholds at all sample sizes.

Example 3

In this example, the time series
$$ Y_t = \begin{cases} 0.8 Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t, & \text{if } y_{t-1} \le -2, \\ 1.9 Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } -2 < y_{t-1} \le 2, \\ 0.6 Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } 2 < y_{t-1}, \end{cases} \tag{4.3} $$
is more difficult to handle because the intercepts in all segments are zero and each segment has a similar structure. In particular, it is observed from the right plot of Figure 3 that the regime-switching pattern is more obscure. The performances of the two methods are summarized in Table 3. Surprisingly, the results of GOGA+HDIC+Trim are still very promising: the number of thresholds is perfectly identified in this difficult situation. In contrast, only about 70% of the LASSO estimates identify the correct number of thresholds. Moreover, GOGA+HDIC+Trim achieves lower bias and standard deviations for all thresholds and at all sample sizes.

Example 4

In this example, 1000 realizations are simulated from the model
$$ Y_t = \begin{cases} Y_{t-1} - 0.5 Y_{t-2} + \epsilon_t, & \text{if } y_{t-1} \le 1, \\ Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } 1 < y_{t-1} \le 2.5, \\ Y_{t-1} - 0.6 Y_{t-2} + Y_{t-3} + \epsilon_t, & \text{if } 2.5 < y_{t-1}. \end{cases} \tag{4.4} $$
Compared to the previous examples, this model involves higher AR orders and conditional heteroscedasticity. The estimation results of GOGA+HDIC+Trim and the LASSO procedure are summarized in Table 4. It can be seen that the percentage of correct identification of GOGA+HDIC+Trim is always larger than that of the LASSO procedure under the three different settings of the sample size $n$. Also, GOGA+HDIC+Trim achieves lower bias and standard deviations in most cases.

Example 5

In this last example, we use the following model to explore the performance of the two methods when the number of thresholds is large:
$$ Y_t = \begin{cases} Y_{t-1} + \epsilon_t, & \text{if } y_{t-1} \le -3.5, \\ Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } -3.5 < y_{t-1} \le -2.5, \\ -0.9 Y_{t-1} + \epsilon_t, & \text{if } -2.5 < y_{t-1} \le -1.5, \\ Y_{t-1} - Y_{t-2} + \epsilon_t, & \text{if } -1.5 < y_{t-1} \le -0.5, \\ Y_{t-1} + \epsilon_t, & \text{if } -0.5 < y_{t-1} \le 0.5, \\ Y_{t-1} + \epsilon_t, & \text{if } 0.5 < y_{t-1} \le 1.5, \\ Y_{t-1} + \epsilon_t, & \text{if } 1.5 < y_{t-1} \le 2.5, \\ Y_{t-1} - 0.2 Y_{t-2} + \epsilon_t, & \text{if } 2.5 < y_{t-1} \le 3.5, \\ Y_{t-1} + \epsilon_t, & \text{if } 3.5 < y_{t-1}. \end{cases} \tag{4.5} $$

The transformed observations are plotted in Figure 5 and the estimation results are reported in Table 5. From Figure 5, the boundaries between the segments are nearly indistinguishable, especially for the last two thresholds. When the sample size is as small as 2000, each segment has only around 200 observations. With such little information, GOGA+HDIC+Trim still has over an 84% chance of detecting the correct number of thresholds, with small bias and standard errors. When the number of observations increases to 3000, GOGA+HDIC+Trim attains nearly 100% detection accuracy and achieves low bias and small standard deviations for all threshold estimates. Again, GOGA+HDIC+Trim outperforms the LASSO procedure in terms of both the number and the accuracy of the threshold estimates. The LASSO procedure is particularly disappointing in this scenario, in view of the fact that it has only 54.7% detection accuracy even for n =. One possible reason for the failure of the LASSO is that too many irrelevant variables around each true threshold are selected in the first step. The maximum number of thresholds $K$ is then not sufficient to include groups around all true thresholds when the number of true thresholds is large; hence the number of thresholds is under-estimated. As discussed in Section 3.1, the GOGA avoids this problem and thus achieves better empirical performance.

5 Applications to Real Data

In this section, we apply the GOGA+HDIC+Trim procedure to a multiple-regime TAR model for the growth rate of the quarterly U.S. real GNP data over the period 1947 to 2012. Given the quarterly GNP data $\{y_t\}_{t=1,\ldots,261}$ from 1947 to the first quarter of 2012, the growth rate is defined as $x_t = 100(\log y_t - \log y_{t-1})$, $t = 2, \ldots, 261$. This data set has been investigated previously by Li and Ling (2012) and Chan et al. (2015).
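The growth-rate transformation above is a simple log-difference; a minimal sketch (assuming `gnp` is a one-dimensional array of the 261 quarterly levels) is:

```python
import numpy as np

def growth_rate(gnp):
    """Quarterly growth rate x_t = 100 * (log y_t - log y_{t-1}), t = 2, ..., 261."""
    gnp = np.asarray(gnp, dtype=float)
    return 100.0 * np.diff(np.log(gnp))
```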

Li and Ling (2012) use a three-regime model encompassing bad, good and normal times, and the two thresholds are 1.20 and 2.43, respectively. The two-step procedure of Chan et al. (2015) identified three thresholds at 1.23, 1.83 and 2.55, respectively. They also show that a four-regime model may give a better fit to the data. Setting $K_n = 6$ and the AR order $p = 11$ in GOGA+HDIC+Trim, the estimated thresholds are 1.23, 1.65 and 2.23, which are quite close to the estimates in Chan et al. (2015). This example not only illustrates the usefulness of GOGA+HDIC+Trim, but also reinforces the notion that a four-regime model may give a better explanation of this GNP data set.

Acknowledgements

We would like to thank an Associate Editor and two anonymous referees for their thoughtful and useful comments, which led to an improved version of this paper. Research supported in part by HKSAR-RGC-GRF Nos: , and HKSAR-RGC-CRF: CityU8/CRG/12G (Chan); Academia Sinica Investigator Award (Ing); HKSAR-RGC-ECS: and HKSAR-RGC-GRF: (Yau). Part of this research was conducted while N.H. Chan was visiting Renmin University of China (RUC) and Southwestern University of Finance and Economics (SWUFE). Research support from the School of Statistics at RUC and SWUFE is gratefully acknowledged.

References

Bickel, P., Ritov, Y. and Tsybakov, A. (2009) Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 4.

Buhlmann, P., Kalisch, M. and Maathuis, M. (2009) Variable selection for high-dimensional models: partially faithful distributions and the PC-simple algorithm. Biometrika, 97.

Chan, K. S. (1993) Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Annals of Statistics, 21.

Chan, K. S. and Tsay, R. S. (1998) Limiting properties of the least squares estimator of a continuous threshold autoregressive model. Biometrika, 85.

Chan, N. H., Yau, C. Y. and Zhang, R. (2015) Lasso estimation for threshold autoregressive models. Journal of Econometrics, in press.

Cho, H. and Fryzlewicz, P. (2012) High dimensional variable selection via tilting. Journal of the Royal Statistical Society, Series B, 74.

Coakley, J., Fuertes, A.-M. and Pérez, M.-T. (2003) Numerical issues in threshold autoregressive modeling of time series. Journal of Economic Dynamics and Control, 27.

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least angle regression. Annals of Statistics, 32.

Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96.

Fan, J. and Lv, J. (2008) Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society, Series B, 70.

Gonzalo, J. and Pitarakis, J.-Y. (2002) Estimation and model selection based inference in single and multiple threshold models. Journal of Econometrics, 110.

Ing, C. K. and Lai, T. L. (2011) A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21.

Li, D. and Ling, S. (2012) On the least squares estimation of multiple-regime threshold autoregressive models. Journal of Econometrics, 167.

Li, D., Ling, S. and Zhang, R. (2015) On a threshold double autoregressive model. Journal of Business and Economic Statistics, in press.

Temlyakov, V. (2000) Weak greedy algorithms. Advances in Computational Mathematics, 12.

Tong, H. (1978) On a threshold model. In Pattern Recognition and Signal Processing, NATO ASI Series E: Applied Sc. (29) (ed. C. Chen), Oxford University Press, Oxford.

Tong, H. (1990) Non-linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.

Tong, H. (2011) Threshold models in time series analysis: 30 years on. Statistics and its Interface, 4.

Tsay, R. S. (1989) Testing and modeling threshold autoregressive processes. Journal of the American Statistical Association, 84.

Zou, H. (2006) The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101.

Appendix: Proof of Theorems

We first define two types of group orthogonal greedy algorithms, the population GOGA and its generalized version, which are crucial to the study of the sample version of the GOGA introduced in Section 3.1.

For notational simplicity, we denote $X_{n,j}$ by $X_j$. Suppose the true model is $u = \sum_{j=1}^{n} X_j b_j$. Let $H_J$ be the projection matrix associated with the linear space spanned by $X_j$, $j \in J \subseteq \{1, \ldots, n\}$. Since
$$ \frac{\big\|u - X_j(X_j^T X_j)^{-1} X_j^T u\big\|^2}{\|u\|^2} = 1 - \frac{u^T H_{\{j\}} u}{\|u\|^2}, $$
the criterion (3.6) is equivalent to
$$ \hat{j}_{k+1} = \arg\max_{1 \le j \le n} U^{(k)T} H_{\{j\}} U^{(k)}. $$
This representation will be used in the following proof. First, we introduce the population GOGA as follows. Let $U^{(0)} = u$, $j_1 = \arg\max_{1 \le j \le n} U^{(0)T} H_{\{j\}} U^{(0)}$ and $U^{(1)} = (I - H_{\{j_1\}}) u$. At the $m$-th iteration, we have $j_m = \arg\max_{1 \le j \le n} U^{(m-1)T} H_{\{j\}} U^{(m-1)}$, $U^{(m)} = (I - H_{\{j_1, \ldots, j_m\}}) u$, and the active set $J_m = \{j_1, \ldots, j_m\}$. The population GOGA approximates $u$ by
$$ H_{J_m} u. \tag{A.1} $$
The generalization of the population GOGA considers an additional parameter $0 \le \xi \le 1$. At the $i$-th step, instead of (A.1), the generalized population GOGA replaces $j_i$ by $j_{i,\xi}$, where $j_{i,\xi}$ is any $1 \le l \le n$ satisfying
$$ U^{(i-1)T} H_{\{l\}} U^{(i-1)} \ge \xi \max_{1 \le j \le n} U^{(i-1)T} H_{\{j\}} U^{(i-1)}. \tag{A.2} $$
Without loss of generality, we assume that each group $X_j$ contains only two variables. Thus,

the matrix $X_n$ in (2.2) can be written as
$$ X_n = \begin{pmatrix} a_1 & b_1 & 0 & 0 & \cdots & 0 & 0 \\ a_2 & b_2 & a_2 & b_2 & \cdots & 0 & 0 \\ a_3 & b_3 & a_3 & b_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n-1} & b_{n-1} & a_{n-1} & b_{n-1} & \cdots & 0 & 0 \\ a_n & b_n & a_n & b_n & \cdots & a_n & b_n \end{pmatrix}, \tag{A.3} $$
where $X_j$ is the submatrix containing the $(2j-1)$-th and $(2j)$-th columns of $X_n$. Now we prove an inequality for the generalized population GOGA.

Lemma A.1. Let $0 \le \xi \le 1$, $m \ge 1$, and let $J_{m,\xi} = \{j_{1,\xi}, j_{2,\xi}, \ldots, j_{m,\xi}\}$ be the active set selected by the generalized population GOGA at the end of the $m$-th iteration. Then,
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \Big(\sum_{j=1}^{n} \|a_j\|\Big)^2 (1 + m\xi)^{-1}, $$
where $a_j = b_j^T X_j^T X_j A_j$ and $A_j A_j = (X_j^T X_j)^{-1}$.

Proof. For $J \subseteq \{1, \ldots, n\}$, $i \in \{1, \ldots, n\}$ and $m \ge 1$, define $v_{J,i} = n^{-1/2} A_i X_i^T (I - H_J) u$. Note that
$$ \begin{aligned} \big\|(I - H_{J_{m,\xi}}) u\big\|^2 &\le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - \big\|X_{j_{m,\xi}}(X_{j_{m,\xi}}^T X_{j_{m,\xi}})^{-1} X_{j_{m,\xi}}^T (I - H_{J_{m-1,\xi}}) u\big\|^2 \\ &= \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n \|v_{J_{m-1,\xi}, j_{m,\xi}}\|^2 \\ &\le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \max_{1 \le j \le n} \|v_{J_{m-1,\xi}, j}\|^2. \end{aligned} \tag{A.4} $$

Note that for $i, j \in \{1, \ldots, n\}$,
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \|v_{J_{m-1,\xi},i}\|^2, $$
and
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \|v_{J_{m-1,\xi},j}\|^2. $$
Therefore,
$$ \begin{aligned} \Big(\big\|(I - H_{J_{m,\xi}}) u\big\|^2\Big)^2 &\le \Big(\big\|(I - H_{J_{m-1,\xi}}) u\big\|^2\Big)^2 + (n\xi)^2 \|v_{J_{m-1,\xi},i}\|^2 \|v_{J_{m-1,\xi},j}\|^2 \\ &\quad - \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2\, n\xi \big(\|v_{J_{m-1,\xi},i}\|^2 + \|v_{J_{m-1,\xi},j}\|^2\big) \\ &\le \Big(\big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \|v_{J_{m-1,\xi},i}\| \|v_{J_{m-1,\xi},j}\|\Big)^2. \end{aligned} $$
Thus, we have
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \|v_{J_{m-1,\xi},i}\| \|v_{J_{m-1,\xi},j}\|. $$
Since this is true for all $i, j \in \{1, \ldots, n\}$,
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 - n\xi \max_{1 \le i, j \le n} \|v_{J_{m-1,\xi},i}\| \|v_{J_{m-1,\xi},j}\|. \tag{A.5} $$
On the other hand,
$$ \begin{aligned} \Big(\big\|(I - H_{J_{m-1,\xi}}) u\big\|^2\Big)^2 &= \Big(\sum_{j=1}^{n} b_j^T X_j^T (I - H_{J_{m-1,\xi}}) u\Big)^2 = n \Big(\sum_{j=1}^{n} a_j v_{J_{m-1,\xi},j}\Big)^2 \\ &= n \sum_{i,j=1}^{n} \big(a_i v_{J_{m-1,\xi},i}\big) \big(a_j v_{J_{m-1,\xi},j}\big) \\ &\le n \max_{1 \le i, j \le n} \|v_{J_{m-1,\xi},i}\| \|v_{J_{m-1,\xi},j}\| \Big(\sum_{j=1}^{n} \|a_j\|\Big)^2. \end{aligned} \tag{A.6} $$

It follows from (A.5) and (A.6) that
$$ \big\|(I - H_{J_{m,\xi}}) u\big\|^2 \le \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 \Bigg[ 1 - \frac{\xi}{\big(\sum_{j=1}^{n} \|a_j\|\big)^2}\, \big\|(I - H_{J_{m-1,\xi}}) u\big\|^2 \Bigg]. \tag{A.7} $$
Combining (A.7) with Lemma 3.1 of Temlyakov (2000) yields the desired conclusion.

Lemma A.2. For the matrix $X_n$ defined in (A.3), denote by $X_{j,k}$ ($1 \le j \le n$, $k = 1, 2$) the $k$-th column of the $j$-th group. Then there exists $M > 0$ such that
$$ \max_{1 \le \#(J) \le K_n,\, i \notin J,\, k = 1, 2} \big\|(X_J^T X_J)^{-1} X_J^T X_{i,k}\big\|_1 < M \quad \text{a.s. as } n \to \infty. $$

Proof. For notational simplicity, let $J = \{1, 2, \ldots, p\}$ and $i = p + s$, where $p < K_n$. That is, we select the first $p$ groups of $X_n$ as $J$ and consider $X_{p+s} = (X_{p+s,1}, X_{p+s,2})$, which is $s$ groups away from $J$. Since the best predictor of $X_{p+s}$ in terms of $X_J$ is through $X_p$, the regression coefficients of all variables except $X_p$ are zero. Thus, we only need to calculate $(X_p^T X_p)^{-1} X_p^T X_{p+s}$ instead of the complicated inverse $(X_J^T X_J)^{-1}$. Therefore,
$$ (X_J^T X_J)^{-1} X_J^T X_{p+s} = \begin{pmatrix} 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}, \tag{A.8} $$

where
$$ \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} = \begin{pmatrix} \sum_{j=p}^{n} a_j^2 & \sum_{j=p}^{n} a_j b_j \\ \sum_{j=p}^{n} a_j b_j & \sum_{j=p}^{n} b_j^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{j=p+s}^{n} a_j^2 & \sum_{j=p+s}^{n} a_j b_j \\ \sum_{j=p+s}^{n} a_j b_j & \sum_{j=p+s}^{n} b_j^2 \end{pmatrix}. $$
Let $q_j = (a_j, b_j)^T$ and
$$ Q = \begin{pmatrix} \sum_{j=p}^{n} a_j^2 & \sum_{j=p}^{n} a_j b_j \\ \sum_{j=p}^{n} a_j b_j & \sum_{j=p}^{n} b_j^2 \end{pmatrix}. $$
Let $\Sigma = \lim_{n \to \infty} n^{-1} Q$ be the population covariance matrix. Then it follows that, for sufficiently large $n$,
$$ Q^{-1}\Big(Q - \sum_{j=p}^{p+s-1} q_j q_j^T\Big) = I - Q^{-1} \sum_{j=p}^{p+s-1} q_j q_j^T = I - n^{-1}\big(\Sigma^{-1} + O_p(1)\big) \sum_{j=p}^{p+s-1} q_j q_j^T. \tag{A.9} $$
Combining (A.9), $s \le n$ and the ergodicity of the TAR model, we have that $m_{ij}$, $i, j = 1, 2$, are bounded by a constant almost surely. The other cases can be proved similarly. Thus, the proof of Lemma A.2 is complete.

Proof of Theorem 3.1. For a given $0 \le \xi \le 1$, let $\xi^* = 2/(1 - \xi)$. Define $v_{J,i} = n^{-1/2} A_i X_i^T (I - H_J) u$ and $\hat{v}_{J,i} = n^{-1/2} A_i X_i^T (I - H_J) y$, where $y = \sum_{j=1}^{n} X_j b_j + \epsilon$, and the white noise $\epsilon_t = \sigma_j \eta_t$ if $Y_{t-d}$ belongs to the $j$-th regime. Define also $\sigma_U = \max\{\sigma_1, \sigma_2, \ldots, \sigma_{m_0+1}\}$. Construct the sets $A_n$ and $B_n$ as
$$ A_n = \Big\{ \max_{(J,i):\, \#(J) \le m-1,\, i \notin J} \big|\hat{v}_{J,i}^T \hat{v}_{J,i} - v_{J,i}^T v_{J,i}\big| \le C^2 \sigma_U^2 n^{-1} \log n \Big\}, $$
$$ B_n = \Big\{ \min_{0 \le k \le m-1} \max_{1 \le i, j \le n} \|v_{\hat{J}_k,i}\| \|v_{\hat{J}_k,j}\| > \xi^{*2} C^2 \sigma_U^2 n^{-1} \log n \Big\}, $$
where $\hat{J}_m = \{\hat{j}_1, \ldots, \hat{j}_m\}$ is the index set associated with the groups selected by the sample GOGA.

For all $1 \le q \le m$ and on the set $A_n \cap B_n$, we have
$$ \begin{aligned} v_{\hat{J}_{q-1}, \hat{j}_q}^T v_{\hat{J}_{q-1}, \hat{j}_q} &\ge \hat{v}_{\hat{J}_{q-1}, \hat{j}_q}^T \hat{v}_{\hat{J}_{q-1}, \hat{j}_q} - \max_{(J,i):\, \#(J) \le m-1,\, i \notin J} \big|\hat{v}_{J,i}^T \hat{v}_{J,i} - v_{J,i}^T v_{J,i}\big| \\ &\ge -C^2 \sigma_U^2 n^{-1} \log n + \max_{1 \le j \le n} \hat{v}_{\hat{J}_{q-1}, j}^T \hat{v}_{\hat{J}_{q-1}, j} \\ &\ge -2 C^2 \sigma_U^2 n^{-1} \log n + \max_{1 \le j \le n} v_{\hat{J}_{q-1}, j}^T v_{\hat{J}_{q-1}, j} \\ &\ge \xi \max_{1 \le j \le n} v_{\hat{J}_{q-1}, j}^T v_{\hat{J}_{q-1}, j}. \end{aligned} \tag{A.10} $$
That is, on the set $A_n \cap B_n$, $\hat{J}_m$ is also a set of groups chosen by the generalized population GOGA (A.2). Therefore, Lemma A.1 implies the upper bound
$$ \big\|(I - H_{\hat{J}_m}) u\big\|^2 I_{A_n \cap B_n} \le \Big(\sum_{j=1}^{n} \|a_j\|\Big)^2 (1 + m\xi)^{-1}. \tag{A.11} $$
Since $\|(I - H_{\hat{J}_m}) u\|^2 \le \|(I - H_{\hat{J}_k}) u\|^2$ for $0 \le k \le m-1$, we have that, on $B_n^c$,
$$ \big\|(I - H_{\hat{J}_m}) u\big\|^2 \le \min_{0 \le k \le m-1} \big\|(I - H_{\hat{J}_k}) u\big\|^2 \le \min_{0 \le k \le m-1} \Big(n \max_{1 \le i, j \le n} \|v_{\hat{J}_k,i}\| \|v_{\hat{J}_k,j}\|\Big)^{1/2} \sum_{j=1}^{n} \|a_j\| \le \xi^* C \sigma_U (\log n)^{1/2} \sum_{j=1}^{n} \|a_j\|, $$
where the second inequality follows from the argument in Lemma A.1. Define
$$ C_n = \Big\{ \min_{0 \le k \le m-1} \max_{1 \le i, j \le n} \|v_{\hat{J}_k,i}\| \|v_{\hat{J}_k,j}\| < \xi^{*2} C^2 \sigma_U^2 n^{-2} \log n \Big\}. $$
Similar to the preceding calculations, it can be shown that, on $C_n$,
$$ \big\|(I - H_{\hat{J}_m}) u\big\|^2 \le \min_{0 \le k \le m-1} \Big(n \max_{1 \le i, j \le n} \|v_{\hat{J}_k,i}\| \|v_{\hat{J}_k,j}\|\Big)^{1/2} \sum_{j=1}^{n} \|a_j\| \le \xi^* C \sigma_U (n^{-1} \log n)^{1/2} \sum_{j=1}^{n} \|a_j\|. \tag{A.12} $$

With the specific structure of the matrix $X_n$, it can be shown that for $m > K_n$ (with $K_n \gg m_0$),
$$ \lim_{n \to \infty} P\Big( \min_{0 \le k \le m-1} \max_{1 \le i, j \le n} \|v_{\hat{J}_k,i}\| \|v_{\hat{J}_k,j}\| < \xi^{*2} C^2 \sigma_U^2 n^{-2} \log n \Big) = 1. $$
Denote $\bar{a} = \sum_{j=1}^{n} \|a_j\| / \sqrt{n}$. For a given $m$, it follows from (A.11) and (A.12) that
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 I_{A_n} \le \frac{\bar{a}^2}{1 + m\xi}\, I_{\{m < K_n\}} + \bar{a}\, \xi^* C \sigma_U \frac{(\log n)^{1/2}}{n}\, I_{\{m \ge K_n\}}. \tag{A.13} $$
Since $A_n$ decreases as $m$ increases, we have for all $1 \le m \le K_n$ that
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 I_{A} \le \frac{\bar{a}^2}{1 + m\xi}\, I_{\{m < K_n\}} + \bar{a}\, \xi^* C \sigma_U \frac{(\log n)^{1/2}}{n}\, I_{\{m \ge K_n\}}, $$
where $A$ corresponds to the set $A_n$ with $m = K_n$. Note that
$$ \begin{aligned} \max_{\#(J) \le K_n - 1,\, i \notin J} \big|\hat{v}_{J,i}^T \hat{v}_{J,i} - v_{J,i}^T v_{J,i}\big| &= \max_{\#(J) \le K_n - 1,\, i \notin J} n^{-1} \big| u^T (I - H_J) H_{\{i\}} (I - H_J) u - y^T (I - H_J) H_{\{i\}} (I - H_J) y \big| \\ &= \max_{\#(J) \le K_n - 1,\, i \notin J} n^{-1} \big| \epsilon^T (I - H_J) H_{\{i\}} (I - H_J) \epsilon + 2 u^T (I - H_J) H_{\{i\}} (I - H_J) \epsilon \big|. \end{aligned} \tag{A.14} $$
Denote each group $X_i$ as $X_i = (X_{i1}, X_{i2})$ and
$$ n (X_i^T X_i)^{-1} = \begin{pmatrix} w_{i,11} & w_{i,12} \\ w_{i,21} & w_{i,22} \end{pmatrix}. $$
Define also $v_{i1} = X_{i1}^T (I - H_J) \epsilon$ and $v_{i2} = X_{i2}^T (I - H_J) \epsilon$. The first part of (A.14) can be expressed as
$$ n^{-1} \epsilon^T (I - H_J) X_i (X_i^T X_i)^{-1} X_i^T (I - H_J) \epsilon = w_{i,11} \frac{v_{i1}^2}{n^2} + w_{i,22} \frac{v_{i2}^2}{n^2} + 2 w_{i,12} \frac{v_{i1} v_{i2}}{n^2}. $$
Using Lemma A.2, we can show that

27 Moreover, for a sufficiently large constant C > 0, we have ( ) 1/2 log n P max #(J) K n 1,i J n 1 v i1 > Cσ U n ( n P max t=1 X i j,t η ) t 1 i n, j=1,2 > C(log n)1/2 (1 + M) 1 0, n 1/2 (A.16) as n. The last convergence (A.16) follows from the invariant principle and the fact that { n t=1 X i j,t η t } i=1,2,... is a partial sum process and {η t } t=1,2,... is an independent sequence. Similarly, we can show that ( ) 1/2 log n P max #(J) K n 1,i J n 1 v i2 > Cσ U 0 as n. (A.17) n Since w i,11, w i,12 and w i,22, 1 i n, are O p (1), it follows from (A.16) and (A.17) that ( P max #(J) K n 1,i J n 1 ɛ T (I H J )H {i} (I H J )ɛ ) > C 2 σ 2 log n U 0, n (A.18) as n. Since most coefficients are zero-vector and only m 0 groups are non-zero, the second part of (A.14) is equal to j m0 2n 1 j= j 1 b T j X T j (I H J )X i (X T i X i ) 1 X T i (I H J )ɛ. (A.19) For each component of (A.19), we have = 2 where 2n 1 b T j X T j (I H J )X i (X T i X i ) 1 X T i (I H J )ɛ ( v i1 a j1 w i,11 n n + w v i2 a j2 i,22 n n + w v i1 i,21 n max a jk #(J) K n 1,i J n = max #(J) K n 1,i J n 1 b T j X T j X ik n 1 b T j X T j H J X, ik k = 1, 2, are bounded. Combining with (A.16) and (A.17), we have a j2 n + w v i2 i,12 n a ) j1, n P( max #(J) K n 1,i J n 1 2u T (I H J )H {i} (I H J )ɛ > C 2 σ 2 log n U n ) 0, (A.20) 26

as $n \to \infty$. Finally, it follows from (A.18) and (A.20) that
$$ \lim_{n \to \infty} P(A^c) = 0. \tag{A.21} $$
Moreover, for $m \le K_n$,
$$ \lim_{n \to \infty} P(A_n^c(m)) \le \lim_{n \to \infty} P(A^c) = 0, \tag{A.22} $$
and it follows from (A.13) that
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 I_{A_n(m)} \le \frac{\bar{a}^2}{1 + m\xi}\, I_{\{m < K_n\}} + \bar{a}\, \xi^* C \sigma_U \frac{(\log n)^{1/2}}{n}\, I_{\{m \ge K_n\}}. \tag{A.23} $$
It follows from (A.21) that
$$ \lim_{n \to \infty} P(F_n) = 0, $$
where
$$ F_n = \Big\{ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 > \frac{\bar{a}^2}{1 + m\xi}\, I_{\{m < K_n\}} + \bar{a}\, \xi^* C \sigma_U \frac{(\log n)^{1/2}}{n}\, I_{\{m \ge K_n\}} \Big\}. \tag{A.24} $$
For $J \subseteq \{1, \ldots, n\}$ and $j \in J$, define $b_j(J)$ to be the parameter of $X_j$ in the best linear predictor $\sum_{i \in J} X_i b_i(J)$ of $u$, i.e., the predictor that minimizes $\|u - \sum_{i \in J} X_i \lambda_i\|^2$. Let $b_j(J) = 0$ if $j \notin J$. Thus,
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 = n^{-1} \Big\| \sum_{j} X_j \big(b_j - b_j(\hat{J}_m)\big) \Big\|^2. \tag{A.25} $$
Now suppose that a true group $j_s^0$ has not been selected, that is, $\hat{J}_m^c \cap J = \{j_s^0\}$, where $m > K_n$. We can conclude that there must exist at least one group $\hat{j}_l \in \hat{J}_m$ contained in the neighbourhood of $j_s^0$, i.e., $|\hat{j}_l - j_s^0| < \gamma_n = \log(n)$. Otherwise, on $\{\hat{J}_m^c \cap J = \{j_s^0\}$ and $|\hat{j}_l - j_s^0| > \log(n)$ for all $\hat{j}_l \in \hat{J}_m\}$, it follows from (A.25) that
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 \ge n^{-1} \big(\min_{j \in J} \|b_j\|\big)^2 \lambda_{\min}\big(X_{\hat{J}_m \cup J}^T X_{\hat{J}_m \cup J}\big) \ge \frac{D \log(n)}{n}, \tag{A.26} $$
where $D$ is a positive constant and $\lambda_{\min}(X_{\hat{J}_m \cup J}^T X_{\hat{J}_m \cup J})$ is the smallest eigenvalue of $X_{\hat{J}_m \cup J}^T X_{\hat{J}_m \cup J}$. For sufficiently large $n$, we obtain that
$$ n^{-1} \big\|(I - H_{\hat{J}_m}) u\big\|^2 \ge \frac{D \log(n)}{n} \ge \frac{(\log n)^{1/2}}{n}\, \bar{a}\, \xi^* C \sigma_U. \tag{A.27} $$
Since we have already selected more than $K_n$ variables using the sample GOGA, this implies that

$\{d_H(\hat{J}_m, J) > \log(n)\} \subseteq F_n$. Thus,
$$ \lim_{n \to \infty} P\big(d_H(\hat{J}_{K_n}, J) \le \log(n)\big) = 1. $$

Figure 1: Example 1. Left: time series plot of a realization of (4.1). Right: time series plot of the transformed observations $Y_n^0$. Sample size n = 600.

Figure 2: Example 2. Left: time series plot of a realization of (4.2). Right: time series plot of the transformed observations $Y_n^0$. Sample size n =

Figure 3: Example 3. Left: time series plot of a realization of (4.3). Right: time series plot of the transformed observations $Y_n^0$. Sample size n =

Figure 4: Example 4. Left: time series plot of a realization of (4.4). Right: time series plot of the transformed observations $Y_n^0$. Sample size n =

Figure 5: Example 5. Time series plot of the transformed observations $Y_n^0$ based on (4.5). Sample size n =

Figure 6: Left: growth rate of the U.S. GNP data from 1947 to 2012. Right: plot of the transformed observations.

Table 1: Percentage of correct identification of the number of thresholds (%$m_0$), bias and empirical standard deviation (ESD) of GOGA+HDIC+Trim. The corresponding values of the two-step estimation in Chan et al. (2015) are given in parentheses. Replications = 1000.

n %m_0 r_1 r_2
Bias (97) (0.010) (0.012) ESD (0.056) (0.018)
Bias (96) (0.006) (0.008) ESD (0.037) (0.011)
Bias (95.7) (0.004) (0.007) ESD (0.027) (0.009)

Table 2: Percentage of correct identification of the number of thresholds (%$m_0$), bias and empirical standard deviation (ESD) of GOGA+HDIC+Trim. The corresponding values of the two-step estimation in Chan et al. (2015) are given in parentheses. Replications = 1000.

n %m_0 r_1 r_2
Bias (90) (0.005) (0.009) ESD (0.025) (0.036)
Bias (89.3) (0.007) (0.008) ESD (0.027) (0.011)
Bias (91.4) (0.000) (0.005) ESD (0.026) (0.007)

Table 3: Percentage of correct identification of the number of thresholds (%$m_0$), bias and empirical standard deviation (ESD) of GOGA+HDIC+Trim. The corresponding values of the two-step estimation in Chan et al. (2015) are given in parentheses. Replications = 1000.

n %m_0 r_1 r_2
Bias (71.2) (0.002) (0.020) ESD (0.056) (0.107)
Bias (72.5) (0.000) (0.015) ESD (0.036) (0.069)
Bias (71) (0.002) (0.012) ESD (0.033) (0.053)

Table 4: Percentage of correct identification of the number of thresholds (%$m_0$), bias and empirical standard deviation (ESD) of GOGA+HDIC+Trim. The corresponding values of the two-step estimation in Chan et al. (2015) are given in parentheses. Replications = 1000.

n %m_0 r_1 r_2
Bias (93.3) (0.013) (0.012) ESD (0.040) (0.084)
Bias (96) (0.007) (0.010) ESD (0.025) (0.012)
Bias (96.6) (0.007) (0.008) ESD (0.020) (0.010)

Table 5: Percentage of correct identification of the number of thresholds (%$m_0$), bias and empirical standard deviation (ESD) of GOGA+HDIC+Trim. The corresponding values of the two-step estimation in Chan et al. (2015) are given in parentheses. Replications = 1000.

n %m_0 r_1 r_2 r_3 r_4 r_5 r_6 r_7 r_8
Bias (54.7) (0.003) (0.003) (0.002) (0.004) (0.003) (0.096) (0.059) (0.030) ESD (0.004) (0.003) (0.012) (0.014) (0.012) (0.170) (0.050) (0.137)
Bias (41) (0.004) (0.004) (0.003) (0.002) (0.004) (0.056) (0.622) (0.283) ESD (0.006) (0.004) (0.019) (0.046) (0.051) (0.117) (0.440) (0.516)
Bias (20.4) (0.006) (0.006) (0.002) (0.005) (0.003) (0.043) (0.981) (0.758) ESD (0.009) (0.008) (0.020) (0.026) (0.017) (0.025) (0.230) (0.475)


More information

Geometric View of Measurement Errors

Geometric View of Measurement Errors This article was downloaded by: [University of Virginia, Charlottesville], [D. E. Ramirez] On: 20 May 205, At: 06:4 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number:

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Characterizations of Student's t-distribution via regressions of order statistics George P. Yanev a ; M. Ahsanullah b a

Characterizations of Student's t-distribution via regressions of order statistics George P. Yanev a ; M. Ahsanullah b a This article was downloaded by: [Yanev, George On: 12 February 2011 Access details: Access Details: [subscription number 933399554 Publisher Taylor & Francis Informa Ltd Registered in England and Wales

More information

Discussion on Change-Points: From Sequential Detection to Biology and Back by David Siegmund

Discussion on Change-Points: From Sequential Detection to Biology and Back by David Siegmund This article was downloaded by: [Michael Baron] On: 2 February 213, At: 21:2 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Park, Pennsylvania, USA. Full terms and conditions of use:

Park, Pennsylvania, USA. Full terms and conditions of use: This article was downloaded by: [Nam Nguyen] On: 11 August 2012, At: 09:14 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer

More information

Communications in Algebra Publication details, including instructions for authors and subscription information:

Communications in Algebra Publication details, including instructions for authors and subscription information: This article was downloaded by: [Professor Alireza Abdollahi] On: 04 January 2013, At: 19:35 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

In Search of Desirable Compounds

In Search of Desirable Compounds In Search of Desirable Compounds Adrijo Chakraborty University of Georgia Email: adrijoc@uga.edu Abhyuday Mandal University of Georgia Email: amandal@stat.uga.edu Kjell Johnson Arbor Analytics, LLC Email:

More information

CCSM: Cross correlogram spectral matching F. Van Der Meer & W. Bakker Published online: 25 Nov 2010.

CCSM: Cross correlogram spectral matching F. Van Der Meer & W. Bakker Published online: 25 Nov 2010. This article was downloaded by: [Universiteit Twente] On: 23 January 2015, At: 06:04 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:

More information

arxiv: v1 [stat.me] 30 Dec 2017

arxiv: v1 [stat.me] 30 Dec 2017 arxiv:1801.00105v1 [stat.me] 30 Dec 2017 An ISIS screening approach involving threshold/partition for variable selection in linear regression 1. Introduction Yu-Hsiang Cheng e-mail: 96354501@nccu.edu.tw

More information

The Fourier transform of the unit step function B. L. Burrows a ; D. J. Colwell a a

The Fourier transform of the unit step function B. L. Burrows a ; D. J. Colwell a a This article was downloaded by: [National Taiwan University (Archive)] On: 10 May 2011 Access details: Access Details: [subscription number 905688746] Publisher Taylor & Francis Informa Ltd Registered

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

A note on adaptation in garch models Gloria González-Rivera a a

A note on adaptation in garch models Gloria González-Rivera a a This article was downloaded by: [CDL Journals Account] On: 3 February 2011 Access details: Access Details: [subscription number 922973516] Publisher Taylor & Francis Informa Ltd Registered in England and

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

The Risk of James Stein and Lasso Shrinkage

The Risk of James Stein and Lasso Shrinkage Econometric Reviews ISSN: 0747-4938 (Print) 1532-4168 (Online) Journal homepage: http://tandfonline.com/loi/lecr20 The Risk of James Stein and Lasso Shrinkage Bruce E. Hansen To cite this article: Bruce

More information

Full terms and conditions of use:

Full terms and conditions of use: This article was downloaded by:[smu Cul Sci] [Smu Cul Sci] On: 28 March 2007 Access Details: [subscription number 768506175] Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered

More information

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Bi-level feature selection with applications to genetic association

Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008 Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

Outlier detection and variable selection via difference based regression model and penalized regression

Outlier detection and variable selection via difference based regression model and penalized regression Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Econometric modeling of the relationship among macroeconomic variables of Thailand: Smooth transition autoregressive regression model

Econometric modeling of the relationship among macroeconomic variables of Thailand: Smooth transition autoregressive regression model The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 21 38. Econometric modeling of the relationship among macroeconomic

More information

arxiv: v2 [math.st] 12 Feb 2008

arxiv: v2 [math.st] 12 Feb 2008 arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig

More information

Tong University, Shanghai , China Published online: 27 May 2014.

Tong University, Shanghai , China Published online: 27 May 2014. This article was downloaded by: [Shanghai Jiaotong University] On: 29 July 2014, At: 01:51 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Online publication date: 30 March 2011

Online publication date: 30 March 2011 This article was downloaded by: [Beijing University of Technology] On: 10 June 2011 Access details: Access Details: [subscription number 932491352] Publisher Taylor & Francis Informa Ltd Registered in

More information

Spatial Lasso with Applications to GIS Model Selection

Spatial Lasso with Applications to GIS Model Selection Spatial Lasso with Applications to GIS Model Selection Hsin-Cheng Huang Institute of Statistical Science, Academia Sinica Nan-Jung Hsu National Tsing-Hua University David Theobald Colorado State University

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

Linear model selection and regularization

Linear model selection and regularization Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

High-dimensional variable selection via tilting

High-dimensional variable selection via tilting High-dimensional variable selection via tilting Haeran Cho and Piotr Fryzlewicz September 2, 2011 Abstract This paper considers variable selection in linear regression models where the number of covariates

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Version of record first published: 01 Sep 2006.

Version of record first published: 01 Sep 2006. This article was downloaded by: [University of Miami] On: 27 November 2012, At: 08:47 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:

More information

Diatom Research Publication details, including instructions for authors and subscription information:

Diatom Research Publication details, including instructions for authors and subscription information: This article was downloaded by: [Saúl Blanco] On: 26 May 2012, At: 09:38 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Comparisons of penalized least squares. methods by simulations

Comparisons of penalized least squares. methods by simulations Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA

LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA The Annals of Statistics 2009, Vol. 37, No. 1, 246 270 DOI: 10.1214/07-AOS582 Institute of Mathematical Statistics, 2009 LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA BY NICOLAI

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

MSE Performance and Minimax Regret Significance Points for a HPT Estimator when each Individual Regression Coefficient is Estimated

MSE Performance and Minimax Regret Significance Points for a HPT Estimator when each Individual Regression Coefficient is Estimated This article was downloaded by: [Kobe University] On: 8 April 03, At: 8:50 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 07954 Registered office: Mortimer House,

More information

Theoretical results for lasso, MCP, and SCAD

Theoretical results for lasso, MCP, and SCAD Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building

Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building Cong Liu, Tao Shi and Yoonkyung Lee Department of Statistics, The Ohio State University Abstract Variable selection

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Statistica Sinica Preprint No: SS R3

Statistica Sinica Preprint No: SS R3 Statistica Sinica Preprint No: SS-2015-0413.R3 Title Regularization after retention in ultrahigh dimensional linear regression models Manuscript ID SS-2015-0413.R3 URL http://www.stat.sinica.edu.tw/statistica/

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

University of Thessaloniki, Thessaloniki, Greece

University of Thessaloniki, Thessaloniki, Greece This article was downloaded by:[bochkarev, N.] On: 14 December 2007 Access Details: [subscription number 746126554] Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number:

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information