Improved model selection criteria for SETAR time series models


Journal of Statistical Planning and Inference

Improved model selection criteria for SETAR time series models

Pedro Galeano (a,*), Daniel Peña (b)

(a) Departamento de Estadística e Investigación Operativa, Universidad de Santiago de Compostela, 15782 Santiago de Compostela, Spain
(b) Departamento de Estadística, Universidad Carlos III de Madrid, Getafe, Madrid, Spain

Received 20 February 2006; received in revised form 8 September 2006; accepted 6 October 2006. Available online February 2007.

Abstract

The purpose of this paper is threefold. First, we obtain the asymptotic properties of the modified model selection criteria proposed by Hurvich et al. (1990, "Improved estimators of Kullback-Leibler information for autoregressive model selection in small samples", Biometrika 77) for autoregressive models. Second, we provide some insight into the better performance of these modified criteria. Third, we extend the modification introduced by these authors to model selection criteria commonly used in the class of self-exciting threshold autoregressive (SETAR) time series models. We show the improvements of the modified criteria in their finite sample performance. In particular, for small and medium sample sizes the frequency of selecting the true model improves for the consistent criteria and the root mean square error (RMSE) of prediction improves for the efficient criteria. These results are illustrated via simulation with SETAR models in which we assume that the threshold and the parameters are unknown. © Elsevier B.V. All rights reserved.

Keywords: Asymptotic efficiency; Autoregressive models; Consistency; Model selection criteria; SETAR models

1. Introduction

Since the seminal work by Akaike (1969), model selection criteria have become a widely used tool for selecting the order of different time series models. Most of these criteria can be classified into two groups. The first one includes the efficient criteria, which asymptotically select the model that produces the least mean square prediction error.
The final prediction error criterion (FPE) by Akaike (1969), the Akaike information criterion (AIC) by Akaike (1973) and the corrected Akaike information criterion (AICc) by Hurvich and Tsai (1989) are efficient criteria. The FPE selects the model that minimizes the one-step-ahead square prediction error. The AIC is an estimator of the expected Kullback-Leibler divergence between the true and the fitted model, while the AICc is a bias-corrected form of the AIC that appears to work better in small samples. The second group includes the consistent criteria, which, under the assumption that the data come from a finite order autoregressive process, asymptotically select the true order of the process. The Bayesian information criterion (BIC) by Schwarz (1978) and the Hannan-Quinn criterion (HQC) by Hannan and Quinn (1979) are consistent criteria. The BIC approximates the posterior probabilities of the models, while the HQC is designed to be a consistent criterion with the fastest convergence rate to the true model.

* Corresponding author. Tel.: x3207. E-mail address: pgaleano@usc.es (P. Galeano).

$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi

All these criteria can be written compactly as members of the family of criteria:

min_p { T log σ̂²_p + (p + 1) C(T, p + 1) },    (1)

where p is the order of the autoregressive process, σ̂²_p is the maximum likelihood estimate of the residual variance of the process, T is the sample size, and C(T, p + 1) = (T/(p + 1)) log((T + p + 1)/(T − p − 1)) for the FPE, C(T, p + 1) = 2 for the AIC, C(T, p + 1) = 2T/(T − p − 2) for the AICc, C(T, p + 1) = log T for the BIC, and C(T, p + 1) = 2m log log T, with m > 1, for the HQC. Hurvich et al. (1990) further approximated the expected Kullback-Leibler divergence to derive a criterion, AICc*, which can be written as follows:

min_p { log |Σ(α̂_p)| + 2T(p + 1)/(T − p − 2) },    (2)

where |Σ(α̂_p)| is the determinant of the estimated covariance matrix of the series under the autoregressive process with order p and parameters α_p, which will be defined in Section 2. These authors also introduced the AIC* and BIC* criteria, obtained by replacing T log σ̂²_p with the determinant term in the AIC and BIC criteria, and showed in a Monte Carlo experiment the good performance of this modification. However, they did not study the asymptotic properties of these modified criteria. The first contribution of this paper is to show that the asymptotic properties of the original criteria apply to the modified criteria in (2). Thus, we show the efficiency of AIC* and AICc* and the consistency of BIC*. Although Hurvich et al. (1990) showed via simulation the better performance of the modified criteria, no theoretical reasons have been given to explain this improvement. The second contribution of this paper is to provide three interpretations of the advantages of using the determinant term, based on three different comparisons: (1) the one-step-ahead prediction variances; (2) the correlation structure; (3) a measure of the goodness of fit. A useful nonlinear extension of linear time series models is the class of self-exciting threshold autoregressive (SETAR) models; see Tong (1990).
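Before turning to SETAR models, the compact family of criteria above can be made concrete. The following sketch (in Python; the function and argument names are illustrative, and the AICc penalty is assumed in the form C(T, p + 1) = 2T/(T − p − 2)) evaluates one member of the family given the fitted innovation variance of an AR(p) model:

```python
from math import log

def ar_criterion(sigma2_hat, T, p, kind="aic"):
    """Evaluate T*log(sigma2_hat) + (p+1)*C(T, p+1), the compact family
    of order selection criteria.

    sigma2_hat : ML estimate of the innovation variance of the AR(p) fit
    T          : sample size
    p          : candidate autoregressive order
    kind       : 'aic', 'aicc', 'bic' or 'hqc'
    """
    if kind == "aic":
        C = 2.0
    elif kind == "aicc":
        C = 2.0 * T / (T - p - 2)        # assumed AICc penalty form
    elif kind == "bic":
        C = log(T)
    elif kind == "hqc":
        m = 2.0                          # any fixed m > 1 gives consistency
        C = 2.0 * m * log(log(T))
    else:
        raise ValueError("unknown criterion: %s" % kind)
    return T * log(sigma2_hat) + (p + 1) * C
```

In practice one computes this for p = 0, ..., p_max and keeps the minimizing order; note that for fixed σ̂²_p the BIC penalty exceeds the AIC penalty once T > e², which is why BIC selects smaller orders in moderate samples.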
These models can explain interesting features found in real data, such as asymmetric limit cycles, jump phenomena, chaos and so on. Model selection for SETAR models has been addressed in several papers. Tong (1990) suggested using the AIC, but no theoretical justification was given. Wong and Li (1998) showed that the AICc criterion is an asymptotically unbiased estimator of the expected Kullback-Leibler information for SETAR models and analyzed the small sample properties of AIC, AICc and BIC via simulation experiments. Kapetanios (2001) extended some of the existing theoretical results for several model selection criteria in linear models to threshold models. De Gooijer (2001) proposed three cross-validation criteria. Campbell (2004) and Unnikrishnan (2004) developed Bayesian model selection within a Markov chain Monte Carlo (MCMC) framework, and, finally, Öhrvik and Schoier (2005) studied the performance of several bootstrap selection criteria. As a SETAR model is piecewise autoregressive, it seems natural to extend the modification considered by Hurvich et al. (1990) for autoregressive models to these nonlinear models. The third contribution of this paper is to present new SETAR model selection criteria based on the determinant term and to show via a Monte Carlo study their better performance for small and medium sample sizes. The rest of this paper is organized as follows. Section 2 briefly reviews model selection criteria for the class of linear autoregressive models, proves that the correction by the determinant term preserves their asymptotic properties, and provides some intuition as to why this correction can improve their performance. Section 3 develops the modification by the determinant term for SETAR time series model selection criteria. Section 4 shows the better performance of the modified criteria in a Monte Carlo experiment.

2. Model selection criteria for the class of linear autoregressive processes

2.1.
Model selection criteria for autoregressive processes

Suppose it is known that a given time series, x = (x_1, ..., x_T)′, has been generated by the class of autoregressive (AR) Gaussian processes given by

x_t − φ_{p1} x_{t−1} − ⋯ − φ_{pp} x_{t−p} = a_t,  t = ..., −1, 0, 1, ...,

where a_t is a sequence of independent Gaussian distributed random variables with zero mean and variance σ²_p. We assume that p ∈ {0, ..., p_max}, where p_max is an upper bound. The AR(p) model, denoted by M_p, has parameters α_p = (β′_p, σ²_p)′, where β_p is the (p_max × 1) vector of parameters β_p = (φ_{p1}, ..., φ_{pp}, 0, ..., 0)′, padded with p_max − p zeros. We assume that the models M_p are causal, invertible and stationary. Then the covariance matrix of x can be written as Σ(α_p) = σ²_p Q(β_p), where Q(β_p) is a matrix which only depends on β_p. Let Q(β_p)^{−1} = L(β_p)′ L(β_p) be the Cholesky decomposition of Q(β_p)^{−1}, such that a(β_p) = L(β_p) x. We denote the parameters of the model that has generated the data by α_0 = (β′_0, σ²_0)′. In practice, the model parameters are unknown. The maximum likelihood estimates of the vector of parameters β_p and the innovation variance σ²_p, denoted by β̂_p and σ̂²_p, respectively, are obtained by maximizing the likelihood function p(x|M_p), given by:

p(x|M_p) = (2π)^{−T/2} |Σ(α_p)|^{−1/2} exp( −(1/2) x′ Σ(α_p)^{−1} x ).    (3)

Akaike (1973) proposed selecting the model which minimizes the expected Kullback-Leibler divergence between the fitted and the true model, defined by

E_{α̂_p}[ E_{α_0}[ 2 log( p(y|α_0)/p(y|α̂_p) ) ] ] = E_{α̂_p}[ E_{α_0}[ −2 log p(y|α̂_p) ] ] − E_{α̂_p}[ E_{α_0}[ −2 log p(y|α_0) ] ]    (4)

for an arbitrary realization y = (y_1, ..., y_T)′ of the process. As (4) is always positive, minimizing it amounts to making p(y|α̂_p) as close as possible to p(y|α_0) in expected Kullback-Leibler divergence. As the second term of (4) is constant over models, minimizing (4) is equivalent to minimizing

E_{α̂_p}[ E_{α_0}[ −2 log p(y|α̂_p) ] ] = ∫ [ ∫ −2 log p(y|α̂_p) p(y|α_0) dy ] p(α̂_p|α_0) dα̂_p,    (5)

where y and α̂_p are assumed to be independent. Thus, the rule proposed by Akaike selects the autoregressive model that minimizes (5) with respect to the two sources of uncertainty: the distribution of future observations given the parameters and the distribution of the estimates.
Akaike (1973) approximated (5) as follows:

E_{α̂_p}[ E_{α_0}[ −2 log p(y|α̂_p) ] ] = T(log 2π + 1) + T log σ̂²_p + 2(p + 1) + o_p(1),

which leads to the AIC,

AIC(p) = T log σ̂²_p + 2(p + 1).

Hurvich and Tsai (1989) obtained an approximation of (5) which reduces the small sample bias of the approximation by Akaike (1973), and is given by

E_{α̂_p}[ E_{α_0}[ −2 log p(y|α̂_p) ] ] = T(log 2π + 1) + T log σ̂²_p + 2T(p + 1)/(T − p − 2) + o_p(1),

which leads to the AICc,

AICc(p) = T log σ̂²_p + 2T(p + 1)/(T − p − 2).

From the Bayesian point of view, the model selected is the one with maximum posterior probability p(M_p|x), where p(M_p|x) ∝ p(x|M_p) p(M_p), and p(x|M_p) = ∫ p(x|α_p, M_p) p(α_p|M_p) dα_p, such that p(M_p) and p(α_p|M_p) are the

prior probabilities of the models and parameters, respectively. Schwarz (1978) approximated the posterior probability p(M_p|x) to derive the BIC, given by

BIC(p) = T log σ̂²_p + (p + 1) log T.

The criteria AIC, AICc and BIC can be written in a compact way as members of the family of criteria

min_p { T log σ̂²_p + (p + 1) C(T, p + 1) },    (6)

where C(T, p + 1) is 2 for AIC, 2T/(T − p − 2) for AICc, and log T for BIC. Hurvich et al. (1990) noted that the expected Kullback-Leibler divergence is better approximated if T log σ̂²_p is replaced by the term log |Σ(α̂_p)|, and defined the AIC*, AICc* and BIC* criteria, which may be written in compact form as:

min_p { log |Σ(α̂_p)| + (p + 1) C(T, p + 1) }.    (7)

These authors showed by simulation the advantages of considering log |Σ(α̂_p)| instead of T log σ̂²_p for autoregression fitting, but did not analyze the asymptotic properties of the modified criteria. First, we show that the AIC* and AICc* criteria are efficient in the following theorem, whose proof is in Appendix A.

Theorem 1. Assume that the following assumptions hold: (A1) {x_t} is generated by a stationary process x_t − φ_1 x_{t−1} − φ_2 x_{t−2} − ⋯ = a_t, t = ..., −1, 0, 1, ..., where a_t is a sequence of independent Gaussian distributed random variables with zero mean and variance σ²_a, and Σ_{j≥1} |φ_j| < ∞; (A2) the power series Φ(z) = 1 − φ_1 z − φ_2 z² − ⋯ is nonzero for every complex number z with |z| ≤ 1; (A3) the upper bound p_max is a sequence of positive integers depending on T such that p_max → ∞ and p_max/T → 0 as T → ∞; (A4) {x_t} does not degenerate to a finite order autoregressive process. Then, AIC* and AICc* are efficient criteria.

On the other hand, the consistency of the BIC* criterion is established in the following theorem, also proved in Appendix A.

Theorem 2.
Assume that the following assumptions hold: (B1) {x_t} is generated by a stationary process x_t − φ_1 x_{t−1} − ⋯ − φ_p x_{t−p} = a_t, t = ..., −1, 0, 1, ..., where a_t is a sequence of independent Gaussian distributed random variables with zero mean and variance σ²_a; (B2) the polynomial Φ(z) = 1 − φ_1 z − ⋯ − φ_p z^p is nonzero for every complex number z with |z| ≤ 1; (B3) the upper bound p_max is fixed and known a priori. Then, BIC* is a consistent criterion.

Thus, AIC*, AICc* and BIC* are similar to AIC, AICc and BIC for large samples, but, as we will see in the Monte Carlo study in Section 4, in small and medium sample settings the difference between the performance of these criteria may be substantial.

2.2. Three interpretations of the advantages of using AIC*, AICc* and BIC*

Next, we provide three interpretations of the advantages of using the determinant term |Σ(α̂_p)| by showing that this term leads to a better comparison of the models under consideration through: (1) the one-step-ahead prediction variances; (2) the correlation structure; (3) a goodness of fit test.

2.2.1. Interpretation by one-step-ahead prediction variances

To show the first interpretation, let x̂_t(p) be the one-step-ahead predictions under the model M_p and let e_t(p) = x_t − x̂_t(p) be the corresponding one-step-ahead prediction errors. These errors have variances which can be written as E[e_t(p)²] = σ²_p v²_t(p) (see Harvey, 1989). For instance, for the AR(1) model, v²_t(1) = 1/(1 − φ²_11) for t = 1,

and v²_t(1) = 1 for t > 1. Thus, the logarithm of the determinant of the covariance matrix of x can be written as

log |Σ(α̂_p)| = log ∏_{t=1}^T σ̂²_p v̂²_t(p) = Σ_{t=1}^T log( σ̂²_p v̂²_t(p) ),

where the v̂²_t(p) are obtained by plugging the estimated parameters into v²_t(p), and (1/T) log |Σ(α̂_p)| is the logarithm of the geometric mean of the estimated one-step-ahead prediction variances. Therefore, the difference AIC*(p + 1) − AIC*(p) can be written as

AIC*(p + 1) − AIC*(p) = Σ_{t=1}^T log( σ̂²_{p+1} v̂²_t(p + 1) / (σ̂²_p v̂²_t(p)) ) + 2.

The first term measures the relative change in the one-step-ahead prediction variances, while the second term is a penalization for the inclusion of one additional parameter. Therefore, AIC*(p + 1) < AIC*(p) if the geometric mean of the one-step-ahead prediction variances under the model M_{p+1} is significantly smaller than the corresponding mean under the model M_p; in other words, AIC* selects the model with the better predictive performance, penalized by the number of parameters. As

AIC*(p) = T log σ̂²_p + Σ_{t=1}^T log v̂²_t(p) + 2(p + 1) = AIC(p) + Σ_{t=1}^T log v̂²_t(p),

the AIC(p) does not take into account the terms v̂²_t(p), but only the estimated residual variance σ̂²_p. The same conclusion holds for the AICc* and BIC* criteria.

2.2.2. Interpretation by the correlation structure

Now we show that the determinant term provides a more complete comparison of the autocorrelation structure of the models under consideration. As Q(β_p) = (σ²_x/σ²_p) R(α_p), where σ²_x is the variance of x_t and R(α_p) is the correlation matrix of x, we have |Q(β_p)| = (σ²_x/σ²_p)^T |R(α_p)|. Durbin (1960) and Ramsey (1974) showed that

σ²_x/σ²_p = ∏_{i=1}^p (1 − φ²_ii(β_p))^{−1},  |R(α_p)| = ∏_{i=1}^p (1 − φ²_ii(β_p))^{T−i},

respectively, where the φ_ii(β_p) are the partial autocorrelations of the process under the model M_p, so that

|Q(β_p)| = ∏_{i=1}^p (1 − φ²_ii(β_p))^{−T} ∏_{i=1}^p (1 − φ²_ii(β_p))^{T−i} = ∏_{i=1}^p (1 − φ²_ii(β_p))^{−i}.
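The identity above expresses |Q(β_p)| entirely in terms of the partial autocorrelations, so the determinant term can be computed without forming any T × T matrix. A minimal sketch in Python (illustrative, not the authors' Matlab code): recover the partial autocorrelations from the AR coefficients by the step-down (inverse Levinson-Durbin) recursion, then accumulate the lag-weighted log terms.

```python
from math import log

def pacf_from_ar(phi):
    """Partial autocorrelations phi_11, ..., phi_pp of an AR(p) model,
    recovered from its coefficients by the step-down recursion."""
    phi = list(phi)
    pacf = []
    while phi:
        k = phi[-1]                     # phi_mm at the current order m
        pacf.append(k)
        m = len(phi) - 1
        phi = [(phi[j] + k * phi[m - 1 - j]) / (1.0 - k * k) for j in range(m)]
    return pacf[::-1]                   # phi_11, ..., phi_pp

def log_det_q(phi):
    """log|Q(beta_p)| = -sum_i i*log(1 - phi_ii^2) via the identity above."""
    return -sum(i * log(1.0 - f * f)
                for i, f in enumerate(pacf_from_ar(phi), start=1))
```

For an AR(1) with coefficient 0.5 this gives log|Q| = −log(1 − 0.25); each extra lag contributes with weight i, which is precisely how the modified criteria penalize high-lag partial correlation.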
Consequently, the criteria (7) can be written as follows:

min_p { T log σ̂²_p + (p + 1) C(T, p + 1) − Σ_{i=1}^p i log( 1 − φ̂²_ii(β̂_p) ) },

while the difference AIC*(p + 1) − AIC*(p) is

AIC*(p + 1) − AIC*(p) = T log( σ̂²_{p+1}/σ̂²_p ) + 2 + Σ_{i=1}^p i log( (1 − φ̂²_ii(β̂_p)) / (1 − φ̂²_ii(β̂_{p+1})) ) − (p + 1) log( 1 − φ̂²_{p+1,p+1}(β̂_{p+1}) ).

The first two terms are the ones used by the AIC criterion, but now two additional terms appear in the comparison. They measure the discrepancy between all the partial autocorrelation coefficients under both hypotheses, M_p and M_{p+1}, with weights that increase with the lag. Therefore, AIC*(p + 1) < AIC*(p) if either: (a) σ̂²_{p+1} is sufficiently smaller than σ̂²_p

or (b) the weighted sum of the partial autocorrelation coefficients computed under the M_p model is sufficiently greater than the corresponding sum under the M_{p+1} model. Note that the last term acts as a penalization term because it is always positive. The same interpretation applies to BIC* and AICc*; the only difference is the penalization term for including one additional parameter.

2.2.3. Interpretation by a goodness of fit test

A last interpretation is given in terms of goodness of fit tests for time series. Peña and Rodríguez (2006) used the logarithm of the determinant of the autocorrelation matrix of the estimated residuals for testing goodness of fit in time series. As a(β_p) = L(β_p) x, we can write

a(β̂_p) a(β̂_p)′ = [L(β̂_p)] x x′ [L(β̂_p)]′.    (8)

Thus, after fitting the model M_p,

(1/T) a(β̂_p) a(β̂_p)′ = σ̂²_p R_a(β̂_p),    (9)

where R_a(β̂_p) is the sample correlation matrix of the estimated residuals a(β̂_p). Therefore, from (8) and (9), we have

log σ̂²_p + (1/T) log |Q(β̂_p)| = log( (1/T) x′x ) − (1/T) log |R_a(β̂_p)|,

so that

AIC*(p + 1) − AIC*(p) = log |R_a(β̂_p)| − log |R_a(β̂_{p+1})| + 2.

Thus, AIC*(p + 1) < AIC*(p) if the logarithm of the determinant of the correlation matrix of the estimated residuals under the model M_{p+1} is significantly larger than the one for the model M_p. The terms log |R_a(β̂_p)| and log |R_a(β̂_{p+1})| are the values of the statistic proposed by Peña and Rodríguez (2006) for the models M_p and M_{p+1}. Therefore, AIC* selects the model with a significantly larger value of this statistic. Consequently, the term log σ̂²_p + (1/T) log |Q(β̂_p)| can be seen as a measure of the goodness of fit of the model M_p to the series x. As before, the same interpretation applies to BIC* and AICc* after changing the penalization term.

3. Model selection in SETAR models

One of the most often used nonlinear time series models is the SETAR model.
A time series x = (x_1, ..., x_T)′ generated by the class of SETAR processes follows the model:

x_t = φ_{j0} + Σ_{i=1}^{p_j} φ_{ji} x_{t−i} + a_{jt},  if r_{j−1} ≤ x_{t−d} < r_j,  j = 1, ..., k,    (10)

where the a_{jt} are sequences of independent Gaussian distributed random variables with zero mean and variances σ²_j. We assume that p_j ∈ {0, ..., p_j^max} and d ∈ {0, ..., d_max} are nonnegative integers, where the p_j^max, j = 1, ..., k, and d_max are some upper bounds, and −∞ = r_0 < r_1 < ⋯ < r_{k−1} < r_k = ∞ are the thresholds. The SETAR(p_1, ..., p_k; d) model, denoted by M_{p_1,...,p_k,d}, has parameters α_{p_1,...,p_k,d} = (β′_{p_1,...,p_k,d}, σ²_1, ..., σ²_k)′, where β_{p_1,...,p_k,d} is the (Σ_{j=1}^k (p_j^max + 1) + (k − 1) + 1) × 1 vector of parameters

β_{p_1,...,p_k,d} = (φ_{10}, φ_{11}, ..., φ_{1p_1}, 0, ..., 0, ..., φ_{k0}, φ_{k1}, ..., φ_{kp_k}, 0, ..., 0, r_1, ..., r_{k−1}, d)′,

in which each regime's coefficient block is padded with zeros up to length p_j^max + 1.

We assume that the models M_{p_1,...,p_k,d} are stationary and ergodic with finite second moments, and that the stationary distribution of x admits a density that is positive everywhere. We denote the parameters of the model that has generated the data by α_0. Exact maximum likelihood estimates of the parameters of model (10) are not considered because of the complexity of the likelihood function. Assuming that d is known, the conditional log-likelihood of model (10) is given by

log p_c(x|α_{p_1,...,p_k,d}) = −(T/2) log(2π) − (1/2) Σ_{j=1}^k [ T_j log σ²_j + S(φ_j, r_{j−1}, r_j)/σ²_j ],    (11)

where the T_j are the numbers of observations in each regime for the thresholds r_1, ..., r_{k−1}, φ_j = (φ_{j0}, ..., φ_{jp_j})′ and S(φ_j, r_{j−1}, r_j) = Σ_{r_{j−1} ≤ x_{t−d} < r_j} a²_{jt}, j = 1, ..., k. The conditional maximum likelihood estimators of the parameters, denoted by α̂_{p_1,...,p_k,d} = (β̂′_{p_1,...,p_k,d}, σ̂²_1, ..., σ̂²_k)′, are the values that maximize the conditional likelihood in (11), with residual variances σ̂²_j = S(φ̂_j, r̂_{j−1}, r̂_j)/T_j, j = 1, ..., k. Chan (1993) showed the strong consistency of the conditional least squares estimators of the parameters for k = 2. Wong and Li (1998) approximated the expected Kullback-Leibler divergence for SETAR models as follows:

E_{α̂}[ E_{α_0}[ −2 log p_c(y|α̂_{p_1,...,p_k,d}) ] ] = T log(2π) + Σ_{i=1}^k T_i log σ̂²_{p_i} + Σ_{i=1}^k T_i (T_i + p_i + 1)/(T_i − p_i − 3) + o_p(1),

which leads to the AICc criterion for SETAR models. These authors compared in a simulation study three model selection criteria, AIC, AICc and BIC, which for k regimes are given by

min_{p_1,...,p_k} Σ_{j=1}^k [ T_j log σ̂²_j + (p_j + 1) C_j(T_j, p_j + 1) ],    (12)

where C_j(T_j, p_j + 1) is 2 for AIC, 2T_j(p_j + 2)/((p_j + 1)(T_j − p_j − 3)) for AICc, and log T_j for BIC.
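As an illustration, the regime-wise criteria above can be computed directly from the conditional residuals of a fitted SETAR model. The Python sketch below uses hypothetical argument names; the AICc penalty in the form 2T_j(p_j + 2)/(T_j − p_j − 3) is our reading of the Wong-Li expansion, stated under that assumption.

```python
from math import log

def setar_criterion(resid_by_regime, orders, kind="aic"):
    """Regime-wise SETAR criterion: sum over regimes j of
    T_j*log(sigma2_j) + penalty_j.

    resid_by_regime : list of residual sequences, one per regime
    orders          : list of fitted autoregressive orders p_j
    """
    total = 0.0
    for a, p in zip(resid_by_regime, orders):
        Tj = len(a)
        s2 = sum(e * e for e in a) / Tj          # conditional ML variance
        if kind == "aic":
            pen = 2.0 * (p + 1)
        elif kind == "aicc":
            pen = 2.0 * Tj * (p + 2) / (Tj - p - 3)  # assumed AICc form
        elif kind == "bic":
            pen = (p + 1) * log(Tj)
        else:
            raise ValueError(kind)
        total += Tj * log(s2) + pen
    return total
```

The grid search over orders, delay and threshold described next simply evaluates this quantity for every candidate configuration and keeps the minimizer.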
The procedure proposed by Wong and Li (1998) works as follows, when k = 2 and r_1 = r and d are unknown: (a) fix the maximum autoregressive and delay orders {p_1^max, p_2^max, d_max}; (b) assume r ∈ [l, u] ⊂ R, where l is the 25% percentile and u is the 75% percentile of x; (c) let x_(1), ..., x_(T) be the order statistics of x; (d) let I_r = {[0.25T], ..., [0.75T]}, where [νT] is the largest integer not exceeding νT, and set r = x_(i), i ∈ I_r; (e) calculate min_{p_1,p_2,d,x_(i)} {MSC(p_1, p_2, d, x_(i))}, where MSC(p_1, p_2, d, x_(i)) is one of the model selection criteria in (12). The autoregressive orders (p_1, p_2), the delay parameter d and the threshold x_(i) selected are the ones that minimize MSC(p_1, p_2, d, x_(i)). Wong and Li (1998) carried out a Monte Carlo experiment for different models and sample sizes for the criteria in (12), and concluded that the AICc has the best performance for small sample sizes and the BIC for medium and large sample sizes. De Gooijer (2001) proposed a procedure for selecting and estimating the parameters of a SETAR model by cross-validation, which works as follows: steps (a)-(d) are as in the previous procedure; (e) omit one observation of the series, x_t, and with the remaining data obtain the conditional least squares estimates of the parameters of the corresponding model, denoted by φ̂_j^{(t)}; predict the omitted observation and obtain the predictive residual a_t(φ̂_j^{(t)}, r̂_{j−1}, r̂_j); (f) repeat the previous step for all the observations. The final model is the one that minimizes one of the criteria C, Cc and

Cu, written in compact way as follows:

min_{p_1,...,p_k} { T log( Σ_{t=1}^T a_t(φ̂_j^{(t)}, r̂_{j−1}, r̂_j)² ) + Σ_{j=1}^k (p_j + 1) C_j(T_j, p_j + 1) },    (13)

where C_j(T_j, p_j + 1) is 0 for C, T_j(T_j + p_j + 1)/((p_j + 1)(T_j − p_j − 3)) for Cc, and (1/(p_j + 1)) [ T_j(T_j + p_j + 1)/(T_j − p_j − 3) + T_j log{ T_j/(T_j − p_j − 2) } ] for Cu. The C criterion was analyzed by Stoica et al. (1986) for linear models, who proved that, for a given model, C = AIC + O(T^{−1/2}). The Cc and Cu criteria come from adding the penalty terms of the AICc and AICu criteria to the C criterion. The AICu is a criterion introduced by McQuarrie et al. (1997) for linear models; it is neither efficient nor consistent, but it has a good performance in finite samples. For SETAR models, the AICu criterion can be written as in (12) with C_j(T_j, p_j + 1) = (1/(p_j + 1)) [ T_j(T_j + p_j + 1)/(T_j − p_j − 3) + T_j log{ T_j/(T_j − p_j − 2) } ]. As SETAR models are piecewise autoregressive, we propose to modify the criteria as in the autoregressive case. The criteria BIC, AIC, AICc and AICu are modified by adding the determinant term in each regime as follows:

min_{p_1,...,p_k} Σ_{j=1}^k [ T_j log σ̂²_j + (p_j + 1) C_j(T_j, p_j + 1) + log |Q_j| ].    (14)

In order to compute the determinant term in each regime, we first estimate the parameters of the model by conditional likelihood and then obtain the determinant term in each regime. For that we use the expression provided by van der Leeuw (1994), who showed that

|Q_j|^{−1} = |M′M − NN′|,    (15)

where M and N are p_j × p_j matrices with elements given by

M_ab = { 0, a < b; 1, a = b; −φ_{j,a−b}, a > b },  N_ab = { φ_{j,p_j+a−b}, a ≤ b; 0, a > b }.

In the same way, we modify the cross-validation criteria C, Cc and Cu proposed by De Gooijer (2001) in (13) by adding the determinant term in each regime. Therefore, the modified cross-validation criteria C*, Cc* and Cu* are defined as follows:

min_{p_1,...,p_k} { T log( Σ_{t=1}^T a_t(φ̂_j^{(t)}, r̂_{j−1}, r̂_j)² ) + Σ_{j=1}^k [ (p_j + 1) C_j(T_j, p_j + 1) + log |Q_j| ] }.    (16)
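The closed-form determinant of van der Leeuw (1994) is straightforward to implement. The sketch below (Python with numpy; not the authors' Matlab code) builds M and N as defined above and returns log |Q_j|, under our reading |Q_j|^{−1} = |M′M − NN′|; for an AR(1) regime it reduces to −log(1 − φ²), matching the partial autocorrelation identity of Section 2.

```python
import numpy as np

def log_det_q_regime(phi):
    """log|Q_j| via the van der Leeuw closed form |Q_j|^{-1} = |M'M - N N'|,
    where M and N are p_j x p_j matrices built from the regime's AR
    coefficients phi = (phi_{j,1}, ..., phi_{j,p_j})."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    M = np.eye(p)
    N = np.zeros((p, p))
    for a in range(p):                   # 0-based row/column indices
        for b in range(p):
            if a > b:
                M[a, b] = -phi[a - b - 1]        # -phi_{j,a-b}
            if a <= b:
                N[a, b] = phi[p - 1 + a - b]     # phi_{j,p_j+a-b}
    return -np.log(np.linalg.det(M.T @ M - N @ N.T))
```

Only a p_j × p_j determinant is needed per regime, so the correction adds negligible cost to the grid search over orders, delay and threshold.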
The procedures in Wong and Li (1998) and De Gooijer (2001) are modified by adding the determinant term in the last step, computed with the conditional least squares estimates of the parameters obtained from the whole series. Then, the final model selected is the one that minimizes one of the criteria in (14) or (16). The determinant term is computed using (15).

4. Monte Carlo experiments

To evaluate the performance of the proposed criteria for SETAR models, 1000 realizations were generated from each of the following two stationary SETAR models:

(M1)  x_t = 0.8 x_{t−1} + a_{1t}, if x_{t−1} ≤ 0;  x_t = 0.2 x_{t−1} + a_{2t}, if x_{t−1} > 0;

(M2)  x_t = 0.5 x_{t−1} + a_{1t}, if x_{t−1} ≤ 0;  x_t = −0.5 x_{t−1} + a_{2t}, if x_{t−1} > 0;

Table 1
Frequency of correct order selection, root mean square error of the threshold estimate and root mean square prediction error, assuming that d is known. (Columns: BIC, BIC*, AIC, AIC*, AICc, AICc*, AICu, AICu*, C, C*, Cc, Cc*, Cu, Cu*; for each of T = 30, 50, 100 and each of models M1 and M2, the rows report the frequency of selecting (p_1, p_2) = (1, 1), the RMSE and the RMSPE. Table entries omitted.)

where a_{jt} ~ N(0, 1), j = 1, 2. Based on Section 3, we compare the performance of the criteria in (12) with respect to the criteria in (14), and the criteria in (13) with respect to the criteria in (16). In all cases, 1000 series were generated from models M1 and M2 with sample sizes T = 30, 50 and 100. We proceed as in Wong and Li (1998) and De Gooijer (2001) by using a grid to estimate the threshold parameter r. We fit each model to the first T observations of each series by conditional likelihood and obtain the determinant term in (15) in each regime. First, we assume that the delay parameter is known and fix p_1^max = p_2^max = 5 for T = 30, 50 and 100, so that, taking into account that the number of possible values of the threshold parameter is T/2, we compare 375, 625 and 1250 models, respectively. In every case, we consider the following measures of the performance of the model selection criteria: (a) the frequency of detection of the correct orders (p_1, p_2) = (1, 1); (b) the root mean square error (RMSE) of estimation of the threshold parameter; and (c) the root mean square prediction error (RMSPE) for the last observation, computed using the model chosen by each criterion, the fitted parameters and the true value. The results are in Table 1. It can be seen that for small sample size, T = 30, the improvement in the number of times in which the correct model is selected can be as large as 30.5% (see C and C* in M1); for T = 50, as large as 50.5% (see AIC and AIC* in M1); and for T = 100, as large as 22.6% (see AIC and AIC* in M1). The first part of Table 3 reports the improvement percentages of the number of times in which the correct orders are selected by all the modified criteria.
We note that the AICu, AICu*, Cu and Cu* have the largest frequency of detection for T = 30, but their advantage in frequency of detection decreases as the sample size increases. On the other hand, the RMSEs of estimation of the threshold parameter are very close for the original and modified criteria, whereas the RMSPE is usually smaller for the modified criteria. Now, we assume that the delay is unknown and apply the same design as before with p_1^max = p_2^max = 5, d_max = 4 and T = 30, 50 and 100, so that we now compare 1500, 2500 and 5000 models, respectively. In every case, we consider the same three measures of the performance of the model selection criteria, namely the frequency of detection of the correct orders (p_1, p_2), the RMSE and the RMSPE, but we also include the frequency of detection of the correct delay parameter d and the frequency of joint detection of the correct orders and delay parameter, (p_1, p_2, d). The results are given in Table 2.
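A minimal simulator for the two-regime experiments of this section can be sketched as follows (Python; the regime coefficients are passed as arguments rather than hard-coded, and a burn-in segment is discarded so that the retained sample approximates the stationary distribution):

```python
import numpy as np

def simulate_setar2(T, phi_low, phi_high, r=0.0, d=1, burn=200, seed=0):
    """Simulate a two-regime SETAR series of length T with threshold r and
    delay d:  x_t = phi_low*x_{t-1} + a_t   if x_{t-d} <= r,
              x_t = phi_high*x_{t-1} + a_t  otherwise,
    with a_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T + burn)
    for t in range(d, T + burn):
        phi = phi_low if x[t - d] <= r else phi_high
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x[burn:]                      # discard the burn-in segment
```

Repeating such draws 1000 times and, for each draw, minimizing one of the criteria over the grid of orders, delays and candidate thresholds reproduces the design of the experiments summarized in Tables 1-3.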

Table 2
Frequency of correct selection, root mean square error of the threshold estimate and root mean square prediction error, assuming that d is unknown. (Columns: BIC, BIC*, AIC, AIC*, AICc, AICc*, AICu, AICu*, C, C*, Cc, Cc*, Cu, Cu*; for each of T = 30, 50, 100 and each of models M1 and M2, the rows report the frequencies of selecting (p_1, p_2), d and (p_1, p_2, d), the RMSE and the RMSPE. Table entries omitted.)

It can be seen that for small sample size, T = 30, the improvement in the number of times in which the correct orders and delay, (p_1, p_2, d) = (1, 1, 1), are selected can be as large as 43.8% (see AIC and AIC* in M2); for T = 50, as large as 74.2% (see AIC and AIC* in M2); and for T = 100, as large as 42.2% (see AIC and AIC* in M1). See the second part of Table 3 for the improvement percentages of the number of times in which the correct orders and delay parameter are selected by the modified criteria. As in the case in which d is assumed known, the AICu, AICu*, Cu and Cu* have the largest frequency of detection of the true autoregressive orders and delay parameter for T = 30, but this advantage decreases as the sample size increases. We note that the modified criteria sometimes have a lower frequency of detection of the delay parameter alone, but this is not a drawback, because the aim is to detect jointly the true autoregressive orders and the delay parameter, not only the delay parameter. Regarding the RMSE and the RMSPE, the results are similar to the case in which d is assumed to be known. In terms of computational effort, the times needed to select the model by all the criteria considered in this paper for a series with T = 30, 50 and 100, when both the threshold and the delay parameter are unknown, are 2.8, 8.9 and 54.7 s, respectively, using a program written in Matlab on a Pentium M.

Table 3
Improvement percentage of the number of times in which the correct orders are selected by the modified criteria (top: assuming d is known; bottom: assuming d is unknown). (Columns: BIC*, AIC*, AICc*, AICu*, C*, Cc*, Cu*; rows: models M1 and M2 for T = 30, 50, 100. Table entries omitted.)

Acknowledgments

We would like to thank the associate editor and two anonymous referees for their helpful comments, and Jan De Gooijer for making his code available to us. We acknowledge financial support from Ministerio de Educación y Ciencia Grant SEJ and Comunidad de Madrid Grant HSE/074/2004. The first author also acknowledges financial support from Xunta de Galicia under the Isidro Parga Pondal Program.

Appendix A

Proof of Theorem 1. Shibata (1980) considers order selection criteria of the form:

S_o(p) = (T − p_max + δ_T(p) + 2p) σ̂²_p.

The order chosen by the selection criterion S_o(p) is efficient if δ_T(p) verifies the conditions imposed in Theorem 4.2 of Shibata (1980):

1. plim_{T→∞} max_{0 ≤ p ≤ p_max} |δ_T(p)| / (T − p_max) = 0,
2. plim_{T→∞} max_{0 ≤ p ≤ p_max} |δ_T(p) − δ_T(p*_T)| / ((T − p_max) L_T(p)) = 0,

where plim denotes limit in probability and L_T(p) is the function

L_T(p) = p σ²_a / (T − p_max) + Σ_{i=p+1}^∞ Σ_{j=p+1}^∞ φ_i φ_j Σ_ij,

where Σ_ij = Cov(x_{t−i}, x_{t−j}), and p*_T is a sequence of positive integers with p*_T ≤ p_max which attains the minimum of L_T(p) for each T (see Shibata, 1980, p. 154). The AIC can be written in terms of S_o(p) by taking

δ_T(p) = δ_AIC(p) = (T − p_max) exp( 2p/(T − p_max) ) − (T − p_max) − 2p.

Shibata (1980) has shown that this term verifies the two conditions, and this gives the asymptotic efficiency of the AIC. We can write AIC* in terms of S_o(p) by taking

δ_T(p) = δ_{AIC*}(p) = (T − p_max) exp( (2p + log|Q(β̂_p)|)/(T − p_max) ) − (T − p_max) − 2p.

Therefore,

δ_{AIC*}(p) = δ_AIC(p) + (T − p_max) exp( 2p/(T − p_max) ) [ exp( log|Q(β̂_p)|/(T − p_max) ) − 1 ].

We show that δ_{AIC*}(p) verifies both conditions. First, we write

|δ_{AIC*}(p)| / (T − p_max) = | exp( (2p + log|Q(β̂_p)|)/(T − p_max) ) − 1 − 2p/(T − p_max) |.    (17)

Hannan (1973) shows that (1/T) log|Q(γ)| → 0 for every γ belonging to the parameter space; consequently, log|Q(β̂_p)|/(T − p_max) → 0, and the limit, as T → ∞, of the maximum of the values (17) over the set 0 ≤ p ≤ p_max is 0. This proves the first condition. For the second condition, we use the decomposition:

|δ_{AIC*}(p) − δ_{AIC*}(p*_T)| / ((T − p_max) L_T(p)) ≤ |δ_AIC(p) − δ_AIC(p*_T)| / ((T − p_max) L_T(p)) + [ (T − p_max) exp( 2p/(T − p_max) ) | exp( log|Q(β̂_p)|/(T − p_max) ) − 1 | + (T − p_max) exp( 2p*_T/(T − p_max) ) | exp( log|Q(β̂_{p*_T})|/(T − p_max) ) − 1 | ] / ((T − p_max) L_T(p)).

Shibata (1980) showed that the first term tends to 0, implying that the AIC is efficient. For the second term, for any p such that 0 ≤ p ≤ p_max, including p*_T, it can be shown that

plim_{T→∞} (T − p_max) [ exp( log|Q(β̂_p)|/(T − p_max) ) − 1 ] = − Σ_{i=1}^p i log( 1 − φ²_ii(β_p) ) < ∞.

As this limit is bounded for every p, and (T − p_max) L_T(p) → ∞ as T → ∞ for every p ≤ p_max, the second term also tends to 0. Hence δ_{AIC*}(p) verifies the second condition, and AIC* is efficient. As AICc* is asymptotically equivalent to AIC*, AICc* is also efficient.

Proof of Theorem 2. The BIC* can be written as follows by using the one-step-ahead prediction variances:

BIC*(p) = T log σ̂²_p + Σ_{t=1}^T log v̂²_t(p) + (p + 1) log T = T log σ̂²_p + (p + 1) [ (1/(p + 1)) Σ_{t=1}^T log v̂²_t(p) + log T ].

Now, note that the penalty term in the previous expression verifies

(1/(p + 1)) Σ_{t=1}^T log v̂²_t(p) + log T → ∞,  [ (1/(p + 1)) Σ_{t=1}^T log v̂²_t(p) + log T ] / T → 0.

This shows that the criterion BIC* satisfies the conditions of Theorem 3 in Hannan (1980, p. 1073), implying that BIC* is consistent.

References

Akaike, H., 1969. Fitting autoregressive models for prediction. Ann. Inst. Statist. Math. 21, 243-247.
Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Proceedings of the Second International Symposium on Information Theory. Akadémiai Kiadó, Budapest, pp. 267-281.
Campbell, E.P., 2004. Bayesian selection of threshold autoregressive models. J. Time Ser. Anal. 25.
Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Ann. Statist. 21, 520-533.
De Gooijer, J.G., 2001. Cross-validation criteria for SETAR model selection. J. Time Ser. Anal. 22.
Durbin, J., 1960. The fitting of time series models. Rev. Internat. Statist. Inst. 28.
Hannan, E.J., 1973. The asymptotic theory of linear time-series models. J. Appl. Probab. 10.
Hannan, E.J., 1980. Estimation of the order of an ARMA process. Ann. Statist. 8, 1071-1081.
Hannan, E.J., Quinn, B.G., 1979. The determination of the order of an autoregression. J. Roy. Statist. Soc. Ser. B 41, 190-195.
Harvey, A.C., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.
Hurvich, C.M., Tsai, C.-L., 1989. Regression and time series model selection in small samples. Biometrika 76, 297-307.
Hurvich, C.M., Shumway, R., Tsai, C.-L., 1990. Improved estimators of Kullback-Leibler information for autoregressive model selection in small samples. Biometrika 77, 709-719.
Kapetanios, G., 2001. Model selection in threshold models. J. Time Ser. Anal. 22.
van der Leeuw, J., 1994. The covariance matrix of ARMA errors in closed form. J. Econometrics 63.
McQuarrie, A., Shumway, R., Tsai, C.-L., 1997. The model selection criterion AICu. Statist. Probab. Lett. 34.
Öhrvik, J., Schoier, G., 2005. SETAR model selection: a bootstrap approach. Comput. Statist. 20.
Peña, D., Rodríguez, J., 2006. The log of the determinant of the autocorrelation matrix for testing goodness of fit in time series. J. Statist. Plann. Inference 136.
Ramsey, F.L., 1974. Characterization of the partial autocorrelation function. Ann. Statist. 2.
Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6, 461-464.
Shibata, R., 1980. Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Statist. 8, 147-164.
Stoica, P., Eykhoff, P., Janssen, P., Söderström, T., 1986. Model-structure selection by cross-validation. Internat. J. Control 43.
Tong, H., 1990. Non-linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.
Unnikrishnan, N.K., 2004. Bayesian subset model selection for time series. J. Time Ser. Anal. 25.
Wong, C.S., Li, W.K., 1998. A note on the corrected Akaike information criterion for threshold autoregressive models. J. Time Ser. Anal. 19, 113-124.


More information

Measures of Fit from AR(p)

Measures of Fit from AR(p) Measures of Fit from AR(p) Residual Sum of Squared Errors Residual Mean Squared Error Root MSE (Standard Error of Regression) R-squared R-bar-squared = = T t e t SSR 1 2 ˆ = = T t e t p T s 1 2 2 ˆ 1 1

More information

On Forecast Strength of Some Linear and Non Linear Time Series Models for Stationary Data Structure

On Forecast Strength of Some Linear and Non Linear Time Series Models for Stationary Data Structure American Journal of Mathematics and Statistics 2015, 5(4): 163-177 DOI: 10.5923/j.ajms.20150504.01 On Forecast Strength of Some Linear and Non Linear Imam Akeyede 1,*, Babatunde Lateef Adeleke 2, Waheed

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Reduced rank regression in cointegrated models

Reduced rank regression in cointegrated models Journal of Econometrics 06 (2002) 203 26 www.elsevier.com/locate/econbase Reduced rank regression in cointegrated models.w. Anderson Department of Statistics, Stanford University, Stanford, CA 94305-4065,

More information

An EM algorithm for Gaussian Markov Random Fields

An EM algorithm for Gaussian Markov Random Fields An EM algorithm for Gaussian Markov Random Fields Will Penny, Wellcome Department of Imaging Neuroscience, University College, London WC1N 3BG. wpenny@fil.ion.ucl.ac.uk October 28, 2002 Abstract Lavine

More information

Bias and variance reduction techniques for bootstrap information criteria

Bias and variance reduction techniques for bootstrap information criteria Ann Inst Stat Math (2010) 62:209 234 DOI 10.1007/s10463-009-0237-1 Bias and variance reduction techniques for bootstrap information criteria Genshiro Kitagawa Sadanori Konishi Received: 17 November 2008

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

interval forecasting

interval forecasting Interval Forecasting Based on Chapter 7 of the Time Series Forecasting by Chatfield Econometric Forecasting, January 2008 Outline 1 2 3 4 5 Terminology Interval Forecasts Density Forecast Fan Chart Most

More information