Neural Network Method for Nonlinear Time Series Analysis


Jinu Lee

Queen Mary, University of London

November 2015

Job Market Paper

Abstract: This paper is concerned with approximating nonlinear time series by an artificial neural network based on radial basis functions. A new data-driven modelling strategy is suggested for the adaptive framework by combining the statistical techniques of forward selection, cross validation and information criteria. The proposed method is fast and simple to implement while avoiding some typical difficulties, such as the estimation and computation of nonlinear econometric models. Two applications are provided to illustrate the benefits of using the neural network method in time series analysis. First, the proposed modelling method is applied to a neural network test for neglected nonlinearity in the conditional mean of univariate time series. A simulation study is carried out to show how the size of the test is improved in finite samples. Further, the new test is compared with alternative popular tests to demonstrate its superior power performance using a variety of nonlinear time series models. Second, the proposed method is applied to obtain a nonlinear forecasting model for daily S&P 500 returns. Forecast accuracy is compared with that of a linear model and other neural network models used in the literature.

JEL classification: C12, C45, C51, C53, G17

Keywords: Artificial neural networks, radial basis function, data-driven modelling procedures, nonlinear time series, neglected nonlinearity, forecasting

Acknowledgements: I am grateful to George Kapetanios and Andrew P. Blake for their valuable comments and support. I would like to thank Liudas Giraitis and Emmanuel Guerre as well as the participants in the Econometrics seminar at Queen Mary for helpful discussions and constructive feedback.

Corresponding author: School of Economics and Finance, Queen Mary, University of London, London, E1 4NS, United Kingdom.
jinu.lee@qmul.ac.uk Tel: +44 () Fax: +44 ()

1 Introduction

Given advances in computing power and the availability of large datasets, nonparametric approaches have been extensively considered as an important branch of the very large collection of nonlinear approaches in time series analysis. They include smoothing techniques such as artificial neural networks (ANN), kernel regressions, orthogonal series expansions and splines. These methods, which specify no particular parametric form and rely on smoothing parameters such as a bandwidth or the order of a series expansion, can capture a wide variety of nonlinearities with few assumptions. The benefit is most evident in situations where little prior knowledge is available about the functional relationship in a data set. However, such techniques entail well-known practical shortcomings, such as computational complexity, the risk of overfitting and the ambiguous interpretation of parameters. This paper proposes a new nonparametric method that avoids these potential difficulties while retaining the benefits of a nonlinear approach. In particular, this paper concentrates on the use of a single-hidden-layer ANN model with radial basis functions (RBFs), which is equivalent to an RBF series expansion. Since this form of neural network was first developed for exact mapping problems, it has been extended and proven to possess the property of best approximation (see e.g. Powell (1987), Broomhead and Lowe (1988), Girosi and Poggio (1990) and Park and Sandberg (1991)). This ANN model consists of a finite number of RBF units in a single layer which are multiplied by their corresponding weights and linearly combined to approximate an output. The response (or activation) of an RBF unit is determined by the distance between an input (vector) and its centre (vector) and bounded by its radius (scalar). The centre determines the unit's location while the radius controls its width.
Since there is a distinction between the roles of these parameters, Bishop (1995) pointed out that this feature can lead to a general two-stage strategy for estimating (or training) the neural network. Namely, the centres and radii of the RBF functions are determined and fixed in a first stage, while the weights are estimated in a second stage. In so doing, the estimation of parameters in the nonlinear model boils down to a linear optimisation problem. With this strategy, Blake and Kapetanios (2000, 2003b) have suggested data-driven procedures for constructing an adaptive RBF network model in the context of nonlinearity testing. In particular, a forward selection approach proposed by Orr (1995, 1996) is combined with information criteria for adaptive model selection for a given dataset. After a finite set of possible RBF candidates for the given input data is first selected and ranked, the best subset of RBF units is determined by an

information criterion for model selection. The method has performed successfully in various econometric problems that require approximating nonlinearities (see e.g. Blake and Kapetanios (2003a, 2007a,b) and Kapetanios and Blake (2010)). Nonetheless, a critical issue must be addressed in the first stage of implementing the strategy: the RBF parameters are chosen by heuristic decision rules which are commonly used in the literature. For example, the radii of the centred RBF units in the hidden layer are all set equal to twice the maximum distance between input data points. This rule may not be optimal by its nature, since it depends only on the input data while ignoring any information in the output data. Moreover, the second stage for model selection is confined to the candidate set of the first stage by the forward selection method. Indeed, the problem is analogous to a kernel regression, where the most important choice is how to select an optimal bandwidth for the kernel function in capturing nonlinear relations. 1 However, the effect of optimising RBF parameters on the performance of an RBF network model has not been investigated within the data-driven procedures in the literature. The objective of this study is to provide an alternative, rigorous method for the first stage of the general strategy. In particular, a cross validation approach is considered to select a suitable common radius for the RBF functions once their possible centres have been fixed. This is motivated by generalising an RBF method originally introduced for interpolation between input and output data. Choosing proper centres, of course, is likely to influence performance, and many different ways have been suggested in the literature (see e.g. Haykin (1998)). Nevertheless, it is important to optimally tune the width of the RBF functions once their positions are fixed.
Further, the subsequent procedure of ranking RBF candidates evaluates the joint effect of the chosen positions and widths, so as to lose as little as possible of the overall ability to approximate an unknown nonlinear function between input and output data. To assess the usefulness of the proposed modelling method for an RBF series expansion, this study considers a misspecification test for nonlinearity. Note that the data-driven method has been developed in the literature for the use of the RBF network in the context of diagnostic tests in econometrics. In particular, this study focuses on the test of Blake and Kapetanios (2003b), which aims to detect potential nonlinear properties in a conditional expectation model of time series. After constructing and estimating a linear-augmented RBF network for a given dataset, the estimates of

1 See Hutchinson (1994) and Bishop (1995) for further discussion of the relation between an RBF network and a kernel regression.

RBF weights are jointly tested for their significance under the null hypothesis of linearity. Previous work found that the test has good power properties but slight size distortions when the standard critical values of the chi-square distribution are used for inference. To correct for the size distortions, or Type I error, a bootstrap technique has been suggested, at the cost of a power loss, or Type II error. However, if the RBF units were properly selected in location, width and number, the finite sample distribution of the corresponding test statistic would be well approximated by its theoretical asymptotic chi-square distribution. In other words, with appropriate nuisance parameters, the standard critical values would describe the rejection region of the test well, with desirable size properties, and the additional computational and power costs of the resampling technique might not be necessary. In this sense, this study conducts Monte Carlo simulations to examine improvements in the finite sample properties of the test using the proposed modelling method. The degree of size distortion is investigated under the null hypothesis of linearity when the new data-driven procedures are used. Next, to investigate the relative performance of the new test, a comparison is conducted with popular nonlinearity tests such as the RESET test of Ramsey (1969), the BDS test of Broock, Scheinkman, Dechert, and LeBaron (1996) and another neural network test of Lee, White, and Granger (1993). A variety of nonlinear time series models commonly used in economic applications are considered as data generating processes, such as self-exciting threshold autoregressive, smooth transition autoregressive, bilinear and Markov-switching models. Since forecasting is one of the main purposes for introducing ANNs in time series analysis, it is of considerable interest to explore how well the network method can perform as a nonlinear forecaster.
Indeed, a similar idea of linearising ANNs, or QuickNet, to alleviate the difficulties of nonlinear methods has been suggested by White (2006) and further studied by Kock and Teräsvirta (2011, 2014) with perceptron neural network models for predicting macroeconomic variables. However, no previous study has investigated the forecasting capability of an RBF network using this data-driven modelling scheme, probably because the choice of RBF parameters has not yet been addressed with rigorous methods. Hence, this study considers an empirical example given in White (2006), where the QuickNet algorithm is assessed with the single-hidden-layer perceptron neural network models commonly used in economic and financial applications. Using the same data set of the daily S&P 500 index and predictor variables as in White (2006), it is possible to directly compare the forecasting performance of the proposed method to that of QuickNet without

replicating them, as well as to a linear benchmark model in terms of point forecasts. The rest of this paper is organised as follows. Section 2 reviews the properties of RBF functions and the architecture of the ANN model. In Section 3, the data-driven procedures are discussed for the model specification and the cross validation method is suggested for tuning RBF parameters. Section 4 briefly reviews the ANN test for neglected nonlinearity based on RBF functions as well as the alternative tests. Section 5 analyses simulation results with the proposed neural network method in terms of size distortions and the finite sample distribution of the test statistic. In addition, a comparison with alternative nonlinearity tests is presented. Section 6 illustrates the benefits of using the proposed method as a nonlinear forecasting tool applied to the daily returns of the S&P 500 closing index. Section 7 concludes.

2 Radial basis function and artificial neural networks

This section briefly reviews artificial neural networks (ANN) using radial basis functions (RBF). An RBF is a function whose response depends on the distance between an input vector and a centre vector. Among several popular types of radial functions, such as the multiquadric and the inverse quadratic, the most commonly used is the Gaussian RBF,

ψ(x; c, τ) = exp(−‖x − c‖²/τ²),   (1)

where x is the input vector, c is a location parameter called the centre, τ is a width parameter called the radius and ‖·‖ is the Euclidean distance. It is localised, with the property that ψ(x) → 0 as ‖x − c‖ → ∞. Namely, it is monotonically and symmetrically decreasing around c, and the rate of decrease is controlled by τ. The Gaussian function in (1) is the type of RBF used in this study. The RBF approach was originally used in solving an exact interpolation problem.
For simplicity, consider an unknown mapping from a d-dimensional input vector x_t to a one-dimensional output (scalar) y_t = F(x_t), F: R^d → R, where t = 1, 2, ..., T. Powell (1987) introduced a linear combination of a set of T RBF functions, one for each data point, for the interpolation. Due to the highly oscillatory behaviour of this model, it has since been modified to provide smoothness in the presence of noise. In particular, a theoretical framework has been developed for an approximation based on regularisation techniques that leads to a generalisation of the interpolating function (see Poggio and Girosi (1990) and Girosi et al. (1995)).

Nonetheless, it is very expensive to implement a regularised model in practice given a large number of RBF functions: the computational burden grows polynomially with the data size T and nonlinear optimisations are involved; see Haykin (1998) for the regularisation techniques and computational difficulties. Alternatively, the RBF method has been extended to perform the more general task of approximation in the ANN framework, which is of interest in this study (see Broomhead and Lowe (1988) and Moody and Darken (1989)). The complexity of the network model is represented with fewer RBF units in a hidden layer during the estimation (or training, in the terminology of ANNs) procedures. A typical feed-forward neural network with a single hidden layer has the following form:

f(x_t) = β_0 + Σ_{j=1}^q β_j ψ(x_t; c_j, τ_j),   (2)

where f(·) is an approximating function to F(·), q is the number of hidden units, which is (much) smaller than the size of the data set T, β_0 is a constant (or bias) and β_j is a coefficient. Girosi and Poggio (1990) derived from regularisation theory that the model in (2) has the property of best approximation. This means that there always exists a choice of coefficients which approximates F(·) better than all other possible choices. Furthermore, Park and Sandberg (1991) showed that the universal approximation property holds for the RBF network under only mild restrictions on the type of basis function, which are satisfied by general RBF functions including the function in (1). Therefore, this result provides crucial theoretical foundations for the design of the RBF network in practical applications. Moreover, Niyogi and Girosi (1996) theoretically derived a generalisation error in a regression of y_t on x_t, E[(y − f(x))²], in terms of bias and variance components. For any confidence parameter δ ∈ (0, 1], this error is bounded by O(1/q) + O(((qd ln(qT) − ln δ)/T)^{1/2}) with probability greater than 1 − δ.
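As a concrete illustration of the Gaussian unit in (1) and the single-hidden-layer network form in (2), the two objects can be sketched in a few lines of numerical code (a minimal sketch; the function names and array conventions are illustrative choices, not from the paper):

```python
import numpy as np

def gaussian_rbf(x, centre, tau):
    # Gaussian unit of equation (1): psi(x; c, tau) = exp(-||x - c||^2 / tau^2)
    return np.exp(-np.sum((x - centre) ** 2) / tau ** 2)

def rbf_network(x, beta0, betas, centres, taus):
    # Network of equation (2): f(x) = beta_0 + sum_j beta_j * psi(x; c_j, tau_j)
    return beta0 + sum(
        b * gaussian_rbf(x, c, t) for b, c, t in zip(betas, centres, taus)
    )
```

At its centre a unit responds with 1, and the response decays monotonically and symmetrically with distance at a rate set by the radius τ.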
The result is similar to that derived by Barron (1994) for single-hidden-layer neural networks with squashing functions, one of the most commonly used ANN models in economics and finance. In a multiple regression analogy, the input vector x_t contains explanatory variables, the output y_t is a dependent variable, and the nonlinear function f(x_t) is the conditional expectation of y_t given x_t. For simplicity and notational convenience, the constant term will be ignored. Then, the regression given by the model in (2) can be rewritten as

y = X_q β_q + e,   (3)

where y = (y_1, ..., y_T)′, X_q is a T × q matrix of RBF activations with centres c_j and radii τ_j, β_q = (β_1, ..., β_q)′, e = (ε_1, ..., ε_T)′ and ε_t ~ iid(0, σ²_ε) is an error term. A typical estimation strategy is to exploit the distinction between the roles of the RBF parameters and the weights. This feature, which distinguishes this model from other ANNs, leads to two-stage estimation procedures. The RBF functions are fixed in position, smoothness and number in the first stage, while model flexibility is achieved through the coefficients of the RBF functions in the next stage. Regularisation techniques such as ridge regression may optionally be used to yield smoother estimates by penalising large coefficients. This is a simple modification of OLS that restricts model flexibility, and the coefficients are estimated by β̂_q = (X′_qX_q + ΛI_q)^{−1}X′_qy, where I_q is an identity matrix of size q and Λ is a diagonal matrix of regularisation parameters. 2 The values of Λ define the levels of penalty on the corresponding non-zero coefficients of the regression. When Λ is a zero matrix, which means no penalty, the solution is the same as for the OLS method. Finding optimal values for Λ given a dataset adds further complexity in terms of computation and nonlinear estimation (see e.g. Haykin (1998) and Orr (1996)). Besides, the primary issue of interest is how to construct the optimal matrix X_q in (3) in the first stage of the strategy. Given a specific application, the objective is to adaptively specify the position, smoothness and number of the RBF functions used as the best hidden units in the network model. Indeed, the coefficients estimated at the second stage are conditioned on the values obtained in the first. If the RBF functions are fixed with centres and radii which play little role in explaining the dependent variable, the network model may deviate from accurate approximations.
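The ridge-regularised second-stage estimator β̂_q = (X′_qX_q + ΛI_q)^{−1}X′_qy described above can be written in a few lines (a hedged sketch; the helper name and the scalar-or-vector treatment of the regularisation parameters are my choices):

```python
import numpy as np

def ridge_weights(X, y, lam):
    # beta_hat = (X'X + Lambda)^{-1} X'y, where Lambda is a diagonal matrix
    # of regularisation parameters; lam = 0 reproduces plain OLS.
    q = X.shape[1]
    diag = np.full(q, lam) if np.isscalar(lam) else np.asarray(lam)
    return np.linalg.solve(X.T @ X + np.diag(diag), X.T @ y)
```

Larger entries of Λ shrink the corresponding RBF coefficients towards zero, restricting the flexibility of the network.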
If too many hidden units are involved, apart from the curse of dimensionality, the large number of parameters may make the model overly sensitive to the data and yield poor generalisation. For example, when the number of hidden units equals the size of the data, the method is equivalent to exact interpolation. As an alternative, this study considers the combination of forward selection and information criteria to retain the balance between accuracy and flexibility of the model, following Orr (1995, 1996) and Blake and Kapetanios (2000, 2003a). This approach is easy and fast to implement and produces a parsimonious model. However, none of the above papers concentrated on selecting radii for the localised RBF functions apart from the use of heuristic decision rules. This study addresses that neglected issue to improve the quality of RBF network approximations within the data-driven procedures.

2 See the Appendix of Orr (1996) for a derivation of the formula.

3 Constructing and regularising the complexity of the RBF neural network

This section discusses the data-driven procedures used to adaptively specify the complexity of the RBF network. The algorithm is based on the procedures suggested by Blake and Kapetanios (2000, 2003b) using the nature of localised basis functions and the forward selection method of Orr (1995).

3.1 Data-driven modelling algorithm

In the process of building up the model, three sets of parameters have to be determined: the centres, the radii and the number of RBF functions. The OLS method is then used to estimate the weights of the hidden units in the second stage. Namely, the first stage becomes a process of selecting nonlinear regressors X_q = [Ψ_1 Ψ_2 ... Ψ_q] in (3), where Ψ_j = [ψ_j(x_1; c_j, τ_j) ψ_j(x_2; c_j, τ_j) ... ψ_j(x_T; c_j, τ_j)]′. Using the forward selection approach, the modelling procedure can be summarised as follows:

Step 1: Use the input data as possible centres such that c_j ∈ {x_1, x_2, ..., x_T}.

Step 2: Choose the radii by the rule τ_j = τ = 2 max_t(‖x_{t+1} − x_t‖), and prepare a candidate set S of nonlinear regressors ψ_j(x) ∈ S, S = {ψ(x; x_1, τ), ψ(x; x_2, τ), ..., ψ(x; x_T, τ)}.

Step 3: Rank the candidates ψ(x) in S by their ability to reduce the variance of the unexplained data when ψ(x) is individually used as a single regressor in (3), and construct the regressor matrix of all the ranked RBF units such that X_T = [Ψ_1 Ψ_2 ... Ψ_T].

Step 4: Applying an information criterion with penalty term P(k), limit the size of X_T by the minimisation q̂ = arg min_{q∈{1,...,T}} [log((1/T)(y − X_q β_q)′(y − X_q β_q)) + P(k)], where k is the number of parameters.

Step 5: Estimate the coefficients by β̂_q̂ = (X′_q̂ X_q̂)^{−1} X′_q̂ y.

Note that the centres are fixed using all available input data in Step 1, which is the conventional choice in the literature. Given the RBF candidates from Step 2, the model selection process is designed to combine appropriate RBF units in position and number through Steps 3 and 4.
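Under the stated setup, Steps 1-5 can be sketched as follows for scalar inputs, with BIC as the information criterion in Step 4 (an illustrative sketch; the single-regressor scoring rule, the function names and the max_q cap are my simplifications, not the paper's exact implementation):

```python
import numpy as np

def gaussian_design(x, centres, tau):
    # T x q matrix of activations psi(x_t; c_j, tau) for scalar inputs
    d2 = (x[:, None] - centres[None, :]) ** 2
    return np.exp(-d2 / tau ** 2)

def rbf_forward_selection(x, y, tau, max_q=None):
    T = len(y)
    max_q = max_q or T
    # Step 1: the centres are the input data points themselves.
    Psi = gaussian_design(x, x, tau)          # Step 2: common radius tau
    # Step 3: rank candidates by the sum of squares each explains
    # when used alone as a single regressor.
    scores = [(Psi[:, j] @ y) ** 2 / (Psi[:, j] @ Psi[:, j]) for j in range(T)]
    order = np.argsort(scores)[::-1]
    # Step 4: grow the model and pick q by BIC = log(RSS/T) + q*log(T)/T.
    best_q, best_bic = 1, np.inf
    for q in range(1, max_q + 1):
        Xq = Psi[:, order[:q]]
        beta, *_ = np.linalg.lstsq(Xq, y, rcond=None)
        rss = np.sum((y - Xq @ beta) ** 2)
        bic = np.log(max(rss, 1e-12) / T) + q * np.log(T) / T
        if bic < best_bic:
            best_q, best_bic = q, bic
    # Step 5: final OLS estimate with the selected units.
    Xq = Psi[:, order[:best_q]]
    beta, *_ = np.linalg.lstsq(Xq, y, rcond=None)
    return order[:best_q], beta
```

Only linear least-squares problems are solved throughout, which is the point of the two-stage strategy.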
The advantage of these procedures is that there is no need for complicated and time-consuming nonlinear optimisations: application of the OLS method is sufficient for all the estimations. However, in Step 2, the radii are assumed to be constant and their value is associated with the maximum

distance between input data points. Intuitively, this may lead to suboptimal model specification in the subsequent procedures. Therefore, this study proposes an alternative, data-based approach to find the width parameter for the centred RBF units.

3.2 Leave-one-out cross validation for selecting the radius

This subsection describes how to obtain a common radius for all possible RBF functions by a cross validation strategy at Step 2, taking the RBF interpolation method into account. The cross validation technique is a common tool for evaluating model fit in statistics; the leave-one-out cross validation (LOOCV) method is the one emphasised in this study. To this end, the data set is split into two subsets: a validation set of one data point x_k and an estimation set of the remaining data points. In the interpolation between x_t and y_t, let f_{−k}(x_t) be the RBF interpolation model which consists of all the units in the candidate set S except for ψ(x; x_k, τ). Given a trial radius τ, the linear system of T − 1 equations can then be solved analytically, and the interpolant of the model at x_k can be written as

f_{−k}(x_k) = Σ_{j=1, j≠k}^T β̂_j ψ(x_k; x_j, τ),   (4)

with interpolation error e_k = y_k − f_{−k}(x_k). Therefore, the error variance under the LOOCV scheme is a function of τ,

σ̂²_LOOCV(τ) = (1/T) Σ_{k=1}^T [y_k − f_{−k}(x_k; τ)]²,   (5)

where σ̂²_LOOCV is an estimator of the error variance. Since this can be interpreted as a cost function for the radius, the best value for the possible RBF units can be obtained by the cost minimisation τ̂ = arg min_τ σ̂²_LOOCV(τ). Instead of the LOOCV technique, this procedure can be extended to a general leave-p-out cross validation at the cost of more time-consuming computations. According to the numerical experiments of Rippa (1999), a radius obtained by the LOOCV method in RBF interpolation between scattered data is usually close to the actual optimum value of τ.
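A direct, computationally naive implementation of (4)-(5) refits the interpolant T times, once per held-out point, and then minimises the cost over a grid of trial radii (a sketch for scalar inputs; the function names and the grid search are illustrative choices):

```python
import numpy as np

def loocv_cost(x, y, tau):
    # sigma^2_LOOCV(tau) = (1/T) * sum_k [y_k - f_{-k}(x_k; tau)]^2, eq. (5):
    # for each k, interpolate the remaining T-1 points exactly using the
    # T-1 RBF units centred on them, then predict the held-out point x_k.
    T = len(y)
    Psi = np.exp(-(x[:, None] - x[None, :]) ** 2 / tau ** 2)
    errs = np.empty(T)
    for k in range(T):
        idx = [j for j in range(T) if j != k]
        beta = np.linalg.solve(Psi[np.ix_(idx, idx)], y[idx])
        errs[k] = y[k] - Psi[k, idx] @ beta
    return np.mean(errs ** 2)

def select_radius(x, y, grid):
    # tau_hat = argmin_tau sigma^2_LOOCV(tau) over a user-supplied grid
    return min(grid, key=lambda t: loocv_cost(x, y, t))
```

Rippa's (1999) shortcut in (6) recovers the same T errors from a single T × T solve, which is what the paper uses in practice.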
Further, to reduce the computational cost of (5), which becomes prohibitively expensive as the sample size T increases, Rippa

(1999) developed a simplified algorithm based on the formula

σ̂²_LOOCV(τ) = (1/T) Σ_{k=1}^T e_k² = (1/T) Σ_{k=1}^T [β̂_T(k) / X_T^{−1}(k)]²,   (6)

where, given τ, X_T^{−1}(k) is the kth diagonal element of the inverse matrix of X_T and β̂_T(k) denotes the kth element of the coefficient estimates β̂_T obtained by solving the linear system y = X_T β_T. As a result, given a radius, the interpolation errors e_1, e_2, ..., e_T are obtained simultaneously without having to repeat the interpolation in (4) with T different sub-data sets of T − 1 data points. This study uses the algorithm in (6) for the LOOCV approach to optimise the radius in Step 2. The possible RBF candidates collected in the set S = {ψ(x; x_1, τ̂), ..., ψ(x; x_T, τ̂)} are then ready to be used for the forward selection and model selection procedures in Steps 3 to 5.

4 Application to a misspecification test

In this section, the RBF network and the new modelling procedures are applied to a misspecification test which detects potential nonlinearity in time series models. Primarily, the conditional mean of a univariate time series {y_t}_{t=1}^T is defined through a set of d-dimensional variables x_t. In time series analysis, x_t may contain lagged dependent and/or exogenous variables.

4.1 Neural network test for neglected nonlinearity

As in Lee, White, and Granger (1993) (LWG) and Blake and Kapetanios (2003b), who proposed tests for nonlinearity based on ANN models, we say that the process of interest y_t is linear in mean conditional on x_t if

H_0: Pr{E(y_t | x_t) = α′x_t} = 1, for some constant vector α.

The alternative is then defined as

H_1: Pr{E(y_t | x_t) = α′x_t} < 1, for all α.

When the null hypothesis is significantly rejected, this indicates an occurrence of nonlinear misspecification in the conditional mean model.
Given these hypotheses, and using the inherent flexibility and universal approximation ability of the RBF network, a generic form for the conditional expectation of y_t given x_t can be

written as

E(y_t | x_t) = α′x_t + Σ_{j=1}^q β_j ψ_j(x_t).   (7)

For simplicity and expository purposes, suppose that the model for the series y_t under the null hypothesis is as follows:

y_t = α′x_t + ε_t,   (8)

where the errors ε_t are i.i.d. random variables with finite variance. Under the alternative hypothesis, the nonlinear model can therefore be written as

y_t = α′x_t + Σ_{j=1}^q β_j ψ_j(x_t) + ε_t.   (9)

If there are omitted nonlinear effects in the regression in (8), then the RBF network in (9) should have significant explanatory power for the variability of the dependent variable. Clearly, a joint test of β_1 = ··· = β_q = 0 in (9) provides evidence for detecting neglected nonlinearity in the conditional mean of y_t. This regression-type test is referred to as the RBFN test, developed by Blake and Kapetanios (2003b). An advantage of this test is that a standard Wald test statistic can be used together with the data-driven method to specify the model in (9), with no identification issue of RBF parameters arising. Hence, the test of the q linear restrictions under the null hypothesis takes the form:

β̂′{R′(W′W)^{−1}R}^{−1}β̂ / σ̂² ~ χ²_q,   (10)

where W is the matrix of regressors in (9), R is the selection matrix for the coefficients of the linear model and the RBF series in (9), β̂ = (β̂_1, ..., β̂_q)′ is the parameter estimate and σ̂² is the residual variance in (9). The test statistic is expected to be asymptotically chi-square distributed.

4.2 Size distortion problem

According to the finite sample properties of the nonlinearity test in the simulation study of Blake and Kapetanios (2003b), the RBFN test seems to suffer from a size distortion problem under the null hypothesis. The finite sample distribution of the test statistic in (10) seems not to be well approximated by its asymptotic chi-square

distribution. In particular, this problem can become more severe through the choice of information criterion for model selection in the data-driven procedures: when a more parsimonious penalty term is adopted, the test over-rejects less. Their simulations indicated that the criterion of Schwarz (1978) (BIC) is a desirable choice with minor size distortions compared to the others, such as Akaike (1977) (AIC), Hannan and Quinn (1979) (HQIC) and Guay and Guerre (2006) (GG). In the context of choosing the correct number of hidden units, it is possible that the ranking and selection decisions are affected by a candidate set of imprecise RBF functions at the earlier steps. The size issue arises, regardless of the type of information criterion, if the test statistic in (10) is primarily sensitive to the choice of radii and centres. Blake and Kapetanios (2003b) suggested a bootstrap resampling to correct the size distortions which may be associated with the effect of assigning an ad hoc value to the radii. However, apart from the increase in computational complexity, a disadvantage of this approach is that it appears to trade improved size for a loss of power. Therefore, it is of considerable importance to investigate whether optimising the hidden units of the RBF network may solve the size distortion problem. Similarly, Lee, Xi, and Zhang (2014) illustrated that, in the case of the LWG test, the method of regularising the activations of randomly selected hidden units plays a vital role in the finite sample distribution of its test statistic. Hence, the following Monte Carlo simulations are conducted in this study to examine the performance of RBFN tests with the newly proposed method. If an RBF series expansion is properly selected taking into account position, smoothness and size, the critical values of the standard chi-square distribution can be used for inference without leading to size distortions or resorting to bootstrapped critical values.
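For reference, the Wald statistic in (10) for the joint significance of the RBF coefficients can be computed as follows, taking the selected linear and RBF regressors as given (a sketch; the residual-variance convention σ̂² = RSS/T and all names are my choices):

```python
import numpy as np

def rbfn_wald(X_lin, Psi, y):
    # Wald test of H0: beta_1 = ... = beta_q = 0 in the linear-augmented
    # model (9), using the statistic in (10):
    # beta_hat' {R'(W'W)^{-1} R}^{-1} beta_hat / sigma_hat^2 ~ chi^2_q.
    W = np.hstack([X_lin, Psi])              # full regressor matrix of (9)
    T, p = W.shape
    q = Psi.shape[1]
    theta, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ theta
    sigma2 = resid @ resid / T
    R = np.zeros((p, q))                     # selects the RBF coefficients
    R[p - q:, :] = np.eye(q)
    beta = theta[p - q:]
    V = R.T @ np.linalg.inv(W.T @ W) @ R
    return beta @ np.linalg.solve(V, beta) / sigma2
```

Under H_0 the statistic is compared with χ²_q critical values; under a nonlinear data generating process it should be large.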
4.3 Alternative nonlinearity tests

It is meaningful to investigate the relative performance of the RBFN test with respect to some popular tests for nonlinearity. In particular, the RESET test of Ramsey (1969), the BDS test of Broock, Scheinkman, Dechert, and LeBaron (1996) and another neural network test of Lee, White, and Granger (1993) (LWG) are considered in this study. They are briefly reviewed here in terms of their testing procedures.

RESET Test

Ramsey (1969) suggested a specification test for linear least-squares regression analysis. For detecting the presence of nonlinearity, the test focuses on specification errors in a linear model y_t = α′x_t + ε_t, t = 1, ..., T, using the regression residuals ε̂_t and polynomials of the fitted values f_t = α̂′x_t. The first step is to compute the residuals and the sum of squared residuals e′e, where e = (ε̂_1, ..., ε̂_T)′. In the second step, an alternative model is considered:

y_t = α′x_t + β_2 f_t² + ··· + β_k f_t^k + υ_t,

for some k ≥ 2, where υ_t is an error term. The null hypothesis is H_0: β_2 = ··· = β_k = 0. Under H_0, the test statistic is [(e′e − v′v)/(k − 1)]/[v′v/(T − k)], where v = (υ̂_1, ..., υ̂_T)′, which is approximately F(k − 1, T − k) distributed in large samples. Following Lee, White, and Granger (1993), collinearity can further be circumvented by forming p* principal components of (f_t², ..., f_t^k) with p* < k − 1. The regression of y_t on x_t and these p* principal components (dropping the first one) is then conducted to obtain the residuals u = (û_1, ..., û_T)′. The test statistic is [(e′e − u′u)/p*]/[u′u/(T − k)], which is approximately F(p*, T − k) distributed under the null hypothesis that the p* principal-component coefficients are zero.

BDS Test

Brock, Dechert and Scheinkman (1987) originally developed a test of the null hypothesis of independent and identical distribution, for detecting non-random chaotic dynamics. 3 However, many studies have shown that the BDS test has power against a wide range of nonlinear alternatives as a portmanteau test. Given a series of interest y_t, the BDS test statistic is based on the following correlation integral:

C_{m,T_m}(ε) = (2/(T_m(T_m − 1))) Σ_{1≤t<s≤T_m} 1_ε(Y_t^m, Y_s^m),

where T_m = T − m + 1, Y_t^m = (y_t, y_{t−1}, ..., y_{t−m+1}) is the m-history of the series, and 1_ε(u, v) is an indicator function which equals one if ‖u − v‖ < ε and zero otherwise, where ‖·‖ is the sup norm. The test statistic is based on √(T_m)(C_{m,T_m}(ε) − C_{1,T_m}(ε)^m).
Under the null hypothesis that y_t is independently and identically distributed, this statistic is asymptotically normally distributed with zero mean

3 This work was later published as Broock, Scheinkman, Dechert, and LeBaron (1996).

and a known variance σ²_m(ε) of complicated form. The variance σ²_m(ε) can be estimated consistently, as documented by Broock, Scheinkman, Dechert, and LeBaron (1996). Hence, the BDS test statistic is defined as

√(T_m)(C_{m,T_m}(ε) − C_{1,T_m}(ε)^m) / σ̂_m(ε) →d N(0, 1).

Following Lee, White, and Granger (1993), instead of the series y_t, this study uses the residuals obtained from a fitted linear model, ε̂_t = y_t − α̂′x_t, to detect remaining dependence and the presence of neglected nonlinear structure. If the null hypothesis is rejected, it is concluded that the linear model suffers from nonlinear misspecification.

LWG Test

Lee, White, and Granger (1993) proposed a neural network test for detecting neglected nonlinearity. The test works with the regression residuals ε̂_t and explanatory variables x_t of a model y_t = α′x_t + ε_t to find out whether any of the residuals can be explained by nonlinear transformations of the variables. The test is based on the following auxiliary regression:

ε̂_t = α′x_t + Σ_{j=1}^q β_j ψ(γ′_j x_t) + υ_t,

where ε̂_t = y_t − α̂′x_t are the residuals estimated by OLS, ψ(z) = (1 + exp(z))^{−1} is the logistic function and υ_t is an error term. Each γ_j is randomly chosen a priori from the uniform distribution on [−2, 2]. To avoid possible multicollinearity of the ψ(γ′_j x_t) with x_t and among themselves, as well as tedious computations, the authors suggested choosing q* principal components of the q hidden units ψ(γ′_j x_t), with q* < q, after dropping the first principal component, and using these principal components to run the regression for the LWG test. The Lagrange multiplier version of the test statistic satisfies TR² →d χ²(q*), where T is the sample size and R² is the coefficient of determination of the auxiliary model.
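The LWG procedure described above can be sketched end to end for a scalar regressor (illustrative only; the numbers of hidden units q and principal components q*, the seed, and all names are arbitrary choices of this sketch, not LWG's implementation):

```python
import numpy as np

def lwg_statistic(x, y, q=10, q_star=2, seed=0):
    # 1. Residuals from the linear model y_t = alpha'x_t + eps_t.
    X = np.column_stack([np.ones_like(x), x])
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha
    # 2. Logistic activations psi(z) = (1 + exp(z))^{-1} with gamma ~ U[-2, 2].
    rng = np.random.default_rng(seed)
    gam = rng.uniform(-2, 2, size=(X.shape[1], q))
    H = 1.0 / (1.0 + np.exp(X @ gam))
    # 3. Principal components of the activations, dropping the first.
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    pcs = Hc @ Vt[1:q_star + 1].T
    # 4. Auxiliary regression of the residuals on x_t and the components;
    #    the LM statistic T * R^2 is chi^2(q_star) under H0.
    Z = np.column_stack([X, pcs])
    delta, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    u = resid - Z @ delta
    r2 = 1.0 - (u @ u) / (resid @ resid)
    return len(y) * r2
```

By construction the statistic lies in [0, T]; large values relative to χ²(q*) critical values indicate neglected nonlinearity.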

5 Monte Carlo experiments

In this section, the finite-sample performance of the RBFN test for neglected nonlinearity is investigated via numerical simulations. The newly proposed selection method for the radius of the RBF functions is evaluated in terms of the size and power properties of the test, in combination with the data-driven procedures of forward selection and information criteria.

5.1 Simulation design

To generate sample data, this study considers several econometric models which are commonly used in time series analysis: an autoregressive model (AR), a self-exciting threshold autoregressive model (SETAR), a smooth transition autoregressive model (STAR), a bilinear model (BILIN) and a Markov-switching model (MARKOV). Their parameters are replicated from the experimental design of Blake and Kapetanios (2003b). The data generating processes (DGPs) are as follows:

DGP for size tests:

1) AR: y_t = θ_1 y_{t−1} + ε_t, with θ_1 = …, .6, .9.

DGPs for power tests:

1) SETAR1: y_t = −.8y_{t−1} 1(y_{t−1} > 0) + .8y_{t−1} 1(y_{t−1} ≤ 0) + ε_t.
2) SETAR2: y_t = −.6y_{t−1} 1(y_{t−1} > 0) + .6y_{t−1} 1(y_{t−1} ≤ 0) + ε_t.
3) Exponential STAR1 (ESTAR1): y_t = .8y_{t−1} − (1 − exp(−y²_{t−1}))y_{t−1} + ε_t.
4) Exponential STAR2 (ESTAR2): y_t = .6y_{t−1} − .8(1 − exp(−y²_{t−1}))y_{t−1} + ε_t.
5) Logistic STAR1 (LSTAR1): y_t = .8y_{t−1} − (1 + exp(−y_{t−1}))^{−1} y_{t−1} + ε_t.
6) Logistic STAR2 (LSTAR2): y_t = .6y_{t−1} − .8(1 + exp(−y_{t−1}))^{−1} y_{t−1} + ε_t.
7) BILIN1: y_t = .7y_{t−1}ε_{t−1} + ε_t.
8) BILIN2: y_t = y_{t−1}ε_{t−1} + ε_t.
9) MARKOV1: y_t = .8y_{t−1} 1(S_t = 1) − .8y_{t−1} 1(S_t = 2) + ε_t, with transition matrix P.
10) MARKOV2: y_t = .6y_{t−1} 1(S_t = 1) − .6y_{t−1} 1(S_t = 2) + ε_t, with transition matrix P.
11) MARKOV3: y_t = .8y_{t−1} 1(S_t = 1) − .8y_{t−1} 1(S_t = 2) + ε_t, with vec(P) = (.9, ·, ·, .75).

Here, the disturbances ε_t are pseudo-random standard normal numbers, 1(·) denotes an indicator function equal to one if the event inside its parentheses is true and zero otherwise, and S_t is a hidden two-state Markov chain taking the values 1 and 2 with transition matrix P. For simplicity, size properties are evaluated under first-order AR processes satisfying the null hypothesis. The other, nonlinear processes are used to investigate the power properties of the tests. The second DGP for each time series model is designed to have a milder degree of nonlinearity than the first one. The parameters of the MARKOV3 model are taken from Hamilton (1989). Sample sizes T are 100, 150, 200, 500 and 1000. To minimise the effect of initial conditions, the first 200 observations are discarded for each sample. Each sample is normalised by its standard deviation before testing. The significance level is 5% and the number of replications is set to 1000, as is usual in studies of neglected nonlinearity tests.

To construct the architecture of the RBF network, a lagged dependent variable is used as the explanatory variable, i.e. x_t = y_{t−1}. The radius is searched for in the interval (0, 2] by the LOOCV algorithm given in Section 3.2. Since the choice of information criterion is also an important factor in choosing the number of hidden units, this study considers the four information criteria AIC, BIC, HQIC and GG, with penalty terms 2k, k ln T, 2k ln(ln T) and k + 2k ln(ln T) respectively, where k denotes the number of parameters in the linear-augmented ANN model in (9).

5.2 Results of size tests

Table 1 reports the size results for RBFN tests using the proposed method for the RBF parameters in the data-driven modelling procedures; the method is labelled RBFN-LOOCV. The numbers are the rejection frequencies under the null hypothesis of linearity for the AR processes at the 5% significance level. For comparison, BIC is used as the penalty term for model selection, following the original testing procedure of Blake and Kapetanios (2003b), labelled RBFN-BK. The rejection rates in Table 1 vary with the linear process and the sample size; they tend to decrease towards the 5% level as the sample size increases for all the tests.
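Section 3.2's algorithm is not reproduced here, but one standard closed-form implementation of leave-one-out cross validation for RBF interpolation (Rippa's identity) conveys how a radius can be selected on (0, 2]. The Gaussian kernel, the grid and the small ridge term are illustrative assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

def loo_error(x, y, radius):
    """Closed-form leave-one-out error for Gaussian RBF interpolation
    (Rippa's identity): e_i = c_i / (A^{-1})_{ii}, where A c = y."""
    d2 = (x[:, None] - x[None, :]) ** 2
    A = np.exp(-d2 / (2.0 * radius ** 2))        # Gaussian RBF kernel matrix
    A += 1e-8 * np.eye(len(x))                   # tiny ridge for numerical stability
    Ainv = np.linalg.inv(A)
    c = Ainv @ y
    e = c / np.diag(Ainv)                        # all LOO residuals in one pass
    return float(np.mean(e ** 2))

def select_radius(x, y, grid=None):
    """Grid-search the radius over (0, 2] by minimising the LOO error."""
    if grid is None:
        grid = np.linspace(0.05, 2.0, 40)
    errors = [loo_error(x, y, r) for r in grid]
    return float(grid[int(np.argmin(errors))])
```

The closed form avoids refitting the interpolant T times, which is what makes a fine grid search over the radius cheap.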

Compared to the benchmark RBFN-BK, RBFN-LOOCV shows better size properties in all cases, regardless of the process under H_0. The size gap becomes much more substantial as the AR coefficient approaches one (θ_1 = .9). Overall, the size results of the RBFN-LOOCV test are acceptable in that the values lie inside the 95% confidence interval (.0365, .0635).⁴ This supports the expectation that optimising the radius by the cross validation method helps to mitigate the size distortions of the test.

5.3 Finite sample distribution of the test statistic

This study additionally investigates the histograms of the test statistic under the null hypothesis for the different modelling methods. To this end, the RBF network models are first restricted to a single hidden unit, i.e. q = 1 in (9). The reason is that in the simulations of Section 5.2 almost every model chose the single best RBF unit as additive nonlinearity under the linear processes, even when the AIC criterion was used. Since choosing more than two hidden units happens rarely, the restriction to two RBF units, q = 2 in (9), is considered as the second scenario. The parameters θ_1 = …, .9 for the AR(1) models and the sample size T = 200 are used in these simulations.

Figure 1 presents the histograms of the test statistic in (10) when the model has a single RBF unit in the hidden layer. At a glance, the sample distribution produced by RBFN-LOOCV (histogram) is very close to its asymptotic χ²(1) distribution (solid red line), regardless of the AR coefficient. The RBFN-BK method leads to fatter right tails, as expected from the over-rejection results in Section 5.2. In particular, the closer the true process is to a unit root, the more seriously the histogram of RBFN-BK deviates from the asymptotic line. Next, Figure 2 displays the histograms of the test statistic when two hidden units are used for the RBF network models. Overall, they look similar to the asymptotic χ²(2) distribution. It is also seen that the RBFN-LOOCV method leads to the distributions closest in shape to the asymptotic one. Furthermore, for the sample sizes T = 50, 100, 150, 200, 250, 500, Figures 3 and 4 compare the finite sample histograms of the RBFN-BK and RBFN-LOOCV tests under the AR(1) model with θ_1 = .9. Unlike the wide gaps produced by the former test, the latter consistently yields histograms in accordance with the asymptotic distribution regardless of sample size.⁵

⁴ Since a rejection can be considered a sample proportion with success probability .05, the standard error is √(.05 × .95/1000) ≈ .0069. Hence, the 95% confidence interval is .05 ± 1.96 × .0069 = (.0365, .0635).

⁵ In the case of the RBFN-LOOCV method, the test results under the null of the AR(1) model with the other values of θ_1 also show better sample histograms of the test statistic across the sample sizes. The results are available upon request.

Therefore, the RBFN-LOOCV test is shown to have good size not only at the 5% level but across the entire distribution when the standard critical values of the χ²(q) distribution are used. The results confirm that a proper selection method for the RBF parameters can cure the size distortions of the RBFN test within the data-driven modelling procedures.

5.4 Performance of the new RBFN test

This study further investigates the performance of the RBFN-LOOCV test. The restriction on the number of hidden units is now dropped when specifying model complexity. All four information criteria (AIC, BIC, HQIC and GG) are again considered for selecting the number of RBF units in the modelling procedures. For the size tests, an AR(1) process is considered with θ_1 as in Blake and Kapetanios (2003b). In the first row of Table 2, BIC is still found to be the best choice in terms of size, whereas the other criteria lead to slight over-rejections under the null hypothesis. This reconfirms the importance of the choice of information criterion: the size distortions can be reduced to some degree by using a more parsimonious penalty term in the data-driven procedures.

Power results vary considerably across the nonlinear alternatives, and the rejection rates tend to increase with the sample size. Regardless of the penalty term, the tests detect the second process of each nonlinear time series model less efficiently than the first, except for the BILIN model. The RBFN tests are most powerful against the BILIN alternatives, followed by the SETAR and LSTAR series. Both the ESTAR and MARKOV alternatives are relatively harder to detect with the network model. In particular, the rejection rates for the MARKOV processes increase more slowly in large samples than those for the ESTAR series. The reason may be that they are governed by an unobserved Markov process rather than by lagged or other observable variables.
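For replication purposes, a few of the nonlinear DGPs from the design above can be simulated directly. The sketch below covers three of the alternatives; the coefficient signs follow a sign-switching reading of the garbled design, and the burn-in and normalisation mirror the description in Section 5.1.

```python
import numpy as np

def simulate(model, T=200, burn=200, seed=0):
    """Simulate selected nonlinear DGPs from the Monte Carlo design.
    The first `burn` observations are discarded, as in the paper, and the
    sample is normalised by its standard deviation before testing."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        if model == "SETAR1":    # threshold AR: coefficient flips with the regime
            y[t] = (-0.8 if y[t - 1] > 0 else 0.8) * y[t - 1] + e[t]
        elif model == "LSTAR2":  # smooth logistic transition between regimes
            g = 1.0 / (1.0 + np.exp(-y[t - 1]))
            y[t] = 0.6 * y[t - 1] - 0.8 * g * y[t - 1] + e[t]
        elif model == "BILIN1":  # bilinear interaction of lagged level and shock
            y[t] = 0.7 * y[t - 1] * e[t - 1] + e[t]
        else:
            raise ValueError(model)
    y = y[burn:]
    return y / y.std()           # normalise by the sample standard deviation
```

The remaining alternatives (ESTAR, MARKOV) follow the same pattern, with the Markov models additionally requiring a simulated two-state chain S_t.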
5.5 Comparison with alternative tests for nonlinearity

In this subsection, the performance of the proposed RBFN test is compared with that of some popular tests for neglected nonlinearity, namely the RESET, BDS and LWG tests presented in Section 4.3. As in the comparison study of Lee, White, and Granger (1993), the RESET test is parameterised with the restriction k = 5 for the

power of the polynomials and p = 1 principal component, dropping the first, to avoid collinearity. The BDS test is implemented with embedding dimension m = 2 and distance ε = 1 in the simulations. For the LWG test, the weights γ_j of the q logistic functions are randomly generated from the uniform distribution on [−2, 2]. Two configurations of (q, q*), where q* indicates the number of principal components after dropping the first one, are considered, with q* = 3 in both cases; they are labelled LWG1 and LWG2, respectively. For the RBFN test, both the LOOCV method for the radius and the BIC penalty term for model selection are used in the testing procedures, given their superior performance in the simulations of Sections 5.2 and 5.4.

The first rows of Table 3 and Table 4 present the rejection frequencies of the linearity hypothesis in 1000 replications by model and sample size (T = 100, 200, 500, 1000). First, under the null of the AR(1) process, the RESET tests under-reject while the BDS tests suffer from a large degree of over-rejection, although the rates appear to converge towards the nominal 5% level in large samples. The two LWG tests show some degree of under-rejection, particularly in large samples, and the RBFN tests show slight over-rejections. On the whole, the LWG1 and RBFN tests have acceptable size in the simulations.

Power results vary considerably among the tests in Table 3 and Table 4. The neural network tests clearly provide superior performance regardless of sample size and type of nonlinear process. These results reflect the universal approximation ability of ANN models for arbitrary unknown functions. In particular, the RBFN tests detect nonlinearity best against the SETAR, LSTAR and BILIN alternatives, while the LWG tests perform well against the ESTAR processes. The rejection results indicate little difference between the LWG1 and LWG2 tests despite their different numbers of hidden units.
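The correlation integral at the heart of the BDS test is straightforward to compute for the settings used here (m = 2, ε = 1). The sketch below returns only the unstudentised quantity √T(C_m − C_1^m), since the consistent variance estimator of Broock et al. (1996) is omitted; the function names are illustrative.

```python
import numpy as np

def correlation_integral(y, m, eps):
    """C_{m,T_m}(eps): fraction of pairs of m-histories within sup-norm eps."""
    Tm = len(y) - m + 1
    Y = np.column_stack([y[i:i + Tm] for i in range(m)])   # the m-histories
    # pairwise sup-norm distances between all m-histories
    d = np.max(np.abs(Y[:, None, :] - Y[None, :, :]), axis=2)
    close = d < eps
    iu = np.triu_indices(Tm, k=1)                          # pairs with t < s
    return 2.0 * close[iu].sum() / (Tm * (Tm - 1))

def bds_raw(y, m=2, eps=1.0):
    """Unstudentised BDS quantity sqrt(T) * (C_m - C_1^m); dividing by the
    (omitted) estimated standard error gives the N(0,1) test statistic."""
    cm = correlation_integral(y, m, eps)
    c1 = correlation_integral(y, 1, eps)
    return np.sqrt(len(y)) * (cm - c1 ** m)
```

In the paper's setup, y would be the standardised residuals from the fitted linear model rather than the raw series.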
Although the BDS tests outperform the ANN tests against the MARKOV processes, the RESET and BDS tests yield relatively low power compared with the ANN tests overall. In sum, the RBFN tests appear to be the best performer in detecting the presence of the considered nonlinearities, with proper size, in the finite sample simulations of this comparison study.

6 Application to forecasting time series

In this section, the proposed method is applied to estimating and forecasting daily returns of the S&P 500 index. The exercise is designed to facilitate a comparison of the performance of our nonlinear approach with the approaches proposed in White (2006) without replicating their results. The daily S&P 500 data are obtained from finance.yahoo.com for the period from 22 July 1996 to 21 July 2004. The observations from 22 July 2003 to 21 July 2004 are reserved to evaluate out-of-sample point forecasts. The dependent variable is the percentage return on day t, y_t = 100(P^C_t − P^C_{t−1})/P^C_{t−1}, where P^C_t is the daily closing index value. The predictor set is x_t = (y_{t−1}, y_{t−2}, y_{t−3}, |y_{t−1}|, |y_{t−2}|, |y_{t−3}|, r_{t−1}, r_{t−2}, r_{t−3}), where r_t measures market volatility by the daily range r_t = 100(P^H_t − P^L_t)/P^L_t, with P^H_t and P^L_t the maximum and minimum values of the index on day t, respectively. In this example, the maximum number of RBF candidates that can be selected in the modelling procedures is limited for comparison purposes: the network model has q_c hidden units in a single layer, selected by either the AIC or the BIC criterion from a limited maximum number q_max of ranked RBF candidates.

6.1 Out-of-sample forecasts and criteria

For point forecasts, this study uses the direct method, in which multiperiod-ahead forecasts are made directly from information known at the forecast origin. The direct forecasting model is of the form

y_{t+h} = f(x_t; θ_h) + ε_{t+h}, (11)

where h denotes the forecast horizon, θ_h is a vector of unknown parameters and ε_{t+h} is an error term. At time t, the estimate θ̂_h is obtained as the minimiser arg min_{θ_h} Σ_{s=t_0}^{t−h} [y_{s+h} − f(x_s; θ_h)]², where t_0 denotes the first observation used in estimating the model. Consequently, for period-by-period forecasts, a prediction model has to be specified and estimated at each forecast horizon. The forecasts are conditional only on observations up to time t; the method therefore assumes that the relationship between the predictors and the target variable is sufficiently stable to carry over to subsequent periods.
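The direct method in (11) amounts to re-estimating a fixed-horizon regression at each forecast origin. A linear stand-in for f(·; θ_h) keeps the sketch short; in the paper's application the fitted model would instead be the RBF network, and the function names here are illustrative.

```python
import numpy as np

def fit_linear(X, Y):
    """OLS with intercept; a stand-in for any model f(x; theta_h)."""
    Z = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Z, Y, rcond=None)[0]

def predict_linear(beta, x_row):
    return float(beta[0] + beta[1:] @ x_row)

def direct_forecasts(y, x, h, start, fit=fit_linear, predict=predict_linear):
    """Direct h-step-ahead forecasts: for each origin t >= start, re-estimate
    y_{s+h} = f(x_s; theta_h) on the pairs with s <= t - h, then forecast
    y_{t+h} from x_t."""
    preds = []
    for t in range(start, len(y) - h):
        params = fit(x[:t - h + 1], y[h:t + 1])   # (x_s, y_{s+h}) pairs up to t
        preds.append(predict(params, x[t]))       # forecast of y_{t+h}
    return np.array(preds)
```

Because the target is re-aligned by h before estimation, no iterated plug-in of intermediate forecasts is needed, which is the defining feature of the direct method.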
To evaluate out-of-sample performance, the mean square forecast error (MSFE) is used to measure forecasting accuracy. It is defined as

MSFE = (1/H) Σ_t (ŷ_{t+h} − y_{t+h})²,

where y_{t+h} and ŷ_{t+h} denote the actual and predicted values at time t + h respectively, and H is the total number of periods over which forecasting is performed. Since correct sign prediction is important in financial applications, a success ratio (SR) measuring the directional accuracy of the forecasting models is defined in addition:

SR = (1/H) Σ_t 1(ŷ_{t+h} y_{t+h} > 0),

where 1(·) denotes an indicator function equal to one if the event inside its parentheses is true and zero otherwise. This criterion reflects the market timing ability of the prediction models, which may provide a signal for buying or selling.

6.2 Comparison with the White (2006) method

Table 5 provides the estimation and forecasting results for this example. In terms of in-sample fit, all the network models outperform the linear model on R² and MSE. As expected, more RBF units in the hidden layer yield a better fit. In particular, when the new modelling procedures are used, the RBF network yields the best results for the return series regardless of the information criterion. For instance, with q_c = 5, 20, 50 hidden units, the RBFCV models deliver R² = .069, MSE = 1.574; R² = .162, MSE = 1.417; and R² = .276, MSE = 1.224, respectively. These fitting results outperform those of White (2006), where R² = .045, MSE = 1.618; R² = .095, MSE = 1.532; and R² = .253, MSE = 1.265 are the corresponding maximum figures obtained using the Logistic, Ridgelet or Polynomial QuickNet models. In this exercise, therefore, the proposed RBF network method is found to do a competitive job of approximation. As for the out-of-sample forecasts in Table 5, the best fitting RBFCV models perform slightly better than the linear model on the MSFE and SR criteria, particularly the RBFCV-BIC model, which yields the lowest MSFE. In comparison, the optimal Logistic and Ridgelet QuickNet models of White (2006) underperform the linear model, with MSFE = .643 and .886, respectively.
Therefore, this example provides substantial evidence that the proposed method can improve the approximation performance of the RBF network, balancing in-sample fit against out-of-sample prediction, while outperforming the other ANN methods.
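The two evaluation criteria from Section 6.1 reduce to a few lines of code; the sketch below is a direct transcription of the MSFE and SR formulas, with names chosen for clarity.

```python
import numpy as np

def msfe(actual, predicted):
    """Mean square forecast error over the evaluation window."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean((predicted - actual) ** 2))

def success_ratio(actual, predicted):
    """Share of forecasts with the correct sign (directional accuracy)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(predicted * actual > 0))
```

Note that the success ratio scores a forecast as correct only when the product of the forecast and the realisation is strictly positive, so zero returns count against the model.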

7 Conclusion

This study has presented a nonparametric technique for approximating unknown nonlinear relationships via the RBF neural network. New modelling and estimation procedures have been proposed which address the critical issue of optimising the RBF parameters, the centres and radii, within a data-driven method. In particular, a cross validation strategy based on RBF interpolation has been suggested for obtaining adaptive hidden units in terms of their radii and centres. The methods of forward selection and information criteria are applied sequentially to regularise the complexity of the RBF neural network.

To highlight the benefits of the proposed modelling method, a misspecification test for neglected nonlinearity in univariate time series has been considered. The simulation results show that optimising the RBF parameters, together with an appropriate information criterion, can cure the size distortion problem of the RBFN test in the modelling procedures. In particular, when both the leave-one-out cross validation algorithm and the BIC penalty term are applied, the finite sample distribution of the test statistic comes very close to its asymptotic chi-square distribution. In contrast to earlier findings in the literature, this supports using chi-squared critical values for inference without resampling techniques such as the bootstrap, which entail extra computational cost and power loss. Furthermore, the simulation results demonstrate that the proposed RBFN test is competitive with the alternative nonlinearity tests RESET, BDS and LWG in terms of power. The RBFN test can thus be recommended as a complementary diagnostic tool for nonlinearity in practice.

As a second application, this study considered the forecasting exercise for daily S&P 500 returns used in White (2006).
In terms of estimation and forecasting accuracy, the comparison showed that the RBF network method outperforms both the other ANN methods suggested in White (2006) and the linear model. Taking into account its advantages in mitigating the practical difficulties of model construction and estimation, the proposed forecasting method may be viewed as a useful nonlinear approach. It is therefore of interest to learn more about the properties of the proposed neural network method through further applications that require approximation of unknown underlying dynamics, as well as forecasts of economic and financial variables.


More information

Variable Selection in Predictive Regressions

Variable Selection in Predictive Regressions Variable Selection in Predictive Regressions Alessandro Stringhi Advanced Financial Econometrics III Winter/Spring 2018 Overview This chapter considers linear models for explaining a scalar variable when

More information

LM threshold unit root tests

LM threshold unit root tests Lee, J., Strazicich, M.C., & Chul Yu, B. (2011). LM Threshold Unit Root Tests. Economics Letters, 110(2): 113-116 (Feb 2011). Published by Elsevier (ISSN: 0165-1765). http://0- dx.doi.org.wncln.wncln.org/10.1016/j.econlet.2010.10.014

More information

Econometric modeling of the relationship among macroeconomic variables of Thailand: Smooth transition autoregressive regression model

Econometric modeling of the relationship among macroeconomic variables of Thailand: Smooth transition autoregressive regression model The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 21 38. Econometric modeling of the relationship among macroeconomic

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

Analysis of Fast Input Selection: Application in Time Series Prediction

Analysis of Fast Input Selection: Application in Time Series Prediction Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

Estimating AR/MA models

Estimating AR/MA models September 17, 2009 Goals The likelihood estimation of AR/MA models AR(1) MA(1) Inference Model specification for a given dataset Why MLE? Traditional linear statistics is one methodology of estimating

More information

Technical Appendix-3-Regime asymmetric STAR modeling and exchange rate reversion

Technical Appendix-3-Regime asymmetric STAR modeling and exchange rate reversion Technical Appendix-3-Regime asymmetric STAR modeling and exchange rate reversion Mario Cerrato*, Hyunsok Kim* and Ronald MacDonald** 1 University of Glasgow, Department of Economics, Adam Smith building.

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

NATCOR. Forecast Evaluation. Forecasting with ARIMA models. Nikolaos Kourentzes

NATCOR. Forecast Evaluation. Forecasting with ARIMA models. Nikolaos Kourentzes NATCOR Forecast Evaluation Forecasting with ARIMA models Nikolaos Kourentzes n.kourentzes@lancaster.ac.uk O u t l i n e 1. Bias measures 2. Accuracy measures 3. Evaluation schemes 4. Prediction intervals

More information

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Fall, 2018 Outline Introduction A Brief History ANN Architecture Terminology

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Sheffield Economic Research Paper Series. SERP Number:

Sheffield Economic Research Paper Series. SERP Number: Sheffield Economic Research Paper Series SERP Number: 2004013 Jamie Gascoigne Estimating threshold vector error-correction models with multiple cointegrating relationships. November 2004 * Corresponding

More information

Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point

Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point Tests of the Co-integration Rank in VAR Models in the Presence of a Possible Break in Trend at an Unknown Point David Harris, Steve Leybourne, Robert Taylor Monash U., U. of Nottingam, U. of Essex Economics

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

A test for improved forecasting performance at higher lead times

A test for improved forecasting performance at higher lead times A test for improved forecasting performance at higher lead times John Haywood and Granville Tunnicliffe Wilson September 3 Abstract Tiao and Xu (1993) proposed a test of whether a time series model, estimated

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Alfredo A. Romero * College of William and Mary

Alfredo A. Romero * College of William and Mary A Note on the Use of in Model Selection Alfredo A. Romero * College of William and Mary College of William and Mary Department of Economics Working Paper Number 6 October 007 * Alfredo A. Romero is a Visiting

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

Tests of the Present-Value Model of the Current Account: A Note

Tests of the Present-Value Model of the Current Account: A Note Tests of the Present-Value Model of the Current Account: A Note Hafedh Bouakez Takashi Kano March 5, 2007 Abstract Using a Monte Carlo approach, we evaluate the small-sample properties of four different

More information

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Ghazi Shukur Panagiotis Mantalos International Business School Department of Statistics Jönköping University Lund

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity The Lahore Journal of Economics 23 : 1 (Summer 2018): pp. 1 19 Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity Sohail Chand * and Nuzhat Aftab ** Abstract Given that

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Is the Basis of the Stock Index Futures Markets Nonlinear?

Is the Basis of the Stock Index Futures Markets Nonlinear? University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Is the Basis of the Stock

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

The Role of "Leads" in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji

The Role of Leads in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji he Role of "Leads" in the Dynamic itle of Cointegrating Regression Models Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji Citation Issue 2006-12 Date ype echnical Report ext Version publisher URL http://hdl.handle.net/10086/13599

More information

Birkbeck Working Papers in Economics & Finance

Birkbeck Working Papers in Economics & Finance ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter

More information

Back to the future: Radial Basis Function networks revisited

Back to the future: Radial Basis Function networks revisited Back to the future: Radial Basis Function networks revisited Qichao Que, Mikhail Belkin Department of Computer Science and Engineering Ohio State University Columbus, OH 4310 que, mbelkin@cse.ohio-state.edu

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Chapter 7: Model Assessment and Selection

Chapter 7: Model Assessment and Selection Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has

More information

A Test of Cointegration Rank Based Title Component Analysis.

A Test of Cointegration Rank Based Title Component Analysis. A Test of Cointegration Rank Based Title Component Analysis Author(s) Chigira, Hiroaki Citation Issue 2006-01 Date Type Technical Report Text Version publisher URL http://hdl.handle.net/10086/13683 Right

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Matematické Metody v Ekonometrii 7.

Matematické Metody v Ekonometrii 7. Matematické Metody v Ekonometrii 7. Multicollinearity Blanka Šedivá KMA zimní semestr 2016/2017 Blanka Šedivá (KMA) Matematické Metody v Ekonometrii 7. zimní semestr 2016/2017 1 / 15 One of the assumptions

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

The US Phillips Curve and inflation expectations: A State. Space Markov-Switching explanatory model.

The US Phillips Curve and inflation expectations: A State. Space Markov-Switching explanatory model. The US Phillips Curve and inflation expectations: A State Space Markov-Switching explanatory model. Guillaume Guerrero Nicolas Million February 2004 We are grateful to P.Y Hénin, for helpful ideas and

More information

Function Approximation

Function Approximation 1 Function Approximation This is page i Printer: Opaque this 1.1 Introduction In this chapter we discuss approximating functional forms. Both in econometric and in numerical problems, the need for an approximating

More information

Linear Models for Regression. Sargur Srihari

Linear Models for Regression. Sargur Srihari Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood

More information

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications Yongmiao Hong Department of Economics & Department of Statistical Sciences Cornell University Spring 2019 Time and uncertainty

More information

News Shocks: Different Effects in Boom and Recession?

News Shocks: Different Effects in Boom and Recession? News Shocks: Different Effects in Boom and Recession? Maria Bolboaca, Sarah Fischer University of Bern Study Center Gerzensee June 7, 5 / Introduction News are defined in the literature as exogenous changes

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

On Autoregressive Order Selection Criteria

On Autoregressive Order Selection Criteria On Autoregressive Order Selection Criteria Venus Khim-Sen Liew Faculty of Economics and Management, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia This version: 1 March 2004. Abstract This study

More information

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project Erik Swanson Cori Saviano Li Zha Final Project 1 Introduction In analyzing time series data, we are posed with the question of how past events influences the current situation. In order to determine this,

More information

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information