Autoregressive Approximation in Nonstandard Situations: Empirical Evidence


S. D. Grose and D. S. Poskitt
Department of Econometrics and Business Statistics, Monash University

Abstract

This paper investigates the empirical properties of autoregressive approximations to two classes of process for which the usual regularity conditions do not apply, namely the non-invertible and fractionally integrated processes considered in Poskitt (). That paper examined the theoretical consequences of fitting long autoregressions under regularity conditions that allow for these two situations, and established convergence rates for the sample autocovariances and autoregressive coefficients. Here we consider the finite-sample properties of alternative estimators of the parameters of the approximating AR(h) process and of the corresponding estimates of the optimal approximating order h. The estimators considered include the Yule-Walker, Least Squares, and Burg estimators.

Key words and phrases: Autoregression, autoregressive approximation, fractional process, non-invertibility, order selection, asymptotic efficiency.

Preliminary draft: not to be quoted without permission.

Corresponding author: Don Poskitt, Department of Econometrics and Business Statistics, Monash University, Victoria, Australia. E-mail address: Don.Poskitt@Buseco.monash.edu.au

Introduction

The so-called long memory, or strongly dependent, processes have come to play an important role in time series analysis. Statistical procedures for analyzing such processes range from the likelihood-based methods studied in Fox and Taqqu (1986), Sowell (1992) and Beran (1995) to the non-parametric and semi-parametric techniques advanced by Robinson (1995), among others. These techniques typically focus on obtaining an accurate estimate of the parameter (or parameters) governing the long-term behaviour of the process, and while maximum likelihood is asymptotically efficient in this context, that result depends, as always, on the correct specification of a parametric model.

An alternative approach looks for an adequate approximating model, a finite-order autoregression being a computationally convenient candidate whose asymptotic properties are well known for certain classes of data generating process. Until recently, however, processes exhibiting strong dependence have been an exception, since in that case the standard asymptotic results no longer apply. Yet it is precisely in these cases that recourse to a valid approximating model might be most useful, either in its own right or as the basis for subsequent estimation. For instance, Martin and Wilkinson (1999) consider an efficient method of moments, or indirect, estimator constructed using an autoregression as an auxiliary model.

The consequences of fitting an autoregression to processes exhibiting long memory are considered by Poskitt () under regularity conditions that allow for both non-invertible and fractionally integrated processes. This paper examines the empirical properties of the AR approximation, particularly the finite-sample properties of alternative estimators of the parameters of the approximating AR(h) process and the corresponding estimates of the optimal approximating order h.

Following Poskitt (), let y(t), t ∈ Z, denote a linearly regular, covariance-stationary process,

    y(t) = Σ_{j≥0} k(j) ε(t−j),    (1.1)

where ε(t), t ∈ Z, is a zero-mean white noise process with variance σ² and the impulse response coefficients satisfy k(0) = 1 and Σ_j k(j)² < ∞. ε(t) further satisfies the conditions of Poskitt (), implying that the minimum mean squared error predictor of y(t) is the linear predictor

    ȳ(t) = −Σ_{j≥1} φ(j) y(t−j).    (1.2)

The minimum mean squared error predictor of y(t) based only on the finite past is then

    ȳ_h(t) = −Σ_{j=1}^{h} φ_h(j) y(t−j),    (1.3)

where the coefficients φ_h(1), ..., φ_h(h) satisfy the Yule-Walker equations

    Σ_{j=0}^{h} φ_h(j) γ(j−k) = δ_{0k} σ_h²,   k = 0, 1, ..., h,    (1.4)

in which φ_h(0) = 1, γ(τ) = γ(−τ) = E[y(t) y(t−τ)], τ = 0, 1, ..., is the autocovariance function of the process y(t), δ_{0k} is Kronecker's delta, and

    σ_h² = E[ε_h(t)²]    (1.5)

is the minimized prediction error variance associated with the prediction error

    ε_h(t) = Σ_{j=0}^{h} φ_h(j) y(t−j).    (1.6)

The use of finite-order AR models to approximate an unknown (but suitably regular) process is accordingly based on the idea that the optimal predictor ȳ_h(t) determined from the autoregressive model of order h will form a good approximation to the infinite-order predictor ȳ(t) for sufficiently large h. However, established results on the estimation of autoregressive models in which h increases with the sample size T are generally built on the assumption that the process admits an infinite autoregressive representation with coefficients that tend to zero at an appropriate rate; which is to say, (i) the transfer function associated with the Wold representation (1.1) is invertible, and (ii) the coefficients of (1.1), or, equivalently, the autoregressive coefficients in (1.2), satisfy a suitable summability condition. This theory obviously precludes non-invertible processes (which fail condition (i), and so do not even have an infinite-order AR representation) and fractionally integrated processes, which fail condition (ii).

Fractionally integrated processes, for which the notation I(d) is commonly used, were introduced by Granger and Joyeux (1980) and independently described in Hosking (1981), and it has been shown that they exhibit dynamic behaviour very similar to that observed in many empirical time series. The class of fractionally integrated I(d) processes can be characterized by the specification y(t) = κ(L)(1−L)^{−d} ε(t), where L is the lag operator and κ(z) = Σ_j κ(j) z^j. A detailed description of the properties of such processes can be found in Beran (1994). For our purposes we note here that the impulse response coefficients of the Wold representation (1.1), characterized by k(z), are not absolutely summable for any d > 0. Furthermore, the autocovariance function declines at a hyperbolic rate, γ(τ) ~ Cτ^{2d−1}, C ≠ 0, as τ → ∞, and not at the exponential rate it would exhibit for a stable and invertible ARMA process. However, provided κ(z) is absolutely summable, it can be shown that the coefficients of k(z) are square-summable (Σ_j k(j)² < ∞) if d < 0.5, in which case y(t) is well defined as the limit in mean square of a covariance-stationary process.
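To make the mapping from the autocovariance function to the finite-order predictor concrete, the following Python sketch (illustrative only; the function name and example are ours, not part of the original analysis) solves the Yule-Walker equations (1.4) by the Levinson-Durbin recursion, returning the coefficients in the sign convention used above together with the prediction error variance. Applied to the autocovariances of the non-invertible MA(1) considered later, it reproduces the closed-form solution quoted in the simulation section.

    import numpy as np

    def levinson(gamma, h):
        """Solve the Yule-Walker equations (1.4) by the Levinson-Durbin recursion.

        gamma : autocovariances gamma(0), ..., gamma(h)
        Returns phi = (phi_h(1), ..., phi_h(h)) in the convention
        eps_h(t) = y(t) + phi_h(1) y(t-1) + ... + phi_h(h) y(t-h),
        together with the prediction error variance sigma_h^2.
        """
        gamma = np.asarray(gamma, dtype=float)
        a = np.zeros(h + 1)              # a[j] multiplies y(t-j) in the predictor
        sigma2 = gamma[0]
        for m in range(1, h + 1):
            k = (gamma[m] - a[1:m] @ gamma[1:m][::-1]) / sigma2   # reflection coefficient
            a_prev = a.copy()
            a[m] = k
            a[1:m] = a_prev[1:m] - k * a_prev[1:m][::-1]
            sigma2 *= 1.0 - k * k        # updated prediction error variance
        return -a[1:], sigma2            # phi_h(j) = -a_j in the sign convention above

    # Example: non-invertible MA(1), y(t) = eps(t) - eps(t-1) with unit variance,
    # for which phi_h(j) = (h + 1 - j)/(h + 1) and sigma_h^2 = 1 + 1/(h + 1).
    h = 4
    gamma = np.zeros(h + 1)
    gamma[0], gamma[1] = 2.0, -1.0
    phi, s2 = levinson(gamma, h)         # phi = [0.8, 0.6, 0.4, 0.2], s2 = 1.2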

Poskitt () derives asymptotic results for autoregressive modelling in both the non-invertible and fractionally integrated cases, providing a theoretical underpinning for the use of finite-order autoregressive approximations in these instances. We now consider the finite-sample properties of alternative estimators of the AR parameters of the approximating AR(h) process and the corresponding estimates of the optimal approximating order h.

The paper proceeds as follows. We first outline the various autoregressive estimation techniques to be considered and note the fundamental results that underlie their statistical properties. Details of the simulation study are then given and the results presented, and a short final section closes the paper.

Model Fitting

We wish to fit an autoregression of order h to a realization of T observations from an unknown process, denoted y_t, t = 1, ..., T. The model to be estimated is therefore

    y_t = −Σ_{j=1}^{h} φ_h(j) y_{t−j} + e_t,    (2.1)

which we may write as e_t = Φ(L) y_t, where Φ(z) = 1 + φ_h(1)z + ... + φ_h(h)z^h is the order-h prediction error filter. We define the normalized parameter vector φ̄_h = (1, φ_h′)′, where φ_h = (φ_h(1), ..., φ_h(h))′.

A variety of techniques have been developed for estimating autoregressions. MATLAB, for instance, offers at least five, including the standards, Yule-Walker and Least Squares, plus two variants of Burg's algorithm and a forward-backward version of least squares. Each of these techniques is reviewed below. Note that in the following φ̂_h merely indicates an estimator of φ_h; it will be clear from the context which estimator is meant.

Estimation Procedures for Autoregression

Yule-Walker

As already observed, the true AR(h) coefficients (i.e., those yielding the minimum mean squared error predictor based on y_{t−1}, ..., y_{t−h}) correspond to the solution of the Yule-Walker equations (1.4). Rewriting (1.4) in matrix-vector notation yields

    Γ_h φ_h = −γ_h    (2.2)

where Γ_h is the Toeplitz matrix with (i, j)-th element equal to γ(i−j), i, j = 1, ..., h (which for convenience we will write as Γ_h = toeplitz(γ(0), ..., γ(h−1))), and γ_h = (γ(1), ..., γ(h))′. Yule-Walker estimates of the parameters of (2.1) are obtained by substituting the sample autocorrelation function (ACF) into (2.2) and solving for φ̂_h:

    φ̂_h = −R_h^{−1} r_h

where R_h = toeplitz(r(0), ..., r(h−1)), r_h = (r(1), r(2), ..., r(h))′, r(k) = γ̂(k)/γ̂(0), and

    γ̂(k) = T^{−1} Σ_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ)

is the sample autocovariance at lag k. The innovations variance is then estimated as

    σ̂_h² = γ̂(0) + Σ_{j=1}^{h} φ̂_h(j) γ̂(j).

This estimator has the advantage that it can be readily calculated, without requiring matrix inversion, via Levinson's (1947) recursion (often referred to as the Durbin-Levinson recursion, see Durbin, 1960; for a summary of the algorithm see Brockwell and Davis, 1991), and being based on Toeplitz calculations the corresponding filter Φ̂_h(z) will be stable. However, while the Yule-Walker equations give the minimum mean squared error predictor given the actual ACF of the underlying process, this is not the case when they are based on sample autocorrelations. Hence the Yule-Walker variance estimate σ̂_h² does not in general minimize the empirical mean squared error. We also note that the Yule-Walker estimator of φ_h is well known to suffer from substantial bias in finite samples, even relative to the least-squares approach discussed below. Tjøstheim and Paulsen (1983) present theoretical and empirical evidence of this phenomenon and show that when y(t) is a finite autoregression the first term in an asymptotic expansion of the bias of φ̂_h has order of magnitude O(T^{−1}), but the size of the constant varies inversely with the distance of the zeroes of the true autoregressive operator from the unit circle. Hence, when the data generating mechanism shows strong autocorrelation, the bias in the Yule-Walker coefficient estimates can be substantial. Given that fractional processes can display long-range dependence, with autocovariances that decay much more slowly than exponentially, similar effects are likely to be manifest a fortiori when employing the Yule-Walker estimates under the current scenario; and indeed Poskitt () presents some Monte Carlo evidence demonstrating this effect.
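In practice the Yule-Walker estimates amount to running this calculation on the sample autocovariances. The sketch below (ours, with illustrative names) forms γ̂(k) and solves the Toeplitz system directly; replacing the linear solve with the Levinson-Durbin recursion of the earlier sketch gives the same result without a matrix inversion.

    import numpy as np

    def sample_acvf(y, max_lag):
        """Sample autocovariances gamma_hat(0..max_lag), divisor T (biased form)."""
        y = np.asarray(y, dtype=float)
        T = y.size
        yc = y - y.mean()
        return np.array([yc[k:] @ yc[:T - k] / T for k in range(max_lag + 1)])

    def yule_walker(y, h):
        """Yule-Walker estimates of phi_h (convention Phi(z) = 1 + phi(1) z + ...)
        and of the innovations variance."""
        g = sample_acvf(y, h)
        G = np.array([[g[abs(i - j)] for j in range(h)] for i in range(h)])  # Toeplitz
        phi = -np.linalg.solve(G, g[1:h + 1])        # phi_hat = -R_h^{-1} r_h
        sigma2 = g[0] + phi @ g[1:h + 1]             # innovations variance estimate
        return phi, sigma2

Solving with autocovariances or autocorrelations gives identical coefficients, since the normalization by γ̂(0) cancels.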

Least-Squares

Least squares is perhaps the most commonly used estimation technique, with implementations on offer in just about every numerical package and application. In this case (2.1) is fitted by minimizing the sum of squared errors Σ_{t=h+1}^{T} ê_t², where ê_t = y_t − ŷ_t and

    ŷ_t = −Σ_{j=1}^{h} φ̂_h(j) y_{t−j}

is the order-h linear predictor. In other words, the forward prediction error is minimized in the least squares sense. This corresponds to solving the normal equations

    M_h φ̂_h = −m_h

where, writing x_t = (y_{t−1}, ..., y_{t−h})′,

    M_h = Σ_{t=h+1}^{T} x_t x_t′   and   m_h = Σ_{t=h+1}^{T} x_t y_t.

Note that, following standard practice, the estimator presented here is based on the last T−h values of y, i.e., on y_t, t = h+1, ..., T, making the effective sample size T−h. An obvious alternative is to take the range of summation for the least squares estimator as t = 1, ..., T, and assume the pre-sample values y_{1−h}, ..., y_0 are zero. The effect of the elimination of the initial terms is, for given h, asymptotically negligible, but may well have a significant impact in small samples.

By way of contrast with the Yule-Walker estimator, the least squares estimate of the variance,

    σ̂² = (T−h)^{−1} Σ_{t=h+1}^{T} (y_t − ŷ_t)²,

minimizes the observed mean squared error, but there is no guarantee that Φ̂_h(z) will be stable. While the least squares and Yule-Walker estimators of φ_h and σ_h² are asymptotically equivalent under the regularity conditions employed in Poskitt (), that paper, and the more extensive results presented here, show their finite-sample behaviour to be quite different.
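A least squares fit of (2.1) is simply a linear regression of y_t on its first h lags over t = h+1, ..., T. The following sketch (ours; names illustrative) mirrors the convention used in the text, discarding the first h observations rather than padding with zeros.

    import numpy as np

    def ls_autoregression(y, h):
        """Forward least-squares AR(h) fit; returns phi_hat (paper sign convention)
        and the observed mean squared error based on the last T-h observations."""
        y = np.asarray(y, dtype=float)
        T = y.size
        X = np.column_stack([y[h - j:T - j] for j in range(1, h + 1)])  # lags 1..h
        yy = y[h:]                                    # y_t for t = h+1, ..., T
        a, *_ = np.linalg.lstsq(X, yy, rcond=None)    # predictor coefficients
        phi = -a                                      # phi_hat(j) = -a_j
        resid = yy - X @ a
        sigma2 = resid @ resid / (T - h)
        return phi, sigma2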

Least-Squares (forward-backward)

The conventional least squares approach discussed above obtains φ̂_h such that the sum of squared forward prediction errors,

    SSE_f = Σ_{t=h+1}^{T} ( y_t + Σ_{j=1}^{h} φ(j) y_{t−j} )²,

is minimized. However, we can also define a sum of squares based on the equivalent time-reversed formulation; i.e., we now minimize the sum of squared backward prediction errors,

    SSE_b = Σ_{t=1}^{T−h} ( y_t + Σ_{j=1}^{h} φ(j) y_{t+j} )².

The combination of the two yields forward-backward least squares (FB), sometimes called the modified covariance method, in which φ̂_h is obtained such that SSE_f + SSE_b is minimized. The normal equations are now

    M_h φ̂_h = −m_h

with, writing x_t = (y_{t−1}, ..., y_{t−h})′ and x̃_t = (y_{t+1}, ..., y_{t+h})′,

    M_h = Σ_{t=h+1}^{T} x_t x_t′ + Σ_{t=1}^{T−h} x̃_t x̃_t′   and   m_h = Σ_{t=h+1}^{T} x_t y_t + Σ_{t=1}^{T−h} x̃_t y_t.

This may be thought of as stacking a time-reversed version of y_t on top of y_t, t = h+1, ..., T, and regressing the resulting 2(T−h)-vector on its first h lags. See Kay (1988, Chapter 7) or Marple (1987) for further details.

Burg's method

The Burg estimator of the coefficients of an autoregressive process (Burg, 1967, 1968) is not well known in the econometrics literature. It does, however, have several nice features, chief among which is that parameter stability is imposed without the sometimes large biases involved in Yule-Walker estimation. As we shall see, its properties in that regard tend to mimic those of least squares, making it something of a "best of both worlds" estimator. The estimator is based on a maximum entropy approach to spectral analysis, resulting in prediction error filter coefficients which solve

    Γ_h φ̄_h = v_h    (2.3)

where in this case Γ_h = toeplitz(γ(0), ..., γ(h)) and v_h = (v_h, 0, ..., 0)′; v_h is the output power of the prediction error filter Φ_h, that is, the mean squared error of the order-h autoregression. Burg (1968) outlined a recursive scheme for solving (2.3), later formalized by Andersen (1974); Fortran code implementing Andersen's procedure is given in Ulrych and Bishop (1975). Essentially, (2.3) is solved via the Durbin-Levinson recursion as per the Yule-Walker procedure, except that the partial autocorrelation coefficient at each stage (m, say) is now obtained by minimizing the average of the forward and backward mean squared prediction errors described in the previous subsection.
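The recursion itself is short enough to state in full. The sketch below (ours) is the standard arithmetic-mean version of Burg's algorithm: at each stage the reflection coefficient is chosen to minimize the average of the forward and backward mean squared prediction errors, while the coefficient and error-variance updates are exactly those of the Levinson recursion.

    import numpy as np

    def burg(y, h):
        """Burg estimates of the AR(h) prediction error filter
        Phi(z) = 1 + phi(1) z + ... + phi(h) z^h and of sigma_h^2."""
        y = np.asarray(y, dtype=float)
        T = y.size
        f = y[1:].copy()                 # forward prediction errors
        b = y[:-1].copy()                # backward prediction errors (lagged by one)
        phi = np.array([1.0])            # current prediction error filter
        v = y @ y / T                    # order-0 prediction error variance
        for m in range(1, h + 1):
            k = -2.0 * (f @ b) / (f @ f + b @ b)      # reflection coefficient
            phi = np.concatenate((phi, [0.0]))
            phi = phi + k * phi[::-1]                 # Levinson-type coefficient update
            f_new = f + k * b
            b_new = b + k * f
            f, b = f_new[1:], b_new[:-1]              # usable range shrinks by one
            v *= 1.0 - k * k                          # update the error variance
        return phi[1:], v                # (phi(1), ..., phi(h)) and sigma_h^2

Because the reflection coefficient is bounded by one in absolute value at every stage, the fitted filter is stable, which is the stability property referred to above.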

The so-called geometric Burg procedure differs only in that the m-th order partial autocorrelation coefficient φ_mm is calculated as

    φ_mm = Σ_{t=1}^{T−m} b_{m,t} b′_{m,t} / ( Σ_{t=1}^{T−m} b_{m,t}² Σ_{t=1}^{T−m} b′_{m,t}² )^{1/2}

rather than as

    φ_mm = 2 Σ_{t=1}^{T−m} b_{m,t} b′_{m,t} / Σ_{t=1}^{T−m} ( b_{m,t}² + b′_{m,t}² )

(notation as per Andersen, with b_{m,t} and b′_{m,t} the forward and backward prediction errors at stage m). This corresponds to obtaining the m-th order partial autocorrelation by minimizing the geometric rather than the arithmetic mean of the forward and backward mean squared errors.

Selecting the optimal AR order

Undoubtedly more important in determining the accuracy or otherwise of any autoregressive approximation than the particular choice of model-fitting technique is the choice of h, the order of the approximating model. If we suppose, for a moment, that the order-h prediction error variance σ_h² is known to us (i.e., we know the theoretical ACF), then we might also suppose the existence of an optimal order for the AR approximation, h*, where h* corresponds to the value of h which minimizes a suitably penalized function of σ_h² (we cannot, of course, minimize σ_h² itself, as it is monotonically decreasing in h and equals σ² in the limit as h → ∞). This value may then be taken as the basis for comparison of the estimation techniques under consideration.

Since we now require a model selection criterion, an obvious candidate is that due to Akaike (1970); namely

    AIC(h) = ln(σ_h²) + 2h/T,

where the parameter σ_h² replaces the MSE measure used in empirical applications. Further criteria in this style were subsequently proposed by Schwarz (1978) and Hannan and Quinn (1979). Schwarz's criterion, in particular, is known to be consistent if the true model (DGP) is among those in the selection set. In our case, of course, we are looking for a way to set the order of an approximating model, so consistency arguments along these lines cannot apply. Poskitt () instead considers the figure of merit function

    L_T(h) = (σ_h² − σ²) + hσ²/T

proposed by Shibata (1980) in the context of fitting autoregressive models to a truly infinite-order process. Shibata showed that if an AR(h) model is fitted to a stationary Gaussian process that has an AR(∞) representation, and this model is then used to predict an independent realization of the same process, then the difference between the mean squared prediction error of the fitted model and the innovation variance converges in probability to L_T(h). We might therefore define h*_T as the value of h that minimizes L_T(h).
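Both criteria are trivially evaluated once a sequence of prediction error variances is available. The helper below (ours; the interface is hypothetical) returns the minimizers of the theoretical AIC and of Shibata's figure of merit over h = 1, ..., H, given σ_h² for those orders and the innovation variance σ².

    import numpy as np

    def optimal_orders(sigma2_h, sigma2, T):
        """sigma2_h[i] holds sigma_h^2 for h = i + 1.  Returns (h_AIC, h_star),
        the minimizers of AIC(h) = log(sigma_h^2) + 2h/T and of
        L_T(h) = (sigma_h^2 - sigma^2) + h sigma^2 / T respectively."""
        sigma2_h = np.asarray(sigma2_h, dtype=float)
        h = np.arange(1, sigma2_h.size + 1)
        aic = np.log(sigma2_h) + 2.0 * h / T
        L = (sigma2_h - sigma2) + h * sigma2 / T
        return int(h[np.argmin(aic)]), int(h[np.argmin(L)])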

Simulation Experiment

We will initially focus our attention on the simplest of non-invertible and fractionally integrated processes: in the first instance the first-order moving-average process

    y(t) = ε(t) − ε(t−1),    (3.1)

and in the second, the fractional noise process

    y(t) = (1−L)^{−d} ε(t),   0 < d < 0.5.    (3.2)

In both cases ε(t) will be taken to be Gaussian white noise with unit variance. The theoretical ACFs of these processes are well known: for (3.1) we have γ(0) = 2, γ(1) = −1, and zero otherwise. For (3.2) the ACF is as given in (for instance) Brockwell and Davis (1991), and accordingly very simply computed, for k > 0, via the recursion

    γ(k) = γ(k−1) (k−1+d)/(k−d),

initialized at γ(0) = Γ(1−2d)/Γ(1−d)². Knowledge of the ACF allows both simulation of the process itself and computation of the coefficients of the order-h prediction filter via the Levinson recursion. As we might expect, for the simple models considered here the coefficient solutions simplify very nicely: for model (3.1) we have

    φ_h(j) = (h+1−j)/(h+1),

while for the fractional noise process (3.2) the coefficients are given by the recursion

    φ_h(j+1) = φ_h(j) (j−d)(h−j)/((j+1)(h−d−j)),   j = 0, 1, 2, ...,

initialized at φ_h(0) = 1. We also note that for model (3.1) the prediction error variance falls out of the Levinson recursion as

    σ_h² = σ² (1 + 1/(h+1)),

from which we can quickly deduce h*_T = √T − 1. For the fractional noise models we have

    σ_h² = σ² Γ(h+1) Γ(h+1−2d) / Γ(h+1−d)²,

in which case h*_T is obtained most simply by calculating L_T(h) for h = 1, 2, ..., stopping when the criterion starts to increase, as sketched below.
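Putting the closed forms above into code gives the theoretical quantities needed for the experiment. The sketch below (ours; the parameter values at the end are purely illustrative and not the settings used in the paper) computes the fractional-noise autocovariances by the stated recursion, evaluates σ_h² through log-gamma functions to avoid overflow, and locates h*_T by stepping through L_T(h) until it starts to increase.

    import numpy as np
    from math import lgamma, exp, sqrt

    def fractional_noise_acvf(d, n):
        """Autocovariances gamma(0), ..., gamma(n-1) of y(t) = (1-L)^{-d} eps(t),
        sigma^2 = 1: gamma(0) = Gamma(1-2d)/Gamma(1-d)^2 and
        gamma(k) = gamma(k-1) (k-1+d)/(k-d)."""
        g = np.empty(n)
        g[0] = exp(lgamma(1.0 - 2.0 * d) - 2.0 * lgamma(1.0 - d))
        for k in range(1, n):
            g[k] = g[k - 1] * (k - 1.0 + d) / (k - d)
        return g

    def fn_sigma2_h(d, h):
        """sigma_h^2 = Gamma(h+1) Gamma(h+1-2d) / Gamma(h+1-d)^2, unit innovation variance."""
        return exp(lgamma(h + 1.0) + lgamma(h + 1.0 - 2.0 * d) - 2.0 * lgamma(h + 1.0 - d))

    def fn_h_star(d, T):
        """Minimize L_T(h) = (sigma_h^2 - 1) + h/T, stopping once the criterion rises."""
        h, best = 1, np.inf
        while True:
            L = (fn_sigma2_h(d, h) - 1.0) + h / T
            if L >= best:
                return h - 1
            best, h = L, h + 1

    # Non-invertible MA(1): sigma_h^2 = 1 + 1/(h+1), so h*_T is approximately sqrt(T) - 1.
    T, d = 500, 0.2                      # illustrative values, not the paper's settings
    print(round(sqrt(T) - 1), fn_h_star(d, T))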

The first stage of the simulation experiment is based around comparing the properties of alternative estimators of the parameters of the optimal autoregressive approximation, where "optimal" is here defined in terms of minimizing Shibata's figure of merit, L_T(h). In the second stage we consider the problem of empirically choosing the order of the approximation, viewed both as a problem in model selection and in terms of estimating the theoretically optimal order h*_T. Although there are any number of selection criteria developed for this purpose (the three mentioned earlier being only some of the better-known examples), we shall begin by considering the most obvious choice, the Akaike information criterion, or AIC. Accordingly, having obtained h* for each model and sample size, we then estimate it by finding the value ĥ that minimizes the empirical AIC,

    AIC(h) = ln(σ̂_h²) + 2h/T,

where σ̂_h² is the mean squared error delivered by each of the five estimation techniques. We will denote this empirically optimal value by ĥ_T^AIC.

Monte Carlo Design

The simulation experiments presented here are based on a total of five data generating mechanisms: the non-invertible moving average process (3.1), labelled MA, and the fractional noise process (3.2) at each of four values of d, labelled FN with a suffix indicating the value of d.

The fractional noise processes are all stationary, with increasing degrees of long-range dependence; however, for d < 0.25 the distribution of √T(γ̂_T(τ) − γ(τ)) is asymptotically normal, while for d ≥ 0.25 the autocovariances are no longer even √T-consistent (see Hosking, 1996, for details). Results for the smallest value of d are therefore expected to differ qualitatively from those for the larger values. For all processes ε(t) is standard Gaussian white noise (σ² = 1). For each process we considered four sample sizes T; the maximum AR order for the model search phase was set at H_T, and all results are based on N replications.

The optimal autoregressive approximation for each DGP and sample size is obtained by calculating h*_T as described above, with the parameters (the coefficient vector φ_h = (φ_h(1), ..., φ_h(h))′ and the corresponding MSE σ_h²) following as outlined above. The experiment is conducted as follows (a schematic single replication is sketched below); for each replication r = 1, 2, ..., N:

1. A data vector of length T is generated according to the selected design.
2. The parameters of the AR(h*) approximation are estimated by each of the five methods described above: Yule-Walker, Least-Squares, Burg's method (BURG), Geometric Burg (GEOB), and forward-backward least squares (FB). These estimates are denoted φ̂_h and σ̂_h²; we omit the subscript T on h* in this context.
3. The best empirical AR order, ĥ, is obtained for each of the five methods by estimating autoregressions of all orders h = 1, 2, ..., H_T and computing the corresponding AIC; ĥ is taken to be the value of h that yields the smallest AIC in each case.
4. Summary statistics are computed as described below, and saved for subsequent computation of the empirical distributions.
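A single replication of the design can be sketched as follows (ours; the sample size, orders and memory parameter are placeholders rather than the paper's settings, and the estimator and autocovariance functions are the ones sketched in the preceding sections). A Gaussian series with the required autocovariances is drawn exactly via the Cholesky factor of its Toeplitz covariance matrix, each estimator is applied at h = h*, and the empirical AIC order is found by a search over h = 1, ..., H_T.

    import numpy as np

    def simulate_gaussian(acvf, T, rng):
        """One length-T draw from a zero-mean Gaussian process with autocovariances
        acvf[0..T-1], via the Cholesky factor of the Toeplitz covariance matrix."""
        Sigma = np.array([[acvf[abs(i - j)] for j in range(T)] for i in range(T)])
        return np.linalg.cholesky(Sigma) @ rng.standard_normal(T)

    def aic_order(y, fit, H):
        """Empirical AIC order: fit AR(h), h = 1..H, with the supplied estimator
        (any function returning (phi, sigma2)) and return the AIC-minimizing h."""
        T = len(y)
        aic = [np.log(fit(y, h)[1]) + 2.0 * h / T for h in range(1, H + 1)]
        return 1 + int(np.argmin(aic))

    rng = np.random.default_rng(0)
    T, h_star, H = 500, 10, 40                     # placeholder settings
    d = 0.2                                        # illustrative memory parameter
    y = simulate_gaussian(fractional_noise_acvf(d, T), T, rng)
    estimators = {"YW": yule_walker, "LS": ls_autoregression, "BURG": burg}
    fits = {name: fit(y, h_star) for name, fit in estimators.items()}          # step 2
    orders = {name: aic_order(y, fit, H) for name, fit in estimators.items()}  # step 3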

Summary statistics

In the first phase we set h = h*_T and then calculate, for each Monte Carlo realization and each estimation technique:

1. the estimated coefficients φ̂_h = (φ̂_h(1), ..., φ̂_h(h))′;
2. the average error ē = (T−h)^{−1} Σ_{t=h+1}^{T} ê_t, where ê_t = y_t + Σ_{j=1}^{h} φ̂_h(j) y_{t−j};
3. the residual variance s² = (T−h)^{−1} Σ_{t=h+1}^{T} (ê_t − ē)²;
4. the estimation errors φ̂_h(j) − φ_h(j), j = 1, ..., h;
5. the squared and absolute estimation errors (φ̂_h(j) − φ_h(j))² and |φ̂_h(j) − φ_h(j)|, j = 1, ..., h;
6. the sum of the errors in estimating the vector φ_h, Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j));
7. and the sum of the relative errors in estimating φ_h, Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)) / φ_h(j).

In the second phase ĥ is obtained for each Monte Carlo realization and each estimation technique as the value of h corresponding to the smallest AIC(h), h = 1, 2, ..., H_T.

Empirical Distributions

The N realized values of each statistic described above are used to obtain a kernel density estimate of the associated distribution. The bandwidth is 70% of the over-smoothed bandwidth (see Wand and Jones, 1995), computed from s(x), the standard deviation of the N values of the statistic x.
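The density estimates themselves are straightforward to reproduce. The sketch below (ours) uses a Gaussian kernel with bandwidth equal to 0.7 times the over-smoothed bandwidth; the numerical constant is the standard Gaussian-kernel value of the maximal-smoothing rule of Wand and Jones (1995), which we take to be the intended over-smoothed bandwidth.

    import numpy as np

    def kde_oversmoothed(x, grid, shrink=0.7):
        """Gaussian-kernel density estimate with bandwidth shrink * h_OS, where
        h_OS = (243 / (70 sqrt(pi)))**0.2 * s * N**(-1/5) is the maximal-smoothing
        (over-smoothed) bandwidth for a Gaussian kernel."""
        x = np.asarray(x, dtype=float)
        grid = np.asarray(grid, dtype=float)
        N, s = x.size, x.std(ddof=1)
        h = shrink * (243.0 / (70.0 * np.sqrt(np.pi))) ** 0.2 * s * N ** (-0.2)
        u = (grid[:, None] - x[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (N * h * np.sqrt(2.0 * np.pi))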

The optimal AR order

The relative frequency of occurrence of the empirical order selected by minimizing the AIC is presented in Table 1 and in Figures 1 and 2. Table 1 displays the AR order selected by minimizing AIC, ĥ_T^AIC, averaged over the N Monte Carlo realizations, by estimation method, model, and sample size; Shibata's h*_T and the theoretical h*_AIC are included for comparative purposes. Figures 1 and 2 present the relative frequency of ĥ_T^AIC for the Least-Squares, Forward-Backward, Yule-Walker, and Burg estimators at two of the sample sizes considered; the maximum order is H_T in each case. The results for Geometric Burg are indistinguishable from those for Burg on this scale, and so are omitted for clarity.

It is notable that the average AIC-selected order is generally quite close to h*_T, and in all cases much closer to h*_T than to h*_AIC. ĥ_T^AIC is invariably largest for one of the estimators and smallest for another, with the difference diminishing with increasing sample size. However, the distribution of ĥ_T^AIC is highly skewed to the right, with the degree of skewness being greatest for smaller d and least for the non-invertible moving average. The dispersion of ĥ_T^AIC about h*_T is correspondingly large, increasing with d, and greatest for the non-invertible MA. The figures also show that the higher average ĥ_T^AIC just noted is caused by a greater proportion of large orders being selected, with the distribution of ĥ_T^AIC for that estimator not quite falling away to zero by h = H_T at some sample sizes. As T increases the difference between the distributions of ĥ_T^AIC for the five estimators becomes negligible, and the distributions become more concentrated around h*_T, in accordance with the predictions of Poskitt ().

Autoregressive coefficients

Turning to the empirical distributions of the coefficient estimates themselves, we focus on the estimation error (bias) in the first and last coefficients, i.e., φ̂_h(1) − φ_h(1) and φ̂_h(h) − φ_h(h) respectively, and on the sum of the estimation errors over all h coefficients, Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)); h equals h*_T in all cases. The density estimates are constructed from the simulated values using a Gaussian kernel with bandwidth equal to 70% of the over-smoothed bandwidth, computed from ξ, the empirical standard deviation of the relevant quantity observed over the N replications; see Wand and Jones (1995). Although results are obtained for all five estimators, only the Yule-Walker results are distinguishable from Least-Squares in the plots, so only these two are presented graphically. Each figure also includes a plot of the normal distribution with zero mean and variance equal to the largest observed variance of the quantity being plotted.

Beginning with the behaviour of the various estimators of the first and last coefficients (Figures 3 to 10), we observe that, as we might expect, departures from normality worsen as d increases, with the worst case represented by the non-invertible MA. More notable is that the degree of non-normality increases with sample size; the distributions become noticeably less symmetric, with, for the distribution of φ̂_h(1), an interesting bump appearing on the left-hand side. Only for the most strongly dependent of the fractional noise models and for the MA model is there much difference between the estimators, with the Yule-Walker results typically appearing less normal than the Least-Squares.

Turning to the distribution of the total coefficient error Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)), comparison of the estimated distributions (Figures 11 to 13) with a normal curve of error with zero mean

and variance ξ² indicates that for the smallest value of d the distribution is reasonably close to normal for all estimators. When d > 0.25, however, the presence of the Rosenblatt process in the limiting behaviour of the underlying statistics is manifest in a marked distortion of the distribution relative to the shape anticipated for a normal random variable, particularly in the right-hand tail. This distortion is still present at the largest sample size considered and does not disappear asymptotically.

The situation with the moving average process is a little different (Figure 14), firstly in that the marked skew to the right is not evident, and secondly in the degree of difference between Yule-Walker and the other estimators. The Yule-Walker estimator results in a considerable negative bias in the coefficient estimates when summed over the coefficient vector, and this bias becomes worse as T increases.

Finally, we consider the standardized coefficient difference ϕ̂_{λ,T}, a √T-scaled linear combination λ_h′ Γ_h (φ̂_h − φ_h) of the coefficient estimation errors divided by its asymptotic standard deviation, shown by Poskitt () to have a standard normal limiting distribution (notation as per Poskitt). Figure 15 plots the observed distribution of ϕ̂_{λ,T}, h = h*_T, for a fixed weighting vector λ_h, with φ̂_h obtained from realizations of the fractionally integrated process y(t) = ε(t)/(1−L)^d for two values of d at a single sample size. The empirical distributions are overlaid with a standard normal density. Although some bias is still apparent even at this sample size, more so for the Yule-Walker estimator than for Least-Squares, kurtosis and skewness of the type observed previously with this process have now gone. Figure 16 plots the same quantities for a third value of d and varying T.

Conclusion

Appendix A: Figures

Figure 1: Relative frequency of occurrence of ĥ_T^AIC for the fractional noise process y(t) = ε(t)/(1−L)^d, panels (a) to (d) corresponding to the four values of d considered, and (e) the moving average process y(t) = ε(t) − ε(t−1).

Figure 2: As Figure 1, at a different sample size.

Figures 3 to 6: Empirical distributions of φ̂_h(1) − φ_h(1) for the fractional noise process at three values of d and for the moving average process y(t) = ε(t) − ε(t−1), h = h*_T, one figure per sample size. Each panel includes an N(0, ξ²) reference density.

Figures 7 to 10: Empirical distributions of φ̂_h(h) − φ_h(h) for the same designs, h = h*_T, one figure per sample size. Each panel includes an N(0, ξ²) reference density.

Figures 11 to 13: Empirical distributions of Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)) for the fractional noise process at the four values of d, h = h*_T, one figure per sample size. Each panel includes an N(0, ξ²) reference density.

Figure 14: Empirical distribution of Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)) for the moving average process y(t) = ε(t) − ε(t−1), h = h*_T, across the four sample sizes.

Figure 15: Observed distribution of ϕ̂_{λ,T} for the fractional noise process at two values of d, fixed λ_h and fixed T, overlaid with a standard normal density.

Figure 16: Observed distribution of ϕ̂_{λ,T} for the fractional noise process at a third value of d, fixed λ_h, across the four sample sizes, overlaid with a standard normal density.

Appendix B: Tables

Table 1: AIC-based estimates of h*, averaged over the N replications, by estimation method, model, and sample size; Shibata's h*_T and the theoretical h*_AIC are included for comparison.

Table 2: Estimation error in the coefficients, averaged over the coefficient vector (h^{−1} Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j)), h = h*_T), averaged over the N replications, by estimation method, model, and sample size.

Table 3: Mean squared estimation error in the coefficients, averaged over the coefficient vector (h^{−1} Σ_{j=1}^{h} (φ̂_h(j) − φ_h(j))², h = h*_T), averaged over the N replications, by estimation method, model, and sample size.

Table 4: Estimation error in the first coefficient (φ̂_h(1) − φ_h(1), h = h*_T), with the true value φ_h(1) also reported, averaged over the N replications, by estimation method, model, and sample size.

Table 5: Estimation error in the partial autocorrelation coefficient (φ̂_h(h) − φ_h(h), h = h*_T), with the true value φ_h(h) also reported, averaged over the N replications, by estimation method, model, and sample size.

References

Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics.

Andersen, N. (1974). On the calculation of filter coefficients for maximum entropy spectral analysis. Geophysics.

Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall, New York.

Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. Journal of the Royal Statistical Society B.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. 2nd ed. Springer Series in Statistics, Springer-Verlag, New York.

Burg, J. (1967). Maximum entropy spectral analysis. Paper presented at the 37th International Meeting, Society of Exploration Geophysicists, Oklahoma City, Oklahoma.

Burg, J. (1968). A new analysis technique for time series data. Paper presented at the Advanced Study Institute on Signal Processing, N.A.T.O., Enschede, Netherlands.

Durbin, J. (1960). The fitting of time series models. Review of the International Statistical Institute.

Fox, R. and Taqqu, M. S. (1986). Large-sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Annals of Statistics.

Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis.

Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society B.

Hosking, J. R. M. (1981). Fractional differencing. Biometrika.

Hosking, J. R. M. (1996). Asymptotic distributions of the sample mean, autocovariances, and autocorrelations of long-memory time series. Journal of Econometrics.

Kay, S. M. (1988). Modern Spectral Estimation. Prentice-Hall.

Levinson, N. (1947). The Wiener RMS (root mean square) error criterion in filter design and prediction. Journal of Mathematical Physics.

Marple, S. L. (1987). Digital Spectral Analysis with Applications. Prentice-Hall.

Martin, V. L. and Wilkinson, N. P. (1999). Indirect estimation of ARFIMA and VARFIMA models. Journal of Econometrics.

Poskitt, D. (). Autoregressive approximation in non-standard situations: the non-invertible and fractionally integrated cases. Working paper, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia.

Robinson, P. M. (1995). Log-periodogram regression of time series with long range dependence. Annals of Statistics.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics.

Shibata, R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Annals of Statistics.

Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics.

Tjøstheim, D. and Paulsen, J. (1983). Bias of some commonly-used time series estimates. Biometrika.

Ulrych, T. J. and Bishop, T. N. (1975). Maximum entropy spectral analysis and autoregressive decomposition. Reviews of Geophysics and Space Physics.

Wand, M. and Jones, M. (1995). Kernel Smoothing. Chapman and Hall.