Residual Bootstrap for estimation in autoregressive processes
Chapter 7

Residual Bootstrap for estimation in autoregressive processes

In Chapter 6 we considered the asymptotic sampling properties of several estimators, including the least squares estimator of the autoregressive parameters and the Gaussian maximum likelihood estimator used to estimate the parameters of an ARMA process. The asymptotic distributions are often used for statistical testing and for constructing confidence intervals. However, the results are asymptotic and only hold, approximately, when the sample size is relatively large. When the sample size is small, the normal approximation is not valid and better approximations are sought. Even when we are willing to use the asymptotic distribution, we often need expressions for the variance or the bias, and obtaining these may be impossible, or possible only with excessive effort. The bootstrap is a powerful tool which allows one to approximate such characteristics. To quote from Wikipedia: "Bootstrap is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution." The bootstrap essentially samples from the sample. Each subsample is treated like a new sample from the population, and using these multiple realisations one can obtain approximate confidence intervals and variance estimates for the parameter estimates. Of course, in reality we do not have multiple realisations; we are sampling from the sample, so we do not gain more information by resampling more. But we do gain some insight into the finite sample distribution. In this chapter we detail the residual bootstrap method, and then show that asymptotically the bootstrap distribution coincides with the asymptotic distribution. The residual bootstrap method was first proposed by J. P. Kreiss (Kreiss (1997) is a very nice review paper on the subject; see also Franke and Kreiss (1992), where an extension to AR($\infty$) processes is also given).
One of the first theoretical papers on the bootstrap is Bickel and Freedman (1981). There are several other bootstrapping methods for time series; these include bootstrapping the periodogram, the block bootstrap, and bootstrapping the Kalman filter (Stoffer and Wall (1991), Stoffer and Wall (2004) and Shumway and Stoffer (2006)). These methods have been used not only for variance estimation but also for determining orders, etc. Frequency domain approaches are considered in Dahlhaus and Janas (1996) and Franke and Härdle (1992); a review of subsampling methods can be found in Politis et al. (1999).
7.1 The residual bootstrap

Suppose that the time series $\{X_t\}$ satisfies the stationary, causal AR($p$) process
\[
X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid random variables with mean zero and variance one, and the roots of the characteristic polynomial have absolute value greater than $1+\delta$. We will suppose that the order $p$ is known.

The residual bootstrap for autoregressive processes:

(i) Let
\[
\hat\Gamma_p = \frac{1}{n}\sum_{t=p+1}^{n} \underline{X}_{t-1}\underline{X}_{t-1}' \quad\text{and}\quad \hat\gamma_p = \frac{1}{n}\sum_{t=p+1}^{n} X_t \underline{X}_{t-1}, \tag{7.1}
\]
where $\underline{X}_{t-1} = (X_{t-1},\ldots,X_{t-p})'$. We use $\hat\phi_n = (\hat\phi_1,\ldots,\hat\phi_p)' = \hat\Gamma_p^{-1}\hat\gamma_p$ as an estimator of $\phi = (\phi_1,\ldots,\phi_p)'$.

(ii) We create the bootstrap sample by first estimating the residuals $\{\varepsilon_t\}$ and sampling from these residuals. Let
\[
\hat\varepsilon_t = X_t - \sum_{j=1}^{p} \hat\phi_j X_{t-j}, \qquad t = p+1,\ldots,n.
\]

(iii) Now create the empirical distribution function based on the $\hat\varepsilon_t$:
\[
\hat F_n(x) = \frac{1}{n-p}\sum_{t=p+1}^{n} I_{(-\infty,\hat\varepsilon_t]}(x).
\]
We notice that sampling from the distribution $\hat F_n(x)$ means observing each $\hat\varepsilon_t$ with probability $(n-p)^{-1}$.

(iv) Sample independently from the distribution $\hat F_n(x)$ $n$ times. Label this sample as $\{\varepsilon_k^+\}$.

(v) Let $X_k^+ = \varepsilon_k^+$ for $1 \le k \le p$ and
\[
X_k^+ = \sum_{j=1}^{p} \hat\phi_j X_{k-j}^+ + \varepsilon_k^+, \qquad p < k \le n.
\]

(vi) We call $\{X_k^+\}$ the bootstrap sample. Repeating steps (iv) and (v) $N$ times gives us $N$ bootstrap samples. To distinguish them, we label the $i$th bootstrap sample as $\{(X_k^+)^{(i)};\, k=1,\ldots,n\}$.

(vii) For each bootstrap sample we construct the bootstrap matrix, vector and estimator $(\Gamma_p^+)^{(i)}$, $(\gamma_p^+)^{(i)}$ and $(\hat\phi_n^+)^{(i)} = ((\Gamma_p^+)^{(i)})^{-1}(\gamma_p^+)^{(i)}$.

(viii) Using $\{(\hat\phi_n^+)^{(i)}\}$ we can estimate the variance of $\hat\phi_n - \phi$ with
\[
\frac{1}{N}\sum_{i=1}^{N} \big((\hat\phi_n^+)^{(i)} - \hat\phi_n\big)\big((\hat\phi_n^+)^{(i)} - \hat\phi_n\big)',
\]
and, similarly, the distribution function of $\hat\phi_n - \phi$.
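Steps (i)-(viii) above can be sketched in code. The following is a minimal numpy sketch, assuming a zero-mean series; `fit_ar` and `residual_bootstrap` are illustrative names, and centring the estimated residuals is a common practical addition not stated in the text.

```python
import numpy as np

def fit_ar(x, p):
    """Least squares estimate of the AR(p) coefficients (step (i))."""
    n = len(x)
    # Rows are X_{t-1} = (x_{t-1}, ..., x_{t-p}) for t = p+1, ..., n
    X = np.column_stack([x[p - j - 1 : n - j - 1] for j in range(p)])
    Gamma_hat = X.T @ X / n          # \hat\Gamma_p
    gamma_hat = X.T @ x[p:] / n      # \hat\gamma_p
    return np.linalg.solve(Gamma_hat, gamma_hat)

def residual_bootstrap(x, p, N, rng=None):
    """Residual bootstrap replicates of the AR(p) estimator (steps (ii)-(viii))."""
    rng = np.random.default_rng(rng)
    n = len(x)
    phi_hat = fit_ar(x, p)
    # (ii) estimated residuals eps_hat_t = x_t - sum_j phi_hat_j x_{t-j}
    X = np.column_stack([x[p - j - 1 : n - j - 1] for j in range(p)])
    eps_hat = x[p:] - X @ phi_hat
    eps_hat = eps_hat - eps_hat.mean()   # centre (common in practice)
    reps = np.empty((N, p))
    for i in range(N):
        # (iv) draw n residuals with replacement from \hat F_n
        eps_plus = rng.choice(eps_hat, size=n, replace=True)
        # (v) generate the bootstrap series
        x_plus = np.empty(n)
        x_plus[:p] = eps_plus[:p]
        for k in range(p, n):
            x_plus[k] = phi_hat @ x_plus[k - p : k][::-1] + eps_plus[k]
        # (vii) re-estimate on the bootstrap sample
        reps[i] = fit_ar(x_plus, p)
    return phi_hat, reps
```

The sample covariance of `reps - phi_hat` then gives the variance estimate of step (viii).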
7.2 The sampling properties of the residual bootstrap estimator

In this section we show that the distributions of $\sqrt{n}(\hat\phi_n^+ - \hat\phi_n)$ and $\sqrt{n}(\hat\phi_n - \phi)$ asymptotically coincide. This means that using the bootstrap distribution is no worse than using the asymptotic normal approximation. However, it does not say that the bootstrap distribution better approximates the finite sample distribution of $(\hat\phi_n - \phi)$; to show this one would have to use Edgeworth expansion methods.

In order to show that the distribution of the bootstrap estimator $\sqrt{n}(\hat\phi_n^+ - \hat\phi_n)$ asymptotically coincides with the asymptotic distribution of $\sqrt{n}(\hat\phi_n - \phi)$, we will show convergence of the distributions under the following distance:
\[
d_p(H,G) = \inf\big\{ (E|X-Y|^p)^{1/p} : X \sim H,\ Y \sim G \big\},
\]
where $p \ge 1$ and the infimum is taken over all joint distributions of $(X,Y)$ whose marginals are $H$ and $G$. Roughly speaking, if $d_p(F_n,G_n) \to 0$, then the limiting distributions of $F_n$ and $G_n$ are the same (see Bickel and Freedman (1981)). The case $p=2$ is the most commonly used; for $p=2$ the distance is called the Mallows distance,
\[
d_2(H,G) = \inf\big\{ (E|X-Y|^2)^{1/2} : X \sim H,\ Y \sim G \big\},
\]
and it is the Mallows distance that we will use. To reduce notation, rather than specify the distributions $H$ and $G$ we write $d_p(X,Y) = d_p(H,G)$, where the random variables $X$ and $Y$ have the marginal distributions $H$ and $G$, respectively. We mention that the distance $d_p$ satisfies the triangle inequality.

The main application of showing that $d_p(F_n,G_n) \to 0$ is stated in the following lemma, which is a version of Lemma 8.3 of Bickel and Freedman (1981).

Lemma 7.2.1 Let $\alpha$, $\alpha_n$ be probability measures. Then $d_p(\alpha_n,\alpha) \to 0$ if and only if
\[
E_{\alpha_n}(|X|^p) = \int |x|^p \,\alpha_n(dx) \;\longrightarrow\; E_{\alpha}(|X|^p) = \int |x|^p \,\alpha(dx) \qquad \text{as } n \to \infty,
\]
and the distribution $\alpha_n$ converges weakly to the distribution $\alpha$.

Our aim is to show that $d_2\big(\sqrt{n}(\hat\phi_n^+ - \hat\phi_n), \sqrt{n}(\hat\phi_n - \phi)\big) \to 0$, which implies that their distributions asymptotically coincide. To do this we use
\[
\sqrt{n}(\hat\phi_n - \phi) = \sqrt{n}\,\hat\Gamma_p^{-1}(\hat\gamma_p - \hat\Gamma_p \phi), \qquad
\sqrt{n}(\hat\phi_n^+ - \hat\phi_n) = \sqrt{n}\,(\Gamma_p^+)^{-1}(\gamma_p^+ - \Gamma_p^+ \hat\phi_n).
\]
Studying how $\hat\Gamma_p$, $\hat\gamma_p$, $\Gamma_p^+$ and $\gamma_p^+$ are constructed, we see that as a starting point we need to show that
\[
d_2(X_t^+, X_t) \to 0 \quad \text{as } t, n \to \infty, \qquad d_2(\varepsilon_t^+, \varepsilon_t) \to 0 \quad \text{as } n \to \infty.
\]
We start by showing that $d_2(\varepsilon_t^+, \varepsilon_t) \to 0$.
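For two empirical distributions with the same number of equally weighted atoms, the infimum defining $d_2$ is attained by the monotone coupling, i.e. by pairing order statistics. A minimal sketch (the function name is illustrative, not from the text):

```python
import numpy as np

def mallows_d2(sample_h, sample_g):
    """Empirical Mallows (d_2) distance between two equal-weight empirical
    distributions; the optimal coupling pairs order statistics."""
    u = np.sort(np.asarray(sample_h, dtype=float))
    v = np.sort(np.asarray(sample_g, dtype=float))
    assert len(u) == len(v), "equal-weight version needs equal sample sizes"
    return np.sqrt(np.mean((u - v) ** 2))
```

For instance, the distance between the empirical distributions of the estimated and true residuals shrinks as the sample size grows, which is the content of Lemma 7.2.2 below.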
Lemma 7.2.2 Let $\varepsilon_t^+$ denote the bootstrap residuals and $\varepsilon_t$ the true residuals. Define the discrete random variable $J$ taking values in $\{p+1,\ldots,n\}$ with $P(J=k) = \frac{1}{n-p}$. Then
\[
E\big((\hat\varepsilon_J - \varepsilon_J)^2 \,\big|\, X_1,\ldots,X_n\big) = O_p(n^{-1}) \tag{7.2}
\]
and
\[
d_2(\hat F_n, F) \le d_2(\hat F_n, F_n) + d_2(F_n, F) \to 0 \quad \text{as } n \to \infty, \tag{7.3}
\]
where
\[
F_n(x) = \frac{1}{n-p}\sum_{t=p+1}^{n} I_{(-\infty,\varepsilon_t]}(x), \qquad
\hat F_n(x) = \frac{1}{n-p}\sum_{t=p+1}^{n} I_{(-\infty,\hat\varepsilon_t]}(x)
\]
are the empirical distribution functions based on the residuals $\{\varepsilon_t\}_{t=p+1}^n$ and the estimated residuals $\{\hat\varepsilon_t\}_{t=p+1}^n$, and $F$ is the distribution function of the residual $\varepsilon_t$.

PROOF. We first show (7.2). From the definitions of $\hat\varepsilon_J$ and $\varepsilon_J$ we have
\[
E\big((\hat\varepsilon_J - \varepsilon_J)^2 \,\big|\, X_1,\ldots,X_n\big)
= \frac{1}{n-p}\sum_{t=p+1}^{n}(\hat\varepsilon_t - \varepsilon_t)^2
= \frac{1}{n-p}\sum_{t=p+1}^{n}\Big(\sum_{j=1}^{p}[\hat\phi_j - \phi_j]X_{t-j}\Big)^2
= \sum_{j_1,j_2=1}^{p}[\hat\phi_{j_1} - \phi_{j_1}][\hat\phi_{j_2} - \phi_{j_2}]\,\frac{1}{n-p}\sum_{t=p+1}^{n} X_{t-j_1}X_{t-j_2}.
\]
Now, by using (5.27) we have $\sup_{1\le j\le p}|\hat\phi_j - \phi_j| = O_p(n^{-1/2})$; therefore $E\big((\hat\varepsilon_J - \varepsilon_J)^2 \,\big|\, X_1,\ldots,X_n\big) = O_p(n^{-1})$.

We now prove (7.3). We first note that by the triangle inequality
\[
d_2(\hat F_n, F) \le d_2(F_n, F) + d_2(\hat F_n, F_n).
\]
By using Lemma 8.4 of Bickel and Freedman (1981), we have that $d_2(F_n, F) \to 0$. Therefore we need to show that $d_2(\hat F_n, F_n) \to 0$. It is clear by definition that $d_2(\hat F_n, F_n) = d_2(\varepsilon_t^+, \varepsilon_t)$, where $\varepsilon_t^+$ is sampled from $\hat F_n$ and $\varepsilon_t$ is sampled from $F_n$; hence $\varepsilon_t^+$ and $\varepsilon_t$ have the same distributions as $\hat\varepsilon_J$ and $\varepsilon_J$. To evaluate
\[
d_2(\varepsilon_t^+, \varepsilon_t) = \inf_{\varepsilon_t^+ \sim \hat F_n,\ \varepsilon_t \sim F_n} \big(E|\varepsilon_t^+ - \varepsilon_t|^2\big)^{1/2}
\]
we need the marginal distributions of $(\varepsilon_t^+, \varepsilon_t)$ to be $\hat F_n$ and $F_n$, but the infimum is over all joint distributions. It is best to choose a joint distribution which is highly dependent, because this minimises the distance between the two random variables. An ideal candidate is $\varepsilon_t^+ = \hat\varepsilon_J$ and $\varepsilon_t = \varepsilon_J$, since these have the marginals $\hat F_n$ and $F_n$, respectively. Therefore
\[
d_2(\hat F_n, F_n)^2 = \inf_{\varepsilon_t^+ \sim \hat F_n,\ \varepsilon_t \sim F_n} E|\varepsilon_t^+ - \varepsilon_t|^2
\le E\big((\hat\varepsilon_J - \varepsilon_J)^2 \,\big|\, X_1,\ldots,X_n\big) = O_p(n^{-1}),
\]
where the rate comes from (7.2). This means that $d_2(\hat F_n, F_n) \stackrel{P}{\to} 0$; hence we obtain (7.3).

Corollary 7.2.1 Suppose $\varepsilon_t^+$ is the bootstrapped residual. Then we have
\[
E_{\hat F_n}\big((\varepsilon_t^+)^2 \,\big|\, X_1,\ldots,X_n\big) \stackrel{P}{\to} E_F(\varepsilon_t^2).
\]
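The $O_p(n^{-1})$ rate in (7.2) for the mean squared difference between estimated and true residuals can be illustrated by simulation. A small sketch for an AR(1) process (all names are my own; the rate appears because the gap equals $(\hat\phi - \phi)^2$ times an average of $X_{t-1}^2$):

```python
import numpy as np

def mean_sq_residual_gap(n, phi=0.5, seed=0):
    """Average of (eps_hat_t - eps_t)^2 for a simulated AR(1) path of length n.
    Since eps_hat_t - eps_t = -(phi_hat - phi) x_{t-1}, this is O_p(1/n)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    xlag, y = x[:-1], x[1:]
    phi_hat = (xlag @ y) / (xlag @ xlag)   # least squares for AR(1)
    eps_hat = y - phi_hat * xlag            # estimated residuals
    eps = y - phi * xlag                    # true residuals
    return np.mean((eps_hat - eps) ** 2)
```

Averaging over several replications, the gap at a large sample size should be an order of magnitude below the gap at a small one, consistent with the $n^{-1}$ rate.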
PROOF. The proof follows from Lemma 7.2.2 and Lemma 7.2.1.

We recall that since $X_t$ is a causal autoregressive process, there exist coefficients $\{a_j\}$ such that
\[
X_t = \sum_{j=0}^{\infty} a_j \varepsilon_{t-j}, \qquad a_j = a_j(\phi) = [A(\phi)^j]_{1,1},
\]
where $A(\phi)$ is the companion matrix of the AR($p$) recursion (see Chapter 2). Similarly, using the estimated parameters $\hat\phi_n$ we can write $X_t^+$ as
\[
X_t^+ = \sum_{j=0}^{\infty} a_j(\hat\phi_n)\, \varepsilon_{t-j}^+, \qquad a_j(\hat\phi_n) = [A(\hat\phi_n)^j]_{1,1}.
\]
We now show that $d_2(X_t^+, X_t) \to 0$ as $n \to \infty$ and $t \to \infty$.

Lemma 7.2.3 Let $\{J_t\}$ be independent samples from $\{p+1,\ldots,n\}$ with $P(J_t = k) = \frac{1}{n-p}$. Define
\[
Y_t^+ = \sum_{j=0}^{t-p-1} a_j(\hat\phi_n)\,\hat\varepsilon_{J_{t-j}}, \quad
\tilde Y_t^+ = \sum_{j=0}^{t-p-1} a_j\,\hat\varepsilon_{J_{t-j}}, \quad
\tilde Y_t = \sum_{j=0}^{t-p-1} a_j\,\varepsilon_{J_{t-j}}, \quad
Y_t = \tilde Y_t + \sum_{j=t-p}^{\infty} a_j\,\varepsilon_{t-j},
\]
where $\varepsilon_{J_i}$ is a sample from $\{\varepsilon_{p+1},\ldots,\varepsilon_n\}$ and $\hat\varepsilon_{J_i}$ is a sample from $\{\hat\varepsilon_{p+1},\ldots,\hat\varepsilon_n\}$. Then we have
\[
E\big((Y_t^+ - \tilde Y_t^+)^2 \mid X_1,\ldots,X_n\big) = O_p(n^{-1}), \qquad d_2(Y_t^+, \tilde Y_t^+) \to 0 \ \text{ as } n \to \infty, \tag{7.4}
\]
\[
E\big((\tilde Y_t^+ - \tilde Y_t)^2 \mid X_1,\ldots,X_n\big) = O_p(n^{-1}), \qquad d_2(\tilde Y_t^+, \tilde Y_t) \to 0 \ \text{ as } n \to \infty, \tag{7.5}
\]
\[
E\big((\tilde Y_t - Y_t)^2 \mid X_1,\ldots,X_n\big) \le K\rho^t, \qquad d_2(\tilde Y_t, Y_t) \to 0 \ \text{ as } t \to \infty. \tag{7.6}
\]

PROOF. We first prove (7.4). It is clear from the definitions that
\[
E\big((Y_t^+ - \tilde Y_t^+)^2 \mid X_1,\ldots,X_n\big) \le \sum_{j} \big([A(\phi)^j]_{1,1} - [A(\hat\phi_n)^j]_{1,1}\big)^2\, E\big((\varepsilon_j^+)^2 \mid X_1,\ldots,X_n\big). \tag{7.7}
\]
Using Corollary 7.2.1 we have that $E((\varepsilon_j^+)^2 \mid X_1,\ldots,X_n)$ is the same for all $j$ and $E((\varepsilon_j^+)^2 \mid X_1,\ldots,X_n) \stackrel{P}{\to} E(\varepsilon_t^2)$; hence we consider for now $\big([A(\phi)^j]_{1,1} - [A(\hat\phi_n)^j]_{1,1}\big)^2$. Using (5.27) we have $(\hat\phi_n - \phi) = O_p(n^{-1/2})$; therefore by the mean value theorem we can write $A(\hat\phi_n) - A(\phi) = \frac{K}{\sqrt{n}} D$ for some random matrix $D$, so that
\[
A(\hat\phi_n)^j = \Big(A(\phi) + \frac{K}{\sqrt{n}}\, D\Big)^j
\]
(note these are heuristic bounds, and this argument needs to be made precise). Expanding and applying the mean value theorem again yields
\[
\big|[A(\phi)^j - A(\hat\phi_n)^j]_{1,1}\big| \le \frac{K}{\sqrt{n}}\,\|D\|\,\|A(\phi)^j\|\Big(1 + \|A(\phi)\|\,\frac{K}{\sqrt{n}}\,\|B\|\Big)^j,
\]
where $B$ is a matrix with $\|B\|_{\mathrm{spec}} \le \frac{K}{\sqrt{n}}\|D\|$. Notice that, for large enough $n$, the factor $(1 + \|A(\phi)\|\frac{K}{\sqrt{n}}\|B\|)^j$ grows more slowly in $j$ than $\|A(\phi)^j\|$ contracts. Therefore, for large enough $n$ and any $\frac{1}{1+\delta} < \rho < 1$, we have
\[
\big|[A(\phi)^j - A(\hat\phi_n)^j]_{1,1}\big| \le \frac{K}{\sqrt{n}}\,\rho^j.
\]
Substituting this into (7.7) gives
\[
E\big((Y_t^+ - \tilde Y_t^+)^2 \mid X_1,\ldots,X_n\big) \le \frac{K}{n}\, E\big((\varepsilon_t^+)^2 \mid X_1,\ldots,X_n\big) \sum_{j} \rho^{2j} = O_p(n^{-1}) \to 0 \quad \text{as } n \to \infty;
\]
hence $d_2(Y_t^+, \tilde Y_t^+) \to 0$ as $n \to \infty$.

We now prove (7.5). We see that
\[
E\big((\tilde Y_t^+ - \tilde Y_t)^2 \mid X_1,\ldots,X_n\big)
= \sum_{j} a_j^2\, E\big((\hat\varepsilon_{J_{t-j}} - \varepsilon_{J_{t-j}})^2 \mid X_1,\ldots,X_n\big)
= E\big((\hat\varepsilon_J - \varepsilon_J)^2 \mid X_1,\ldots,X_n\big) \sum_{j} a_j^2. \tag{7.8}
\]
Substituting (7.2) into the above gives $E\big((\tilde Y_t^+ - \tilde Y_t)^2 \mid X_1,\ldots,X_n\big) = O_p(n^{-1})$, as required. This means that $d_2(\tilde Y_t^+, \tilde Y_t) \to 0$.

Finally we prove (7.6). We see that
\[
E\big((\tilde Y_t - Y_t)^2 \mid X_1,\ldots,X_n\big) = \sum_{j=t-p}^{\infty} a_j^2\, E(\varepsilon_t^2). \tag{7.9}
\]
Using (2.7) we have $E\big((\tilde Y_t - Y_t)^2 \mid X_1,\ldots,X_n\big) \le K\rho^t$, thus giving us (7.6).

We can now almost prove the result. To do this we note that
\[
(\hat\gamma_p - \hat\Gamma_p \phi) = \frac{1}{n}\sum_{t=p+1}^{n} \varepsilon_t\, \underline{X}_{t-1}, \qquad
(\gamma_p^+ - \Gamma_p^+ \hat\phi_n) = \frac{1}{n}\sum_{t=p+1}^{n} \varepsilon_t^+\, \underline{X}_{t-1}^+. \tag{7.10}
\]

Lemma 7.2.4 Let $Y_t$, $Y_t^+$, $\tilde Y_t^+$ and $\tilde Y_t$ be defined as in Lemma 7.2.3, and define $\tilde\Gamma_p$, $\tilde\Gamma_p^+$, $\tilde\gamma_p$ and $\tilde\gamma_p^+$ in the same way as $\hat\Gamma_p$, $\Gamma_p^+$, $\hat\gamma_p$ and $\gamma_p^+$ in (7.1), but using $\{Y_t\}$ and $\{Y_t^+\}$, respectively, rather than $X_t$ and $X_t^+$. We have that
\[
d_2(Y_t, Y_t^+) \le \big\{E(Y_t - Y_t^+)^2\big\}^{1/2} = O_p\big(K(n^{-1/2} + \rho^t)\big), \tag{7.11}
\]
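The coefficients $a_j = [A(\phi)^j]_{1,1}$ can be computed directly from the companion matrix. A short sketch (function names are illustrative), which can be checked against the AR recursion $a_0 = 1$, $a_j = \sum_{i=1}^{p} \phi_i a_{j-i}$ for $j \ge 1$:

```python
import numpy as np

def companion(phi):
    """Companion matrix A(phi) of a causal AR(p) process."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi                    # first row holds (phi_1, ..., phi_p)
    if p > 1:
        A[1:, :-1] = np.eye(p - 1)   # subdiagonal shifts the state vector
    return A

def ma_coefficients(phi, m):
    """a_j = [A(phi)^j]_{1,1} for j = 0, ..., m-1: the MA(infinity) coefficients."""
    A = companion(np.asarray(phi, dtype=float))
    coeffs, Aj = [], np.eye(len(phi))
    for _ in range(m):
        coeffs.append(Aj[0, 0])
        Aj = Aj @ A
    return np.array(coeffs)
```

For an AR(1) with coefficient $\phi$, this reproduces $a_j = \phi^j$; for general $p$ the output satisfies the recursion above.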
\[
d_2(Y_t, X_t) \to 0 \quad \text{as } n \to \infty, \tag{7.12}
\]
and
\[
d_2\big(\sqrt{n}(\tilde\gamma_p - \tilde\Gamma_p\phi),\, \sqrt{n}(\tilde\gamma_p^+ - \tilde\Gamma_p^+\hat\phi_n)\big)
\le \Big\{ n\, E\big\|(\tilde\gamma_p - \tilde\Gamma_p\phi) - (\tilde\gamma_p^+ - \tilde\Gamma_p^+\hat\phi_n)\big\|^2 \Big\}^{1/2} \to 0 \quad \text{as } n \to \infty, \tag{7.13}
\]
where $\tilde\Gamma_p$, $\tilde\Gamma_p^+$, $\tilde\gamma_p$ and $\tilde\gamma_p^+$ are defined in the same way as $\hat\Gamma_p$, $\Gamma_p^+$, $\hat\gamma_p$ and $\gamma_p^+$, but with $\{Y_t\}$ replacing $X_t$ in $\tilde\Gamma_p$ and $\tilde\gamma_p$ and $\{Y_t^+\}$ replacing $X_t^+$ in $\tilde\Gamma_p^+$ and $\tilde\gamma_p^+$. Furthermore we have
\[
E\|\tilde\Gamma_p^+ - \tilde\Gamma_p\| \to 0, \tag{7.14}
\]
\[
d_2\big((\tilde\gamma_p - \tilde\Gamma_p\phi),\, (\hat\gamma_p - \hat\Gamma_p\phi)\big) \to 0, \qquad E\|\tilde\Gamma_p - \hat\Gamma_p\| \to 0 \quad \text{as } n \to \infty. \tag{7.15}
\]

PROOF. We first prove (7.11). Using the triangle inequality we have
\[
\big\{E\big((Y_t - Y_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
\le \big\{E\big((Y_t - \tilde Y_t)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
+ \big\{E\big((\tilde Y_t - \tilde Y_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
+ \big\{E\big((\tilde Y_t^+ - Y_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
= O_p(n^{-1/2} + \rho^t),
\]
where the bounds on the right hand side follow from Lemma 7.2.3. Therefore, by the definition of $d_2(Y_t, Y_t^+)$, we have (7.11).

To prove (7.12) we note that the only difference between $Y_t$ and $X_t$ is that the $\{\varepsilon_{J_k}\}$ in $Y_t$ are sampled from $\{\varepsilon_{p+1},\ldots,\varepsilon_n\}$, hence from $F_n$, whereas the $\{\varepsilon_t\}$ in $X_t$ are iid random variables with distribution $F$. Since $d_2(F_n, F) \to 0$ (Bickel and Freedman (1981), Lemma 8.4), it follows that $d_2(Y_t, X_t) \to 0$, thus proving (7.12).

To prove (7.13) we consider the difference $(\tilde\gamma_p - \tilde\Gamma_p\phi) - (\tilde\gamma_p^+ - \tilde\Gamma_p^+\hat\phi_n)$ and use (7.10) to get
\[
\frac{1}{n}\sum_{t=p+1}^{n}\big\{\varepsilon_t\, \underline{Y}_{t-1} - \varepsilon_t^+\, \underline{Y}_{t-1}^+\big\}
= \frac{1}{n}\sum_{t=p+1}^{n}\big\{(\varepsilon_t - \varepsilon_t^+)\, \underline{Y}_{t-1} + \varepsilon_t^+\, (\underline{Y}_{t-1} - \underline{Y}_{t-1}^+)\big\},
\]
where $\underline{Y}_{t-1}^+ = (Y_{t-1}^+,\ldots,Y_{t-p}^+)'$ and $\underline{Y}_{t-1} = (Y_{t-1},\ldots,Y_{t-p})'$. Using the above, taking conditional expectations with respect to $\{X_1,\ldots,X_n\}$, and noting that conditional on $\{X_1,\ldots,X_n\}$ the $(\varepsilon_t - \varepsilon_t^+)$ are independent of $Y_k$ and $Y_k^+$ for $k < t$, we have
\[
\Big\{E\Big(\Big\|\frac{1}{n}\sum_{t=p+1}^{n}\big(\varepsilon_t\, \underline{Y}_{t-1} - \varepsilon_t^+\, \underline{Y}_{t-1}^+\big)\Big\|^2 \,\Big|\, X_1,\ldots,X_n\Big)\Big\}^{1/2} \le I + II,
\]
where
\[
I = \frac{1}{n}\sum_{t=p+1}^{n}\big\{E\big((\varepsilon_t - \varepsilon_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}\big\{E\big(\|\underline{Y}_{t-1}\|^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
= \big\{E\big((\varepsilon_t - \varepsilon_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}\, \frac{1}{n}\sum_{t=p+1}^{n}\big\{E\big(\|\underline{Y}_{t-1}\|^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
\]
and
\[
II = \frac{1}{n}\sum_{t=p+1}^{n}\big\{E\big((\varepsilon_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}\big\{E\big(\|\underline{Y}_{t-1} - \underline{Y}_{t-1}^+\|^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}
= \big\{E\big((\varepsilon_t^+)^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}\, \frac{1}{n}\sum_{t=p+1}^{n}\big\{E\big(\|\underline{Y}_{t-1} - \underline{Y}_{t-1}^+\|^2 \mid X_1,\ldots,X_n\big)\big\}^{1/2}.
\]
Now, by using (7.2) we have $I \le K n^{-1/2}$, and by using (7.11) and Corollary 7.2.1 we obtain $II \le K n^{-1/2}$; hence we have (7.13). Using a similar technique to that given above we can prove (7.14). (7.15) follows from (7.12), (7.13) and (7.14).

Corollary 7.2.2 Let $\Gamma_p^+$, $\hat\Gamma_p$, $\hat\gamma_p$ and $\gamma_p^+$ be defined as in (7.1). Then we have
\[
d_2\big(\sqrt{n}(\hat\gamma_p - \hat\Gamma_p\phi),\, \sqrt{n}(\gamma_p^+ - \Gamma_p^+\hat\phi_n)\big) \to 0 \tag{7.16}
\]
and
\[
d_1(\Gamma_p^+, \hat\Gamma_p) \to 0 \tag{7.17}
\]
as $n \to \infty$.

PROOF. We first prove (7.16): using (7.13), (7.15) and the triangle inequality gives (7.16). To prove (7.17) we use (7.14), (7.15) and the triangle inequality; (7.17) immediately follows.

Now, by using (7.17) and Lemma 7.2.1 we have $\Gamma_p^+ \stackrel{P}{\to} \Gamma_p$, and by using (7.16) the distribution of $\sqrt{n}(\gamma_p^+ - \Gamma_p^+\hat\phi_n)$ converges weakly to the distribution of $\sqrt{n}(\hat\gamma_p - \hat\Gamma_p\phi)$; hence the distributions of $\sqrt{n}(\hat\gamma_p - \hat\Gamma_p\phi)$ and $\sqrt{n}(\gamma_p^+ - \Gamma_p^+\hat\phi_n)$ asymptotically coincide. Therefore $\sqrt{n}(\hat\phi_n^+ - \hat\phi_n) \stackrel{D}{\to} N(0, \sigma^2\Gamma_p^{-1})$. From (5.28) we have $\sqrt{n}(\hat\phi_n - \phi) \stackrel{D}{\to} N(0, \sigma^2\Gamma_p^{-1})$. Thus we see that the distributions of $\sqrt{n}(\hat\phi_n - \phi)$ and $\sqrt{n}(\hat\phi_n^+ - \hat\phi_n)$ asymptotically coincide.
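This conclusion can be illustrated numerically: for an AR(1) the asymptotic standard deviation of $\sqrt{n}(\hat\phi_n - \phi)$ is $\sqrt{1-\phi^2}$ (since $\sigma^2 = 1$ and $\Gamma_1 = (1-\phi^2)^{-1}$), and the bootstrap distribution should reproduce it. The following is a minimal sketch, not part of the text; the AR(1) setting, seed and replication counts are my own choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n, phi, N = 1000, 0.5, 500

# simulate an AR(1) path with iid standard normal innovations
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# least squares fit and centred estimated residuals
xlag, y = x[:-1], x[1:]
phi_hat = (xlag @ y) / (xlag @ xlag)
eps_hat = y - phi_hat * xlag
eps_hat -= eps_hat.mean()

# residual bootstrap replicates of phi_hat
boot = np.empty(N)
for i in range(N):
    e = rng.choice(eps_hat, size=n, replace=True)
    xb = np.zeros(n)
    xb[0] = e[0]
    for t in range(1, n):
        xb[t] = phi_hat * xb[t - 1] + e[t]
    xl, yb = xb[:-1], xb[1:]
    boot[i] = (xl @ yb) / (xl @ xl)

boot_sd = np.sqrt(n) * boot.std()   # bootstrap sd of sqrt(n)(phi_hat^+ - phi_hat)
asym_sd = np.sqrt(1 - phi ** 2)     # asymptotic sd of sqrt(n)(phi_hat - phi)
```

With moderate $n$ the two standard deviations should agree closely, which is exactly the asymptotic coincidence established above.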
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationTime Series Analysis. Asymptotic Results for Spatial ARMA Models
Communications in Statistics Theory Methods, 35: 67 688, 2006 Copyright Taylor & Francis Group, LLC ISSN: 036-0926 print/532-45x online DOI: 0.080/036092050049893 Time Series Analysis Asymptotic Results
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationTime-Varying Parameters
Kalman Filter and state-space models: time-varying parameter models; models with unobservable variables; basic tool: Kalman filter; implementation is task-specific. y t = x t β t + e t (1) β t = µ + Fβ
More informationIntroduction. log p θ (y k y 1:k 1 ), k=1
ESAIM: PROCEEDINGS, September 2007, Vol.19, 115-120 Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071915 PARTICLE FILTER-BASED APPROXIMATE MAXIMUM LIKELIHOOD INFERENCE ASYMPTOTICS IN STATE-SPACE
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More information(X i X) 2. n 1 X X. s X. s 2 F (n 1),(m 1)
X X X 10 n 5 X n X N(µ X, σx ) n s X = (X i X). n 1 (n 1)s X σ X n = (X i X) σ X χ n 1. t t χ t (X µ X )/ σ X n s X σx = X µ X σ X n σx s X = X µ X n s X t n 1. F F χ F F n (X i X) /(n 1) m (Y i Y ) /(m
More informationChapter 6: Model Specification for Time Series
Chapter 6: Model Specification for Time Series The ARIMA(p, d, q) class of models as a broad class can describe many real time series. Model specification for ARIMA(p, d, q) models involves 1. Choosing
More information21.1 Lower bounds on minimax risk for functional estimation
ECE598: Information-theoretic methods in high-dimensional statistics Spring 016 Lecture 1: Functional estimation & testing Lecturer: Yihong Wu Scribe: Ashok Vardhan, Apr 14, 016 In this chapter, we will
More informationBootstrapping high dimensional vector: interplay between dependence and dimensionality
Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationProf. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis
Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation
More informationWavelet Methods for Time Series Analysis. Motivating Question
Wavelet Methods for Time Series Analysis Part VII: Wavelet-Based Bootstrapping start with some background on bootstrapping and its rationale describe adjustments to the bootstrap that allow it to work
More informationDefine y t+h t as the forecast of y t+h based on I t known parameters. The forecast error is. Forecasting
Forecasting Let {y t } be a covariance stationary are ergodic process, eg an ARMA(p, q) process with Wold representation y t = X μ + ψ j ε t j, ε t ~WN(0,σ 2 ) j=0 = μ + ε t + ψ 1 ε t 1 + ψ 2 ε t 2 + Let
More informationTIME SERIES AND FORECASTING. Luca Gambetti UAB, Barcelona GSE Master in Macroeconomic Policy and Financial Markets
TIME SERIES AND FORECASTING Luca Gambetti UAB, Barcelona GSE 2014-2015 Master in Macroeconomic Policy and Financial Markets 1 Contacts Prof.: Luca Gambetti Office: B3-1130 Edifici B Office hours: email:
More informationEconomics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models
University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe
More informationBickel Rosenblatt test
University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance
More informationUC San Diego Recent Work
UC San Diego Recent Work Title Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions Permalink https://escholarship.org/uc/item/67h5s74t Authors Pan, Li Politis, Dimitris
More informationGMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails
GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,
More informationA nonparametric test for seasonal unit roots
Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the
More informationNonlinear and non-gaussian state-space modelling by means of hidden Markov models
Nonlinear and non-gaussian state-space modelling by means of hidden Markov models University of Göttingen St Andrews, 13 December 2010 bla bla bla bla 1 2 Glacial varve thickness (General) state-space
More informationX random; interested in impact of X on Y. Time series analogue of regression.
Multiple time series Given: two series Y and X. Relationship between series? Possible approaches: X deterministic: regress Y on X via generalized least squares: arima.mle in SPlus or arima in R. We have
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationTime Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley
Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the
More informationAsymptotic inference for a nonstationary double ar(1) model
Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk
More informationLecture Notes 15 Prediction Chapters 13, 22, 20.4.
Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data
More informationTail bound inequalities and empirical likelihood for the mean
Tail bound inequalities and empirical likelihood for the mean Sandra Vucane 1 1 University of Latvia, Riga 29 th of September, 2011 Sandra Vucane (LU) Tail bound inequalities and EL for the mean 29.09.2011
More information