Extreme inference in stationary time series


Extreme inference in stationary time series. Moritz Jirak, FOR 1735, February 8.

Outline
1. Motivation (the multivariate CLT; measuring discrepancies)
2. Some theory and problems (the problem)
3. Special case: AR(∞) revisited (numerical results)

Motivation: the multivariate CLT

One of the most fundamental tools in probability theory and statistics is the multivariate CLT,
$$\frac{1}{\sqrt{n}} \sum_{k=1}^{n} X_k^{(d)} \xrightarrow{w} \mathcal{B}(\Gamma),$$
where $\{X_k^{(d)}\}$ is a sequence of $d$-dimensional, zero-mean, stationary random vectors and $\Gamma$ is the asymptotic covariance matrix.

Applications: model diagnosis, hypothesis testing, ....

More specifically: let $X^{(n)} = (X_1, \ldots, X_n)$ be a sample and
$$S_{n,d} = \big(S_{n,1}(X_1,\ldots,X_n), \ldots, S_{n,d}(X_1,\ldots,X_n)\big)^T$$
be some statistics. In general, the relation between $d = d_n$ and $n$ is very important, but let us not worry about this for the moment.

Using the multivariate CLT

Consider the following confidence regions/expressions:
$$\mathcal{X}_d^2 = \big\{\Theta_d : (S_{n,d} - \Theta_d)^T \hat\Gamma^{-1} (S_{n,d} - \Theta_d) \le n^{-1}\chi^2_{1-\alpha}(d)\big\},$$
where $\hat\Gamma$ is an estimator of the covariance matrix and $\chi^2_{1-\alpha}(d)$ is a quantile of the chi-square distribution;
$$V_d = \sqrt{n}\,\max_{1\le h\le d} \hat\gamma_{h,h}^{-1}\,\big|S_{n,h}(X_1,\ldots,X_n) - \theta_h\big|,$$
where $\hat\gamma_{h,h}^2$ is an estimator of the diagonal elements $\gamma_{h,h}^2$ of the covariance matrix $\Gamma$ (simultaneous confidence band);
$$D_d = \max_{1\le h\le d} (2h)^{-1/2}\big(n\,(S_{n,h} - \Theta_h)^T \hat\Gamma^{-1} (S_{n,h} - \Theta_h) - h\big),$$
where $\hat\Gamma$ is an estimator of the covariance matrix.
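The three discrepancy measures above can be computed directly from a sample. The following is a minimal numerical sketch, assuming $S_{n,d}$ is simply the vector of coordinate-wise sample means; the function name and that choice of statistic are illustrative, not from the talk.

```python
import numpy as np

def confidence_statistics(X, theta, Gamma_hat):
    """Illustrative versions of the three discrepancy measures.

    X         : (n, d) array, one d-dimensional observation per row
    theta     : (d,) hypothesised parameter vector Theta_d
    Gamma_hat : (d, d) estimated covariance matrix

    Here S_{n,h} is taken to be the first h sample means, i.e.
    S_{n,d} = X.mean(axis=0); any other statistic works the same way.
    """
    n, d = X.shape
    S = X.mean(axis=0)
    diff = S - theta
    Gamma_inv = np.linalg.inv(Gamma_hat)

    # Global measure: quadratic form behind the chi-square ellipsoid
    chi2_stat = n * diff @ Gamma_inv @ diff

    # Local measure V_d: largest studentised coordinate-wise deviation
    gamma_diag = np.sqrt(np.diag(Gamma_hat))
    V_d = np.sqrt(n) * np.max(np.abs(diff) / gamma_diag)

    # D_d: normalised partial quadratic forms over growing blocks h
    D_terms = []
    for h in range(1, d + 1):
        Gh_inv = np.linalg.inv(Gamma_hat[:h, :h])
        q = n * diff[:h] @ Gh_inv @ diff[:h]
        D_terms.append((q - h) / np.sqrt(2 * h))
    D_d = max(D_terms)
    return chi2_stat, V_d, D_d
```

Note that for $d = 1$ the three quantities collapse into each other: the quadratic form equals $V_1^2$, and $D_1 = (V_1^2 - 1)/\sqrt{2}$.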

Using the multivariate CLT

The confidence ellipsoid $\mathcal{X}_d^2$ wraps up many tests, such as F-tests, t-tests, .... Where are $V_d$ and $D_d$ (more) useful?

Quality: local and global discrepancies

The ellipsoid is a global measure, i.e. it measures the global discrepancy by summing up all local discrepancies:
$$\mathcal{X}_d^2 = \big\{\Theta_d : (S_{n,d} - \Theta_d)^T \hat\Gamma^{-1} (S_{n,d} - \Theta_d) \le n^{-1}\chi^2_{1-\alpha}(d)\big\}. \quad (1)$$
Suppose that we wish to test the null hypothesis
$$H_0: \Theta_d = \mathbf{0}_d = (0,\ldots,0)^T \quad \text{vs.} \quad H_A: \Theta_d \ne \mathbf{0}_d.$$
If $\min|\Theta_d| > 0$, then
$$\mathcal{X}_d^2 = n\,(S_{n,d} - \mathbf{0}_d)^T \hat\Gamma_d^{-1} (S_{n,d} - \mathbf{0}_d) \in O_P(nd).$$
If instead $\theta_i = 0$ for $i < d$ and only $\theta_d \ne 0$, then
$$\mathcal{X}_d^2 = n\,(S_{n,d} - \mathbf{0}_d)^T \hat\Gamma_d^{-1} (S_{n,d} - \mathbf{0}_d) \in O_P(n).$$
Hence we lose power as $d$ increases (since $\chi^2_{1-\alpha}(d) \approx d$)!

Quality: local and global discrepancies

If $d$ is large, additional tools such as principal component analysis are often used (still a global measure). Recently, some authors (Cai, Jiang, Liu, Wu, ...) proposed, studied (in special cases) and successfully applied the following local measure:
$$V_d = \sqrt{n}\,\max_{1\le h\le d} \hat\gamma_{h,h}^{-1}\,\big|S_{n,h}(X_1,\ldots,X_n) - \theta_h\big|,$$
where $\hat\gamma_{h,h}^2$ is an estimator of the diagonal elements $\gamma_{h,h}^2$ of the covariance matrix $\Gamma$. The confidence band $V_d$ has some nice properties...

Properties of $V_d$
$$V_d = \sqrt{n}\,\max_{1\le h\le d} \hat\gamma_{h,h}^{-1}\,\big|S_{n,h}(X_1,\ldots,X_n) - \theta_h\big|$$
- Easy to provide inference for single elements or small groups of $S_{n,h}(X_1,\ldots,X_n)$; in particular, whether parameters are equal to zero or not.
- Only need to estimate the diagonal elements $\gamma_{h,h}$ of $\Gamma$.
- Reconsider $H_0: \Theta_d = \mathbf{0}_d$ against $H_A: \Theta_d \ne \mathbf{0}_d$: good power even if only $\theta_d \ne 0$, but lower power than $\mathcal{X}_d^2$ if $\min|\Theta_d| > 0$.

Order estimation I

A particular case where $\theta_i \approx 0$ for many indices $i$ is order estimation. Consider for example the case of an AR(∞) process $\{X_k\}_{k\in\mathbb{Z}}$:
$$X_k = \theta_1 X_{k-1} + \theta_2 X_{k-2} + \ldots + \epsilon_k.$$
If $d$ is large enough, then $\theta_d \approx 0$; in fact $\theta_d = O(\rho^d)$ in many cases. Using $V_d$, one can decide whether $\theta_i$ is redundant or not (this gives an order estimate: order = last index which is not redundant). However, now $D_d$ enters as a replacement for $\mathcal{X}_d^2$: in some sense, $D_d$ measures for which $h$ the whole tail $(\theta_h, \theta_{h+1}, \ldots)$ is redundant,
$$D_d(l) = \max_{l\le h\le d} (2h)^{-1/2}\big(n\,(S_{n,h} - \Theta_h)^T \hat\Gamma^{-1} (S_{n,h} - \Theta_h) - h\big).$$

Order estimation II

A related issue is variance estimation in the CLT if $\{X_k\}_{k\in\mathbb{Z}}$ is weakly dependent (zero mean):
$$n^{-1/2} \sum_{k=1}^{n} X_k \xrightarrow{w} \mathcal{N}(0, \sigma^2), \quad \text{where } \sigma^2 = \sum_{k=-\infty}^{\infty} \mathrm{E}[X_k X_0].$$
Usual estimators look like
$$\hat\sigma^2 = \sum_{k=-h_n}^{h_n} w(k,n)\,\hat\phi_k, \quad \text{where } \hat\phi_k = (n-k)^{-1}\sum_{j=1}^{n-k} X_j X_{j+k}.$$
This raises the following questions: Which $\hat\phi_k$ (which lags) should we use? How large should we choose $h_n$?
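The estimator above can be sketched in a few lines. The choice of Bartlett weights $w(k,n) = 1 - |k|/(h_n+1)$ below is mine, just one common choice; $\hat\phi_k$ matches the slide's definition.

```python
import numpy as np

def long_run_variance(x, h_n):
    """Weighted autocovariance estimator of sigma^2 = sum_k E[X_k X_0].

    Uses Bartlett weights w(k, n) = 1 - |k|/(h_n + 1); phi_hat_k is
    (n - k)^{-1} * sum_{j=1}^{n-k} X_j X_{j+k}, as on the slide.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma2 = 0.0
    for k in range(-h_n, h_n + 1):
        a = abs(k)
        phi_k = np.dot(x[: n - a], x[a:]) / (n - a)
        w = 1 - a / (h_n + 1)
        sigma2 += w * phi_k
    return sigma2
```

For i.i.d. standard normal data the estimate should be close to $\sigma^2 = 1$; how close depends on $h_n$, which is exactly the question the slide raises.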

Order estimation II

Since $\phi_k = \mathrm{E}[X_k X_0] \to 0$ as $k$ increases, both $V_d$ and $D_d$ can be used again to quantify $h_n$ and obtain consistent estimators for $\sigma^2$. It is possible to construct cases where $V_d$ is superior to $D_d$, ... and it is also possible to construct cases where $D_d$ is superior to $V_d$, ... however, $D_d$ often gives the better result.

Properties of $\mathcal{X}_d^2$, $V_d$ and $D_d$

If we want to use $V_d$ and $D_d$, we need to control them asymptotically (for large $n$ and $d = d_n$). We expect that, under some conditions,
$$\mathcal{X}_{d_n}^2 \approx \chi^2(d_n), \quad a_n\big(V_{d_n} - b_n\big) \xrightarrow{w} G, \quad A_n D_{d_n} - B_n \xrightarrow{w} G,$$
for appropriate sequences $a_n, b_n, A_n, B_n$, where $G$ is an extreme value distribution and $d_n = d$ is an increasing function of $n$. However, establishing the above is highly nontrivial and requires (weak) dependence assumptions. What can we say about the relation of $n$ and $d_n$ ($d_n = n^\delta$, $\delta > 0$)?

The problem: our setting

Suppose that
$$E_n^{(d)} := n^{1/2}\big(g^{(d)} - \mathrm{E}(g^{(d)})\big) \xrightarrow{w} Z_d,$$
where $Z_d = (Z_1, \ldots, Z_d)^T$ is a Gaussian vector with appropriate covariance matrix. Then, letting first $n \to \infty$ and then $d \to \infty$, the (appropriately normalized) maximum of $Z_d$ converges to $G$. What can we say about the existence and properties of a joint sequence $d_n$, i.e. such that $\lim_{n} \max E_n^{(d_n)} \xrightarrow{w} G$ (appropriately normalized)?

Existence of $d_n$

Theorem. Suppose that $E_n^{(d)} \xrightarrow{w} Z_d$ for all fixed $d \in \mathbb{N}$, and that $\mathrm{E}(Z_i Z_j) = O\big((\log|i-j|)^{-(2+\delta)}\big)$, $\delta > 0$. Then there exist increasing sequences $a_n, b_n, d_n$ such that
$$a_n\big(\max E_n^{(d_n)} - b_n\big) = a_n\big(V_{(d_n)} - b_n\big) \xrightarrow{w} G.$$
Drawbacks:
- It does not tell us anything about a possible growth rate of $d_n$, which might be $\log\log\log\ldots n$.
- The proof is not really constructive.
What to do?

What to do?

We need to control the quantity
$$R_n := \big|\mathrm{P}\big(V_{(d_n)} \le u_n\big) - \mathrm{P}\big(M_{(d_n)} \le u_n\big)\big|, \quad (3)$$
where $M_{(d_n)} = \max_{1\le h\le d_n} Z_h$. The claim then follows from existing theory on Gaussian processes. Can we find an explicit bound for $R_n$ in terms of $n$ and $d_n$ (which would give us an explicit growth rate)? This looks very similar to Berry-Esséen type results, so we need normal approximation results...

Normal approximation

There is a huge variety of normal approximation techniques, depending on the metric and the dependence assumptions:
- Berry-Esséen type results (Zolotarev, Götze, Bentkus, Senatov, ...)
- Strong approximation techniques (KMT, Zaitsev, Mason, Philipp, Berkes, ...)
- Martingale approximation and embedding methods (Strassen, Hall, Heyde, Bolthausen, ...)
- Stein's method ...
All these methods have advantages and disadvantages; there is no ultimate approach.

An example: AR($d_n$) revisited

Let $\{X_k\}_{k\in\mathbb{Z}}$ be an AR($d_n$) process (or AR(∞)) with parameter $\Theta_{d_n} = (\theta_1, \ldots, \theta_{d_n})^T$, i.e.
$$X_k = \theta_1 X_{k-1} + \ldots + \theta_{d_n} X_{k-d_n} + \epsilon_k,$$
where $\{\epsilon_k\}_{k\in\mathbb{Z}}$ is an IID sequence. For $1 \le q \le d_n$, put $\phi_h = \mathrm{E}(X_k X_{k+h})$, $k, h \in \mathbb{Z}$, $\Phi_q = (\phi_1, \ldots, \phi_q)^T$, and denote $\hat\phi_{n,h} = \frac{1}{n}\sum_{i=h+1}^{n} X_i X_{i-h}$. Let $\Gamma_q = (\phi_{i-j})_{1\le i,j\le q}$ be the $q \times q$-dimensional covariance matrix. Then $\Gamma_q \Theta_q = \Phi_q$. It is thus natural to consider $\hat\Theta_q = \hat\Gamma_q^{-1}\hat\Phi_q$ and $\hat\sigma^2(q) = \hat\phi_0 - \hat\Theta_q^T \hat\Phi_q$, $\sigma^2 = \mathrm{E}(\epsilon_0^2)$: the so-called Yule-Walker equations.
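The empirical Yule-Walker equations can be solved directly. A minimal sketch, using the slide's $\hat\phi_{n,h}$ and solving the linear system with a generic solver (the function name is mine; a Levinson-Durbin recursion would exploit the Toeplitz structure, but is not needed for illustration):

```python
import numpy as np

def yule_walker(x, q):
    """Solve the empirical Yule-Walker equations Gamma_q Theta_q = Phi_q.

    Returns (theta_hat, sigma2_hat) with
    sigma2_hat(q) = phi_hat_0 - theta_hat^T Phi_hat_q,
    where phi_hat_h = n^{-1} sum_{i=h+1}^{n} X_i X_{i-h}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    phi = np.array([x[: n - h] @ x[h:] / n for h in range(q + 1)])
    Gamma = np.array([[phi[abs(i - j)] for j in range(q)] for i in range(q)])
    Phi = phi[1 : q + 1]
    theta = np.linalg.solve(Gamma, Phi)
    sigma2 = phi[0] - theta @ Phi
    return theta, sigma2
```

For an AR(1) process $X_k = 0.5\,X_{k-1} + \epsilon_k$ with unit innovation variance, the solution should recover $\theta_1 \approx 0.5$ and $\hat\sigma^2(1) \approx 1$.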

A few facts/questions

- CLT (for fixed $q$): $\sqrt{n}\big(\hat\Theta_q - \Theta_q\big) \xrightarrow{w} \mathcal{N}\big(0, \Gamma_q^{-1}\big)$.
- A priori, the true order $q_0$ is usually not known.
- Famous estimators are information-based criteria such as AIC, BIC, SIC, ...
- What can we say about a simultaneous confidence band for $\Theta_{d_n}$ and the relation of $n$ and $d_n$?

Some assumptions

Assumption (B). $\{X_k\}_{k\in\mathbb{Z}}$ admits a causal representation $X_k = \sum_{i=0}^{\infty} \alpha_i \epsilon_{k-i}$ such that
- $\sup_n \Psi(m) = O(m^{-\vartheta})$, $\vartheta > 0$, where $\Psi(m) := \sum_{i=m}^{\infty} |\alpha_i|$;
- $\{\epsilon_k\}_{k\in\mathbb{Z}}$ is a mean zero IID sequence of random variables such that $\|\epsilon_k\|_p < \infty$ for some $p > 4$ and $\|\epsilon_k\|_2^2 = \sigma^2 > 0$, $k \in \mathbb{Z}$;
- $\sup_n \sum_{i=1}^{\infty} |\theta_i| < \infty$, $\theta_n = O\big((\log n)^{-1}\big)$.

Theorem. Let $\{X_k\}_{k\in\mathbb{Z}}$ be an AR($d_n$) process satisfying Assumption (B). Suppose that $d_n \to \infty$ as $n$ increases, with $d_n = O(n^\delta)$ such that
$$0 < \delta < \min\{1/2,\ \vartheta p/2\}, \quad (1-2\vartheta)\delta < (p-4)/p. \quad (4.1)$$
If in addition $\inf_h \gamma_{h,h} > 0$, then for $z \in \mathbb{R}$
$$\mathrm{P}\Big(a_n^{-1}\Big(\sqrt{n}\,\max_{1\le i\le d_n} \big(\hat\gamma_{i,i}\,\hat\sigma^2(d_n)\big)^{-1/2}\,\big|\hat\theta_i - \theta_i\big| - b_n\Big) \le z\Big) \to \exp(-e^{-z}),$$
where $a_n = (2\log d_n)^{-1/2}$ and $b_n = (2\log d_n)^{1/2} - (8\log d_n)^{-1/2}\big(\log\log d_n + \log 4\pi - \log 4\big)$.

Note: if $p$ (moments) is sufficiently large, then essentially $d_n = O(\sqrt{n})$.

Order estimation in AR(q)

We can use this result to construct a family of consistent order estimators. Put $(x)_+ = \max(0, x)$ and
$$\Upsilon_{n,i} = a_n^{-1}\Big(\sqrt{n}\,\big(\hat\gamma_{i,i}\,\hat\sigma^2(d_n)\big)^{-1/2}\,\big|\hat\theta_i\big| - b_n\Big).$$
Then, for a threshold $z_n$,
$$\hat q_{z_n}^{(1)} = \min\Big\{q \in \mathbb{N} : a_n^{-1}\Big(\sqrt{n}\,\max_{q+1\le i\le d_n} \big(\hat\gamma_{i,i}\,\hat\sigma^2(d_n)\big)^{-1/2}\,\big|\hat\theta_i\big| - b_n\Big) \le z_n\Big\},$$
$$\hat q_{z_n}^{(2)} = \operatorname*{argmin}_{q\in\mathbb{N}}\Big\{\max_{q+1\le i\le d_n}\big\{(\Upsilon_{n,i} - z_n)_+\big\} + \log(1+q)\Big\}$$
are consistent estimators (extensions: subset modelling!).
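The first estimator is a simple thresholding rule: find the smallest $q$ such that every studentised coefficient beyond $q$ stays below $z_n$. A schematic sketch, with the normalising sequences $a_n$, $b_n$ and the threshold $z_n$ passed in as arguments (the function name and interface are mine):

```python
import numpy as np

def order_estimate(theta_hat, gamma_diag, sigma2, n, a_n, b_n, z_n):
    """Schematic version of q_hat^(1): the smallest q such that the
    normalised maximum of |theta_hat_i| over i = q+1, ..., d_n stays
    below the threshold z_n.

    theta_hat  : estimated AR coefficients theta_hat_1..theta_hat_{d_n}
    gamma_diag : estimated diagonal gamma_hat_{i,i} of Gamma
    sigma2     : estimated innovation variance sigma_hat^2(d_n)
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    d = len(theta_hat)
    # Studentised coefficient statistics sqrt(n) |theta_i| / sqrt(gamma sigma^2)
    stat = np.sqrt(n) * np.abs(theta_hat) / np.sqrt(np.asarray(gamma_diag) * sigma2)
    for q in range(d):
        tail = stat[q:]  # coefficients with index q+1, ..., d_n (0-based slice)
        if (np.max(tail) - b_n) / a_n <= z_n:
            return q
    return d
```

With toy inputs (all $\hat\gamma_{i,i} = \hat\sigma^2 = 1$, $a_n = 1$, $b_n = 0$) and coefficients $(0.5, 0.3, 0, 0, 0)$ at $n = 100$, the studentised statistics are $(5, 3, 0, 0, 0)$, so a threshold $z_n = 2$ yields the order estimate $2$.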

Order estimation: a few facts about the estimators

- One can compute the asymptotic distribution of $\hat q_{z_n}^{(1)}$ and $\hat q_{z_n}^{(2)}$.
- One can easily generalize them to a very large family of estimators (weight/penalty functions).
- $\hat q_{z_n}^{(1)}$ and $\hat q_{z_n}^{(2)}$ turn out to be good preliminary estimators for AIC, BIC, SIC.
- They significantly outperform AIC, BIC, SIC in sparse models.

Numerical results

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0.3, 0.05, 0.2, 0.1, 0.2)^T$, $\epsilon \sim \mathcal{N}(0,1)$, 1000 repetitions, $n = 500$, $d_n \in \{13, 14\}$; the table compares AIC, BIC and MIC variants with the new estimators. [Table entries not recoverable from the transcription.]

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$, $\epsilon \sim \mathcal{N}(0,1)$, 1000 repetitions, $n = 125$, $d_n \in \{10, 12\}$. [Table entries not recoverable from the transcription.]

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$, $\epsilon \sim \mathcal{N}(0,1)$, 1000 repetitions, $n = 500$, $d_n \in \{13, 14\}$. [Table entries not recoverable from the transcription.]

Table: Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = 0.4$, $\theta_{12} = 0.2$, $\epsilon \sim \mathcal{N}(0,1)$, 1000 repetitions, $n = 125$, $d_n \in \{20, 23\}$. [Table entries not recoverable from the transcription.]

Table: Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = 0.4$, $\theta_{12} = 0.2$, $\epsilon \sim \mathcal{N}(0,1)$, 1000 repetitions, $n = 500$, $d_n \in \{25, 28\}$. [Table entries not recoverable from the transcription.]

Thank You for Your patience!

Some references

- I. Berkes and W. Philipp. Approximation theorems for independent and weakly dependent random vectors. Ann. Probab., 7(1):29-54.
- T. Jiang. The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14(2).
- M. Jirak. Simultaneous confidence bands for Yule-Walker estimators and order selection. Annals of Statistics, 40(1).
- M. Jirak. A Darling-Erdős type result for stationary ellipsoids. Stochastic Processes and their Applications, to appear.


Akaike criterion: Kullback-Leibler discrepancy Model choice. Akaike s criterion Akaike criterion: Kullback-Leibler discrepancy Given a family of probability densities {f ( ; ), 2 }, Kullback-Leibler s index of f ( ; ) relativetof ( ; ) is Z ( ) =E

More information

Lecture 2: ARMA(p,q) models (part 2)

Lecture 2: ARMA(p,q) models (part 2) Lecture 2: ARMA(p,q) models (part 2) Florian Pelgrin University of Lausanne, École des HEC Department of mathematics (IMEA-Nice) Sept. 2011 - Jan. 2012 Florian Pelgrin (HEC) Univariate time series Sept.

More information

Lecture 28: Asymptotic confidence sets

Lecture 28: Asymptotic confidence sets Lecture 28: Asymptotic confidence sets 1 α asymptotic confidence sets Similar to testing hypotheses, in many situations it is difficult to find a confidence set with a given confidence coefficient or level

More information

Concentration Inequalities for Random Matrices

Concentration Inequalities for Random Matrices Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic

More information

A Hierarchy of Information Quantities for Finite Block Length Analysis of Quantum Tasks

A Hierarchy of Information Quantities for Finite Block Length Analysis of Quantum Tasks A Hierarchy of Information Quantities for Finite Block Length Analysis of Quantum Tasks Marco Tomamichel, Masahito Hayashi arxiv: 1208.1478 Also discussing results of: Second Order Asymptotics for Quantum

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap University of Zurich Department of Economics Working Paper Series ISSN 1664-7041 (print) ISSN 1664-705X (online) Working Paper No. 254 Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and

More information

Lecture 8 Inequality Testing and Moment Inequality Models

Lecture 8 Inequality Testing and Moment Inequality Models Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Hypothesis Testing via Convex Optimization

Hypothesis Testing via Convex Optimization Hypothesis Testing via Convex Optimization Arkadi Nemirovski Joint research with Alexander Goldenshluger Haifa University Anatoli Iouditski Grenoble University Information Theory, Learning and Big Data

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036

More information

Robustní monitorování stability v modelu CAPM

Robustní monitorování stability v modelu CAPM Robustní monitorování stability v modelu CAPM Ondřej Chochola, Marie Hušková, Zuzana Prášková (MFF UK) Josef Steinebach (University of Cologne) ROBUST 2012, Němčičky, 10.-14.9. 2012 Contents Introduction

More information

Introduction to Self-normalized Limit Theory

Introduction to Self-normalized Limit Theory Introduction to Self-normalized Limit Theory Qi-Man Shao The Chinese University of Hong Kong E-mail: qmshao@cuhk.edu.hk Outline What is the self-normalization? Why? Classical limit theorems Self-normalized

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

Large sample distribution for fully functional periodicity tests

Large sample distribution for fully functional periodicity tests Large sample distribution for fully functional periodicity tests Siegfried Hörmann Institute for Statistics Graz University of Technology Based on joint work with Piotr Kokoszka (Colorado State) and Gilles

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

Self-normalized Cramér-Type Large Deviations for Independent Random Variables

Self-normalized Cramér-Type Large Deviations for Independent Random Variables Self-normalized Cramér-Type Large Deviations for Independent Random Variables Qi-Man Shao National University of Singapore and University of Oregon qmshao@darkwing.uoregon.edu 1. Introduction Let X, X

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Small Ball Probability, Arithmetic Structure and Random Matrices

Small Ball Probability, Arithmetic Structure and Random Matrices Small Ball Probability, Arithmetic Structure and Random Matrices Roman Vershynin University of California, Davis April 23, 2008 Distance Problems How far is a random vector X from a given subspace H in

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

3. ARMA Modeling. Now: Important class of stationary processes

3. ARMA Modeling. Now: Important class of stationary processes 3. ARMA Modeling Now: Important class of stationary processes Definition 3.1: (ARMA(p, q) process) Let {ɛ t } t Z WN(0, σ 2 ) be a white noise process. The process {X t } t Z is called AutoRegressive-Moving-Average

More information

IMPROVING TWO RESULTS IN MULTIPLE TESTING

IMPROVING TWO RESULTS IN MULTIPLE TESTING IMPROVING TWO RESULTS IN MULTIPLE TESTING By Sanat K. Sarkar 1, Pranab K. Sen and Helmut Finner Temple University, University of North Carolina at Chapel Hill and University of Duesseldorf October 11,

More information

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS Tao Jiang A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the

More information

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients 17: INFERENCE FOR MULTIPLE REGRESSION Inference for Individual Regression Coefficients The results of this section require the assumption that the errors u are normally distributed. Let c i ij denote the

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Modeling and testing long memory in random fields

Modeling and testing long memory in random fields Modeling and testing long memory in random fields Frédéric Lavancier lavancier@math.univ-lille1.fr Université Lille 1 LS-CREST Paris 24 janvier 6 1 Introduction Long memory random fields Motivations Previous

More information

STAT Financial Time Series

STAT Financial Time Series STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

Strong approximation for additive functionals of geometrically ergodic Markov chains

Strong approximation for additive functionals of geometrically ergodic Markov chains Strong approximation for additive functionals of geometrically ergodic Markov chains Florence Merlevède Joint work with E. Rio Université Paris-Est-Marne-La-Vallée (UPEM) Cincinnati Symposium on Probability

More information

Journal of Statistical Research 2007, Vol. 41, No. 1, pp Bangladesh

Journal of Statistical Research 2007, Vol. 41, No. 1, pp Bangladesh Journal of Statistical Research 007, Vol. 4, No., pp. 5 Bangladesh ISSN 056-4 X ESTIMATION OF AUTOREGRESSIVE COEFFICIENT IN AN ARMA(, ) MODEL WITH VAGUE INFORMATION ON THE MA COMPONENT M. Ould Haye School

More information

Minimax Estimation of Kernel Mean Embeddings

Minimax Estimation of Kernel Mean Embeddings Minimax Estimation of Kernel Mean Embeddings Bharath K. Sriperumbudur Department of Statistics Pennsylvania State University Gatsby Computational Neuroscience Unit May 4, 2016 Collaborators Dr. Ilya Tolstikhin

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

Unsupervised Learning: Dimensionality Reduction

Unsupervised Learning: Dimensionality Reduction Unsupervised Learning: Dimensionality Reduction CMPSCI 689 Fall 2015 Sridhar Mahadevan Lecture 3 Outline In this lecture, we set about to solve the problem posed in the previous lecture Given a dataset,

More information

Asymmetric least squares estimation and testing

Asymmetric least squares estimation and testing Asymmetric least squares estimation and testing Whitney Newey and James Powell Princeton University and University of Wisconsin-Madison January 27, 2012 Outline ALS estimators Large sample properties Asymptotic

More information

Exponential tail inequalities for eigenvalues of random matrices

Exponential tail inequalities for eigenvalues of random matrices Exponential tail inequalities for eigenvalues of random matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

MAT3379 (Winter 2016)

MAT3379 (Winter 2016) MAT3379 (Winter 2016) Assignment 4 - SOLUTIONS The following questions will be marked: 1a), 2, 4, 6, 7a Total number of points for Assignment 4: 20 Q1. (Theoretical Question, 2 points). Yule-Walker estimation

More information

Ch 6. Model Specification. Time Series Analysis

Ch 6. Model Specification. Time Series Analysis We start to build ARIMA(p,d,q) models. The subjects include: 1 how to determine p, d, q for a given series (Chapter 6); 2 how to estimate the parameters (φ s and θ s) of a specific ARIMA(p,d,q) model (Chapter

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

RELATIVE ERRORS IN CENTRAL LIMIT THEOREMS FOR STUDENT S t STATISTIC, WITH APPLICATIONS

RELATIVE ERRORS IN CENTRAL LIMIT THEOREMS FOR STUDENT S t STATISTIC, WITH APPLICATIONS Statistica Sinica 19 (2009, 343-354 RELATIVE ERRORS IN CENTRAL LIMIT THEOREMS FOR STUDENT S t STATISTIC, WITH APPLICATIONS Qiying Wang and Peter Hall University of Sydney and University of Melbourne Abstract:

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Time Series 2. Robert Almgren. Sept. 21, 2009

Time Series 2. Robert Almgren. Sept. 21, 2009 Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models

More information

ON TWO RESULTS IN MULTIPLE TESTING

ON TWO RESULTS IN MULTIPLE TESTING ON TWO RESULTS IN MULTIPLE TESTING By Sanat K. Sarkar 1, Pranab K. Sen and Helmut Finner Temple University, University of North Carolina at Chapel Hill and University of Duesseldorf Two known results in

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Model selection using penalty function criteria

Model selection using penalty function criteria Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo Comment aout AR spectral estimation Usually an estimate is produced y computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo simulation approach, for every draw (φ,σ 2 ), we can compute

More information

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Monitoring Wafer Geometric Quality using Additive Gaussian Process

Monitoring Wafer Geometric Quality using Additive Gaussian Process Monitoring Wafer Geometric Quality using Additive Gaussian Process Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1 1 Department of Industrial and Systems Engineering, National University of Singapore 2 Department

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

A central limit theorem for an omnibus embedding of random dot product graphs

A central limit theorem for an omnibus embedding of random dot product graphs A central limit theorem for an omnibus embedding of random dot product graphs Keith Levin 1 with Avanti Athreya 2, Minh Tang 2, Vince Lyzinski 3 and Carey E. Priebe 2 1 University of Michigan, 2 Johns

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

Generalized Method of Moments Estimation

Generalized Method of Moments Estimation Generalized Method of Moments Estimation Lars Peter Hansen March 0, 2007 Introduction Generalized methods of moments (GMM) refers to a class of estimators which are constructed from exploiting the sample

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA) The circular law Lewis Memorial Lecture / DIMACS minicourse March 19, 2008 Terence Tao (UCLA) 1 Eigenvalue distributions Let M = (a ij ) 1 i n;1 j n be a square matrix. Then one has n (generalised) eigenvalues

More information

Concentration of Measures by Bounded Couplings

Concentration of Measures by Bounded Couplings Concentration of Measures by Bounded Couplings Subhankar Ghosh, Larry Goldstein and Ümit Işlak University of Southern California [arxiv:0906.3886] [arxiv:1304.5001] May 2013 Concentration of Measure Distributional

More information

A sequence of triangle-free pseudorandom graphs

A sequence of triangle-free pseudorandom graphs A sequence of triangle-free pseudorandom graphs David Conlon Abstract A construction of Alon yields a sequence of highly pseudorandom triangle-free graphs with edge density significantly higher than one

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

Elements of Multivariate Time Series Analysis

Elements of Multivariate Time Series Analysis Gregory C. Reinsel Elements of Multivariate Time Series Analysis Second Edition With 14 Figures Springer Contents Preface to the Second Edition Preface to the First Edition vii ix 1. Vector Time Series

More information

A Gentle Introduction to Stein s Method for Normal Approximation I

A Gentle Introduction to Stein s Method for Normal Approximation I A Gentle Introduction to Stein s Method for Normal Approximation I Larry Goldstein University of Southern California Introduction to Stein s Method for Normal Approximation 1. Much activity since Stein

More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Some Curiosities Arising in Objective Bayesian Analysis

Some Curiosities Arising in Objective Bayesian Analysis . Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information