Extreme inference in stationary time series
Moritz Jirak
FOR 1735
February 8, 2013
Outline
1. Motivation
   - The multivariate CLT
   - Measuring discrepancies
2. Some theory and problems
   - The problem
3. Special case: AR($\infty$) revisited
   - Numerical results
Motivation: The multivariate CLT
Introduction

One of the most fundamental tools in probability theory and statistics is the multivariate CLT,
$$\frac{1}{\sqrt{n}} \sum_{k=1}^{n} X_k^{(d)} \xrightarrow{w} \mathcal{N}(0, \Gamma),$$
where $\{X_k^{(d)}\}$ is a sequence of $d$-dimensional zero-mean, stationary random vectors and $\Gamma$ is the asymptotic covariance matrix.

- Applications: model diagnosis, hypothesis testing, ....
- More specifically: let $X^{(n)} = (X_1, \ldots, X_n)$ be a sample, and let
$$S_{n,d} = \big( S_{n,1}(X_1, \ldots, X_n), \ldots, S_{n,d}(X_1, \ldots, X_n) \big)^T$$
be some statistics.
- In general, the relation between $d = d_n$ and $n$ is very important, but let us not worry about this for the moment....
Using the multivariate CLT

Consider the following confidence regions/expressions:

$$\mathcal{X}_d^2 = \big\{ \Theta_d : (S_{n,d} - \Theta_d)^T \hat{\Gamma}^{-1} (S_{n,d} - \Theta_d) \le n^{-1} \chi_{1-\alpha}^2(d) \big\},$$
where $\hat{\Gamma}$ is an estimator of the covariance matrix and $\chi_{1-\alpha}^2(d)$ is the corresponding quantile of the chi-square distribution.

$$V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|,$$
where $\hat{\gamma}_{h,h}^2$ is an estimator of the diagonal elements $\gamma_{h,h}^2$ of the covariance matrix $\Gamma$ (simultaneous confidence band).

$$D_d = \max_{1 \le h \le d} (2h)^{-1/2} \big| n (S_{n,h} - \Theta_h)^T \hat{\Gamma}^{-1} (S_{n,h} - \Theta_h) - h \big|,$$
where $\hat{\Gamma}$ is an estimator of the covariance matrix.
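To make the three measures concrete, here is a minimal numerical sketch (my illustration, not from the talk) that evaluates all three for the simplest choice of statistics, $S_{n,h}$ = the $h$-th coordinate of the sample mean; the function and variable names are mine.

```python
import numpy as np

def discrepancy_stats(X, theta, gamma_hat):
    """Evaluate the three discrepancy measures for the simplest choice of
    statistics, S_{n,h} = h-th coordinate of the sample mean.
    X: (n, d) sample, theta: hypothesised parameter (d,),
    gamma_hat: (d, d) covariance estimate."""
    n, d = X.shape
    diff = X.mean(axis=0) - theta            # S_{n,d} - Theta_d

    # Global measure: the quadratic form behind the ellipsoid X_d^2.
    chi2_stat = n * diff @ np.linalg.inv(gamma_hat) @ diff

    # Local measure V_d: largest standardised coordinate-wise deviation;
    # note it only touches the diagonal of gamma_hat.
    V_d = np.sqrt(n) * np.max(np.abs(diff) / np.sqrt(np.diag(gamma_hat)))

    # Intermediate measure D_d: centred and scaled quadratic forms over h
    # (a chi-square with h degrees of freedom has mean h and variance 2h).
    D_d = 0.0
    for h in range(1, d + 1):
        q = n * diff[:h] @ np.linalg.inv(gamma_hat[:h, :h]) @ diff[:h]
        D_d = max(D_d, abs(q - h) / np.sqrt(2 * h))
    return chi2_stat, V_d, D_d
```

The sketch makes the structural difference visible: the ellipsoid and $D_d$ need the full (inverse) covariance estimate, while $V_d$ uses only its diagonal.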
- The confidence ellipsoid $\mathcal{X}_d^2$ wraps up many tests such as F-tests, t-tests, ....
- Where are $V_d$, $D_d$ (more) useful?
Motivation: Measuring discrepancies
Quality: local and global discrepancies

The ellipsoid is a global measure, i.e. it measures the global discrepancy by summing up all local discrepancies:
$$\mathcal{X}_d^2 = \big\{ \Theta_d : (S_{n,d} - \Theta_d)^T \hat{\Gamma}^{-1} (S_{n,d} - \Theta_d) \le n^{-1} \chi_{1-\alpha}^2(d) \big\}. \tag{1}$$

Suppose that we wish to test the null hypothesis
$$H_0: \Theta_d = 0_d = (0, \ldots, 0)^T \quad \text{vs} \quad H_A: \Theta_d \ne 0_d.$$

- If $\min_{1 \le i \le d} |\theta_i| > 0$, then
$$\mathcal{X}_d^2 = n (S_{n,d} - 0_d)^T \hat{\Gamma}_d^{-1} (S_{n,d} - 0_d) \sim O_P(nd).$$
- If $\theta_i = 0$ for $i < d$ and only $\theta_d \ne 0$, then
$$\mathcal{X}_d^2 = n (S_{n,d} - 0_d)^T \hat{\Gamma}_d^{-1} (S_{n,d} - 0_d) \sim O_P(n).$$
Hence we lose power as $d$ increases ($\chi_{1-\alpha}^2(d) \approx d$)!
- If $d$ is large, additional tools such as principal component analysis are often used (still a global measure).
- Recently, some authors (Cai, Jiang, Liu, Wu, ...) proposed, studied (in special cases) and successfully applied the following local measure:
$$V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|,$$
where $\hat{\gamma}_{h,h}^2$ is an estimator of the diagonal elements $\gamma_{h,h}^2$ of the covariance matrix $\Gamma$.
- The confidence band $V_d$ has some nice properties...
Some theory and problems
Properties of $V_d$

$$V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|$$

- Easy to provide inference for single elements or small groups of $S_{n,h}(X_1, \ldots, X_n)$, in particular, whether parameters are equal to zero or not.
- Only need to estimate the diagonal elements $\gamma_{h,h}$ of $\Gamma$.
- Reconsider $H_0: \Theta_d = 0_d = (0, \ldots, 0)^T$ against $H_A: \Theta_d \ne 0_d$: good power even if only $\theta_d \ne 0$, but lower power than $\mathcal{X}_d^2$ if $\min_{1 \le i \le d} |\theta_i| > 0$.
Order estimation I

- A particular case where $\theta_i \ne 0$ for many indices $i$ is order estimation.
- Consider for example the case of an AR($\infty$) process $\{X_k\}_{k \in \mathbb{Z}}$:
$$X_k = \theta_1 X_{k-1} + \theta_2 X_{k-2} + \ldots + \epsilon_k.$$
- If $d$ is large enough, then $\theta_d \approx 0$; in fact $\theta_d = O(\rho^d)$ in many cases.
- Using $V_d$, one can decide whether $\theta_i$ is redundant or not (this gives an order estimate: order = last index which is not redundant).
- However, now $D_d$ enters, as a replacement for $\mathcal{X}_d^2$: in some sense, $D_d$ measures for which $h$ the whole tail $(\theta_h, \theta_{h+1}, \ldots)$ is redundant,
$$D_d(l) = \max_{l \le h \le d} (2h)^{-1/2} \big| n (S_{n,h} - \Theta_h)^T \hat{\Gamma}^{-1} (S_{n,h} - \Theta_h) - h \big|.$$
Order estimation II

A related issue is variance estimation in the CLT if $\{X_k\}_{k \in \mathbb{Z}}$ is weakly dependent (zero mean):
$$n^{-1/2} \sum_{k=1}^{n} X_k \xrightarrow{w} \mathcal{N}(0, \sigma^2), \quad \text{where } \sigma^2 = \sum_{k=-\infty}^{\infty} E[X_k X_0].$$

Usual estimators look like
$$\hat{\sigma}^2 = \sum_{k=-h_n}^{h_n} w(k, n) \hat{\phi}_k, \quad \text{where } \hat{\phi}_k = (n-k)^{-1} \sum_{j=1}^{n-k} X_j X_{j+k}.$$

This raises the following questions:
- Which $\hat{\phi}_k$ (which lags) should we use?
- How large should we choose $h_n$?
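The lag-window estimator is straightforward to implement. The following sketch is mine, not from the talk: the talk leaves the weights $w(k, n)$ generic, so the Bartlett default below is only an illustrative choice, and the sample is assumed to be zero mean.

```python
import numpy as np

def long_run_variance(x, h_n, w=None):
    """Lag-window estimator sigma2_hat = sum_{|k| <= h_n} w(k, n) phi_hat_k,
    with phi_hat_k = (n - |k|)^{-1} sum_j x_j x_{j+|k|} (x assumed zero mean).
    The Bartlett default for w is an illustrative choice only."""
    n = len(x)
    if w is None:
        w = lambda k, n=n: max(0.0, 1.0 - abs(k) / (h_n + 1))  # Bartlett taper
    sigma2 = 0.0
    for k in range(-h_n, h_n + 1):
        a = abs(k)
        phi_k = np.dot(x[:n - a], x[a:]) / (n - a)   # phi_hat_k = phi_hat_{-k}
        sigma2 += w(k, n) * phi_k
    return sigma2
```

With $h_n = 0$ this reduces to the sample variance $\hat{\phi}_0$; the whole point of the slide is that the quality of $\hat{\sigma}^2$ hinges on how the lags and $h_n$ are chosen.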
- Since $\phi_k = E[X_k X_0] \to 0$ as $k$ increases, both $V_d$ and $D_d$ can be used again to quantify $h_n$ and obtain consistent estimators for $\sigma^2$.
- It is possible to construct cases where $V_d$ is superior to $D_d$, ...
- ... and it is also possible to construct cases where $D_d$ is superior to $V_d$, ...
- ... however, often $D_d$ gives the better result.
Properties of $\mathcal{X}_d^2$, $V_d$ and $D_d$

If we want to use $V_d$ and $D_d$, we need to control them asymptotically (for large $n$ and $d = d_n$). We expect that, under some conditions,
$$\mathcal{X}_{d_n}^2 \approx \chi^2(d_n), \qquad a_n \big( V_{d_n} - b_n \big) \xrightarrow{w} G, \qquad A_n D_{d_n} - B_n \xrightarrow{w} G,$$
for appropriate sequences $a_n, b_n, A_n, B_n$, where $G$ is an extreme value distribution and $d_n = d$ is an increasing function of $n$.

- However: establishing the above is highly nontrivial; we need (weak) dependence assumptions.
- What can we say about the relation of $n$ and $d_n$ ($d_n = n^{\delta}$, $\delta > 0$)?
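To get a feel for the extreme value limit $G$, here is a small simulation sketch (mine, not from the talk; the iid Gaussian setting and the function name are my simplifications) of the classical normalisation $a_d (M_d - b_d)$ for the maximum of $d$ iid standard normals, which converges to the Gumbel distribution.

```python
import numpy as np

def normalised_gaussian_max(d, n_rep, seed=0):
    """Simulate a_d * (M_d - b_d), where M_d is the maximum of d iid N(0,1)
    variables and a_d, b_d are the classical normalising sequences; for
    large d the result is approximately Gumbel, P(. <= z) ~ exp(-e^{-z})."""
    rng = np.random.default_rng(seed)
    a_d = (2.0 * np.log(d)) ** 0.5
    b_d = a_d - (np.log(np.log(d)) + np.log(4.0 * np.pi)) / (2.0 * a_d)
    M = rng.standard_normal((n_rep, d)).max(axis=1)
    return a_d * (M - b_d)
```

The notoriously slow (logarithmic) rate of this convergence is exactly why the growth rate of $d_n$ relative to $n$ is delicate.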
The problem
Our setting

Suppose that
$$E_n^{(d)} := \sqrt{n} \big( g^{(d)} - E(g^{(d)}) \big) \xrightarrow{w} Z_d,$$
where $Z_d = (Z_1, \ldots, Z_d)^T$ is a Gaussian vector with appropriate covariance matrix. Then, letting $n \to \infty$ for fixed $d$ gives $E_n^{(d)} \to Z_d$, and taking the maximum as $d \to \infty$ gives an extreme value limit $G$ for $\max_h Z_h$.

What can we say about the existence and properties of a sequence $d_n$ joining the two limits, i.e. $\lim_{n} \max E_n^{(d_n)} \xrightarrow{w} G$ (appropriately normalized)?
Existence of $d_n$

Theorem. Suppose that $E_n^{(d)} \xrightarrow{w} Z_d$ for all fixed $d \in \mathbb{N}$, and that $E(Z_i Z_j) = O\big( (\log |i-j|)^{-2-\delta} \big)$, $\delta > 0$. Then there exist increasing sequences $a_n, b_n, d_n$ such that
$$a_n \big( \max E_n^{(d_n)} - b_n \big) = a_n \big( V^{(d_n)} - b_n \big) \xrightarrow{w} G.$$

Drawbacks:
- Does not tell us anything about a possible growth rate of $d_n$; it might be $\log \log \log \ldots n$.
- The proof is not really constructive.

What to do?
What to do?

Need to control the quantity
$$R_n := \big| P\big( V^{(d_n)} \le u_n \big) - P\big( M^{(d_n)} \le u_n \big) \big|, \tag{3}$$
where $M^{(d_n)} = \max_{1 \le h \le d_n} Z_h$. The claim then follows by using existing theory on Gaussian processes.

- Can we find an explicit bound for $R_n$ in terms of $n$, $d_n$ (this gives us an explicit growth rate)?
- This looks very similar to Berry-Esséen type results.
- We need normal approximation results....
Normal approximation

There is a huge variety of normal approximation techniques, depending on the metric and the dependence assumptions:
- Berry-Esséen type results (Zolotarev, Götze, Bentkus, Senatov, ...)
- Strong approximation techniques (KMT, Zaitsev, Mason, Philipp, Berkes, ...)
- Martingale approximation and embedding methods (Strassen, Hall, Heyde, Bolthausen, ...)
- Stein's method ...

All these methods have advantages and disadvantages; there is no ultimate approach.
Special case: AR($\infty$) revisited
An example: AR($d_n$) revisited....

Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$)-process (or AR($\infty$)) with parameter $\Theta_{d_n} = (\theta_1, \ldots, \theta_{d_n})^T$, i.e.
$$X_k = \theta_1 X_{k-1} + \ldots + \theta_{d_n} X_{k-d_n} + \epsilon_k,$$
where $\{\epsilon_k\}_{k \in \mathbb{Z}}$ is an IID sequence. For $1 \le q \le d_n$, put $\phi_h = E(X_k X_{k+h})$, $k, h \in \mathbb{Z}$, $\Phi_q = (\phi_1, \ldots, \phi_q)^T$, and denote by $\hat{\phi}_{n,h} = \frac{1}{n} \sum_{i=h+1}^{n} X_i X_{i-h}$ the empirical autocovariances. Let $\Gamma_q = (\phi_{i-j})_{1 \le i,j \le q}$ be the $q \times q$ dimensional covariance matrix.

Then $\Gamma_q \Theta_q = \Phi_q$. It is thus natural to consider
$$\hat{\Theta}_q = \hat{\Gamma}_q^{-1} \hat{\Phi}_q \quad \text{and} \quad \hat{\sigma}^2(q) = \hat{\phi}_0 - \hat{\Theta}_q^T \hat{\Phi}_q, \qquad \sigma^2 = E\big( \epsilon_0^2 \big),$$
the so-called Yule-Walker equations.
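The Yule-Walker estimators can be sketched directly from these definitions. This is my minimal implementation, not code from the talk; the sample is assumed to be zero mean.

```python
import numpy as np

def yule_walker(x, q):
    """Yule-Walker estimators: solve Gamma_q Theta_q = Phi_q with the
    empirical autocovariances phi_hat_{n,h} = (1/n) sum_{i>h} x_i x_{i-h}
    (x assumed zero mean), and sigma2_hat(q) = phi_hat_0 - Theta_hat' Phi_hat."""
    n = len(x)
    phi = np.array([np.dot(x[h:], x[:n - h]) / n for h in range(q + 1)])
    Gamma_q = np.array([[phi[abs(i - j)] for j in range(q)] for i in range(q)])
    Phi_q = phi[1:]
    theta_hat = np.linalg.solve(Gamma_q, Phi_q)
    sigma2_hat = phi[0] - theta_hat @ Phi_q
    return theta_hat, sigma2_hat
```

For an AR(1) process with $\theta_1 = 0.5$ and unit innovation variance, `theta_hat` should be close to $(0.5, 0, \ldots)$ and `sigma2_hat` close to $1$ for large $n$.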
A few facts/questions

- CLT (for fixed $q$): $\sqrt{n} \big( \hat{\Theta}_q - \Theta_q \big) \xrightarrow{w} \mathcal{N}\big( 0, \Gamma_q^{-1} \big)$.
- A priori, the true order $q_0$ is usually not known.
- Famous estimators are information-based criteria such as AIC, BIC, SIC, ...
- What can we say about a simultaneous confidence band for $\Theta_{d_n}$ and the relation of $n$, $d_n$?
Some assumptions

Assumption (B). $\{X_k\}_{k \in \mathbb{Z}}$ admits a causal representation $X_k = \sum_{i=0}^{\infty} \alpha_i \epsilon_{k-i}$, such that:
- $\sup_n \Psi(m) = O\big( m^{-\vartheta} \big)$, $\vartheta > 0$, where $\Psi(m) := \sum_{i=m}^{\infty} |\alpha_i|$,
- $\{\epsilon_k\}_{k \in \mathbb{Z}}$ is a mean-zero IID sequence of random variables such that $\|\epsilon_k\|_p < \infty$ for some $p > 4$ and $\|\epsilon_k\|_2^2 = \sigma^2 > 0$, $k \in \mathbb{Z}$,
- $\sup_n \sum_{i=1}^{d_n} |\theta_i| < \infty$, $\theta_n = O\big( (\log n)^{-1} \big)$.
Theorem. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process satisfying Assumption (B). Suppose that $d_n \to \infty$ as $n$ increases, with $d_n = O\big( n^{\delta} \big)$ such that
$$0 < \delta < \min\{1/2, \vartheta p/2\}, \qquad (1 - 2\vartheta)\delta < (p-4)/p. \tag{4.1}$$
If, in addition, $\inf_h \gamma_{h,h} > 0$, then for $z \in \mathbb{R}$
$$P\Big( a_n^{-1} \Big( \sqrt{n} \max_{1 \le i \le d_n} \big( \hat{\gamma}_{i,i} \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i - \theta_i \big| - b_n \Big) \le z \Big) \to \exp(-e^{-z}),$$
where $a_n = (2 \log d_n)^{-1/2}$ and $b_n = (2 \log d_n)^{1/2} - (8 \log d_n)^{-1/2} \big( \log \log d_n + \log 4\pi - 4 \big)$.

Note: if $p$ (moments) is sufficiently large, then essentially $d_n = O(\sqrt{n})$.
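Numerically, the Gumbel limit immediately yields a critical value for the simultaneous band. The sketch below is mine, not from the talk, and for simplicity keeps only the leading term of the centring sequence $b_n$.

```python
import numpy as np

def gumbel_band_threshold(d_n, alpha=0.05):
    """Threshold for the normalised maximum implied by the Gumbel limit:
    solving exp(-e^{-z}) = 1 - alpha gives z = -log(-log(1 - alpha)); the
    band then excludes values above b_n + a_n * z.  Only the leading term
    of b_n is used here (a simplification of the theorem's b_n)."""
    z = -np.log(-np.log(1.0 - alpha))
    a_n = (2.0 * np.log(d_n)) ** -0.5
    b_n = (2.0 * np.log(d_n)) ** 0.5
    return b_n + a_n * z
```

The threshold grows only like $\sqrt{2 \log d_n}$, which is what makes the band usable even for fairly large $d_n$.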
Order estimation in AR($q$)

We can use this result to construct a family of consistent order estimators. Put $(x)^+ = \max(0, x)$, and
$$\Upsilon_{n,i} = a_n^{-1} \Big( \sqrt{n} \big( \hat{\gamma}_{i,i} \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i \big| - b_n \Big).$$
Then
$$\hat{q}_{z_n}^{(1)} = \min\Big\{ q \in \mathbb{N} : a_n^{-1} \Big( \sqrt{n} \max_{q+1 \le i \le d_n} \big( \hat{\gamma}_{i,i} \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i \big| - b_n \Big) \le z_n \Big\},$$
$$\hat{q}_{z_n}^{(2)} = \operatorname*{argmin}_{q \in \mathbb{N}} \Big\{ \max_{q+1 \le i \le d_n} \big\{ (\Upsilon_{n,i} - z_n)^+ \big\} + \log(1+q) \Big\}$$
are consistent estimators (extensions: subset-modelling!).
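The first estimator is easy to sketch in code. This is my illustration, not the talk's implementation: `scale` stands in for the estimated $\big( \hat{\gamma}_{i,i} \hat{\sigma}^2(d_n) \big)^{1/2}$, and only the leading term of $b_n$ is used.

```python
import numpy as np

def order_estimate_q1(theta_hat, scale, n, z_n):
    """First order estimator hat q^(1): the smallest q such that the
    standardised tail max_{q+1 <= i <= d_n} sqrt(n)|theta_hat_i| / scale_i
    stays below the Gumbel threshold b_n + a_n * z_n.  scale_i stands in
    for (gamma_hat_{i,i} sigma_hat^2(d_n))^{1/2}; only the leading term of
    b_n is used."""
    d = len(theta_hat)
    a_n = (2.0 * np.log(d)) ** -0.5
    b_n = (2.0 * np.log(d)) ** 0.5
    stat = np.sqrt(n) * np.abs(np.asarray(theta_hat)) / np.asarray(scale)
    thresh = b_n + a_n * z_n
    for q in range(d):
        if np.max(stat[q:]) <= thresh:   # everything beyond index q is redundant
            return q
    return d
```

With three clearly nonzero leading coefficients and a negligible tail, the estimator returns 3, the last non-redundant index.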
Order estimation

A few facts about the estimators:
- One can compute the asymptotic distribution of $\hat{q}_{z_n}^{(1)}$, $\hat{q}_{z_n}^{(2)}$.
- One can easily generalize the estimators to a very large family of estimators (weight/penalty functions).
- $\hat{q}_{z_n}^{(1)}$, $\hat{q}_{z_n}^{(2)}$ turn out to be good preliminary estimators for AIC, BIC, SIC.
- They significantly outperform AIC, BIC, SIC in sparse models.
Numerical results

n      q    AIC  AIC*  BIC  BIC*  MIC  MIC*  q^(5)_{y_n}  q^(5)_{x_n}
500   <5      1     1  177    75   15    15           86           52
       5      3     3    9    11    3     3           17           14
       6    730   713  805   874  892   867          865          849
       7    108   108    8     8   57    57            0            2
      >7    158   175    1    32   33    58           32           83
1000  <5      0     0    3     0    0     0            0            0
       5      0     0    0     0    0     0            0            0
       6    724   709  990   951  934   901          955          885
       7    103   101    7     9   47    44            5            7
      >7    173   190    0    40   19    55           40          108

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0.3, 0.05, 0.2, 0.1, 0.2)^T$, $\epsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{13, 14\}$.
n      q    AIC  AIC*  BIC  BIC*  MIC  MIC*  q^(5)_{y_n}  q^(5)_{x_n}
125   <5    719   699  998   854  839   787          854          747
       5     11    11    0     0    7     7            0           11
       6    168   181    2   124  107   145          124          184
       7     44    44    0     4   23    24            4            8
      >7     58    65    0    18   24    37           18           50
250   <5    290   276  960   437  550   396          438          321
       5      6     6    0     3    5     5            3            5
       6    491   488   39   513  376   494          513          573
       7     91    90    1     2   40    40            1            7
      >7    122   140    0    45   29    65           45           94

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$, $\epsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{10, 12\}$.
n      q    AIC  AIC*  BIC  BIC*  MIC  MIC*  q^(5)_{y_n}  q^(5)_{x_n}
500   <5     21    21  761   102  164    85          102           56
       5      0     0    1     0    0     0            0            1
       6    663   655  234   871  736   796          874          863
       7    125   124    4     3   69    68            0           10
      >7    191   200    0    24   31    51           24           70
1000  <5      0     0  168     1    1     1            1            0
       5      0     0    0     0    0     0            0            0
       6    702   683  822   949  919   887          955          898
       7    121   119    9     9   52    52            3            9
      >7    177   198    1    41   28    60           41           93

Table: Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$, $\epsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{13, 14\}$.
n      q    AIC  AIC*  BIC  BIC*  MIC  MIC*  q^(5)_{y_n}  q^(5)_{x_n}
125  <11    884   853 1000   920  963   910          920          861
      11      3     3    0     0    1     1            0            3
      12     68    94    0    71   25    70           71          114
      13     11    13    0     3    4     7            3            5
     >13     34    37    0     6    7    12            6           17
250  <11    509   421  999   555  792   530          555          424
      11      3     3    0     3    2     3            3            4
      12    340   419    1   421  170   416          421          514
      13     67    68    0     2   18    19            2            5
     >13     81    89    0    19   18    32           19           53

Table: Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = 0.4$, $\theta_{12} = 0.2$, $\epsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{20, 23\}$.
n      q    AIC  AIC*  BIC  BIC*  MIC  MIC*  q^(5)_{y_n}  q^(5)_{x_n}
500  <11     77    58  983   125  402   115          125           78
      11      0     0    0     2    0     1            2            1
      12    663   678   17   858  532   808          858          870
      13    104   103    0     3   39    40            3            4
     >13    156   161    0    12   27    36           12           47
1000 <11      0     0  689     2   35     2            2            2
      11      0     0    0     0    0     0            0            0
      12    706   701  307   971  893   907          972          936
      13    124   123    2     2   54    53            1            3
     >13    170   176    2    25   18    38           25           59

Table: Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = 0.4$, $\theta_{12} = 0.2$, $\epsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{25, 28\}$.
Thank You for Your patience!
Some references

- I. Berkes and W. Philipp. Approximation theorems for independent and weakly dependent random vectors. Ann. Probab., 7(1):29-54, 1979.
- T. Jiang. The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14(2):865-880, 2004.
- M. Jirak. Simultaneous confidence bands for Yule-Walker estimators and order selection. Ann. Statist., 40(1):494-528, 2012.
- M. Jirak. A Darling-Erdős type result for stationary ellipsoids. Stochastic Process. Appl., to appear.