Extreme inference in stationary time series


Moritz Jirak
FOR 1735
February 8, 2013

Outline

1. Motivation: the multivariate CLT; measuring discrepancies
2. Some theory and problems: the problem
3. Special case: AR($\infty$) revisited; numerical results

Introduction

One of the most fundamental tools in probability theory and statistics is the multivariate CLT,
\[
\frac{1}{\sqrt{n}} \sum_{k=1}^{n} X_k^{(d)} \xrightarrow{w} \mathbf{B}(\Gamma),
\]
where $\{X_k^{(d)}\}$ is a sequence of $d$-dimensional zero-mean, stationary random vectors and $\Gamma$ is the asymptotic covariance matrix.

Applications: model diagnosis, hypothesis testing, ....

More specifically: let $X^{(n)} = (X_1, \ldots, X_n)$ be a sample, and let
\[
S_{n,d} = \big( S_{n,1}(X_1, \ldots, X_n), \ldots, S_{n,d}(X_1, \ldots, X_n) \big)^T
\]
be some statistics.

In general, the relation between $d = d_n$ and $n$ is very important, but let's not worry about this for the moment....

Using the multivariate CLT

Consider the following confidence regions/expressions:

\[
\mathcal{X}^2_d = \big\{ \Theta_d : (S_{n,d} - \Theta_d)^T \hat{\Gamma}^{-1} (S_{n,d} - \Theta_d) \le n^{-1} \chi^2_{1-\alpha}(d) \big\},
\]
where $\hat{\Gamma}$ is an estimator of the covariance matrix and $\chi^2_{1-\alpha}(d)$ is the $(1-\alpha)$-quantile of the chi-square distribution with $d$ degrees of freedom.

\[
V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|,
\]
where $\hat{\gamma}_{h,h}^2$ is an estimator of the diagonal elements $\gamma^2_{h,h}$ of the covariance matrix $\Gamma$ (simultaneous confidence band).

\[
D_d = \max_{1 \le h \le d} (2h)^{-1/2} \big| n (S_{n,h} - \Theta_h)^T \hat{\Gamma}^{-1} (S_{n,h} - \Theta_h) - h \big|,
\]
where $\hat{\Gamma}$ is an estimator of the covariance matrix.
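To make the three measures concrete, here is a minimal numerical sketch (illustrative, not from the talk): it evaluates the ellipsoid statistic, $V_d$ and $D_d$ for the sample mean of a simulated $d$-dimensional sequence; all variable names are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d = 500, 10

# Toy data: n independent d-dimensional zero-mean vectors.
X = rng.standard_normal((n, d))

S_nd = X.mean(axis=0)                  # statistic S_{n,d} (sample mean vector)
Gamma_hat = np.cov(X, rowvar=False)    # covariance estimator Gamma_hat
theta = np.zeros(d)                    # hypothesised Theta_d under H_0
diff = S_nd - theta

# Ellipsoid statistic n (S - Theta)^T Gamma^{-1} (S - Theta),
# compared with the chi-square quantile chi^2_{1-alpha}(d).
chi2_stat = n * diff @ np.linalg.solve(Gamma_hat, diff)
reject_ellipsoid = chi2_stat > stats.chi2.ppf(0.95, df=d)

# Max-type statistic V_d = sqrt(n) max_h |S_{n,h} - theta_h| / gamma_{h,h}.
gamma_diag = np.sqrt(np.diag(Gamma_hat))
V_d = np.sqrt(n) * np.max(np.abs(diff) / gamma_diag)

# D_d = max_h (2h)^{-1/2} |n Q_h - h|, with Q_h the quadratic form
# restricted to the first h coordinates.
D_d = max(
    abs(n * diff[:h] @ np.linalg.solve(Gamma_hat[:h, :h], diff[:h]) - h) / np.sqrt(2 * h)
    for h in range(1, d + 1)
)
print(chi2_stat, reject_ellipsoid, V_d, D_d)
```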

The confidence ellipsoid $\mathcal{X}^2_d$ wraps up many tests, such as F-tests, t-tests, ....

Where are $V_d$ and $D_d$ (more) useful?

Quality: local and global discrepancies

The ellipsoid is a global measure, i.e., it measures the global discrepancy by summing up all local discrepancies:
\[
\mathcal{X}^2_d = \big\{ \Theta_d : (S_{n,d} - \Theta_d)^T \hat{\Gamma}^{-1} (S_{n,d} - \Theta_d) \le n^{-1} \chi^2_{1-\alpha}(d) \big\}. \tag{1}
\]

Suppose that we wish to test the null hypothesis
\[
H_0: \Theta_d = \mathbf{0}_d = (0, \ldots, 0)^T \quad \text{vs.} \quad H_A: \Theta_d \ne \mathbf{0}_d.
\]

If $\min_i |\theta_i| > 0$, then
\[
\mathcal{X}^2_d = n (S_{n,d} - \mathbf{0}_d)^T \hat{\Gamma}_d^{-1} (S_{n,d} - \mathbf{0}_d) = O_P(nd).
\]

If, on the other hand, $\theta_i = 0$ for $i < d$ and only $\theta_d \ne 0$, then
\[
\mathcal{X}^2_d = n (S_{n,d} - \mathbf{0}_d)^T \hat{\Gamma}_d^{-1} (S_{n,d} - \mathbf{0}_d) = O_P(n).
\]
Hence we lose power as $d$ increases (since $\chi^2_{1-\alpha}(d) \sim d$)!
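To see this power loss numerically, here is a small Monte Carlo sketch (illustrative, not from the talk): under a sparse alternative with a single nonzero coordinate, the ellipsoid test loses power as $d$ grows while a max-type test holds up. The Bonferroni calibration of the max test and all parameter values are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha, signal = 200, 500, 0.05, 0.25

for d in (5, 50, 200):
    theta = np.zeros(d)
    theta[-1] = signal                             # sparse alternative: only theta_d != 0
    z_crit = stats.norm.ppf(1 - alpha / (2 * d))   # Bonferroni cut-off for the max test
    chi2_crit = stats.chi2.ppf(1 - alpha, df=d)
    rej_chi2 = rej_max = 0
    for _ in range(reps):
        X = rng.standard_normal((n, d)) + theta
        m = X.mean(axis=0)
        rej_chi2 += n * m @ m > chi2_crit          # Gamma = I here, no inversion needed
        rej_max += np.sqrt(n) * np.max(np.abs(m)) > z_crit
    print(f"d={d:4d}  power(ellipsoid)={rej_chi2/reps:.2f}  power(max)={rej_max/reps:.2f}")
```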

If $d$ is large, additional tools such as principal component analysis are often used (still a global measure). Recently, some authors (Cai, Jiang, Liu, Wu, ...) proposed, studied (in special cases) and successfully applied the following local measure:
\[
V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|,
\]
where $\hat{\gamma}^2_{h,h}$ is an estimator of the diagonal elements $\gamma^2_{h,h}$ of the covariance matrix $\Gamma$.

The confidence band $V_d$ has some nice properties...

Properties of V_d

\[
V_d = \sqrt{n} \max_{1 \le h \le d} \hat{\gamma}_{h,h}^{-1} \big| S_{n,h}(X_1, \ldots, X_n) - \theta_h \big|
\]

- Easy to provide inference for single elements or small groups of $S_{n,h}(X_1, \ldots, X_n)$; in particular, whether parameters are equal to zero or not.
- Only the diagonal elements $\gamma_{h,h}$ of $\Gamma$ need to be estimated.
- Reconsider $H_0: \Theta_d = \mathbf{0}_d$ against $H_A: \Theta_d \ne \mathbf{0}_d$: good power even if only $\theta_d \ne 0$, but lower power than $\mathcal{X}^2_d$ if $\min_i |\theta_i| > 0$.

Order estimation I

A particular case where $\theta_i \approx 0$ for many indices $i$ is order estimation.

Consider for example the case of an AR($\infty$) process $\{X_k\}_{k \in \mathbb{Z}}$:
\[
X_k = \theta_1 X_{k-1} + \theta_2 X_{k-2} + \ldots + \epsilon_k.
\]

- If $d$ is large enough, then $\theta_d \approx 0$; in fact $\theta_d = O(\rho^d)$ in many cases.
- Using $V_d$, one can decide whether $\theta_i$ is redundant or not (this gives an order estimate: order = last index which is not redundant).
- However, now $D_d$ enters as a replacement for $\mathcal{X}^2_d$: in some sense, $D_d$ measures for which $h$ the whole tail $(\theta_h, \theta_{h+1}, \ldots)$ is redundant,
\[
D_d(l) = \max_{l \le h \le d} (2h)^{-1/2} \big| n (S_{n,h} - \Theta_h)^T \hat{\Gamma}^{-1} (S_{n,h} - \Theta_h) - h \big|.
\]

Order estimation II

A related issue is variance estimation in the CLT if $\{X_k\}_{k \in \mathbb{Z}}$ is weakly dependent (zero mean):
\[
n^{-1/2} \sum_{k=1}^{n} X_k \xrightarrow{w} \mathcal{N}(0, \sigma^2), \quad \text{where } \sigma^2 = \sum_{k=-\infty}^{\infty} E[X_k X_0].
\]

Usual estimators look like
\[
\hat{\sigma}^2 = \sum_{k=-h_n}^{h_n} w(k, n) \, \hat{\phi}_k, \quad \text{where } \hat{\phi}_k = (n-k)^{-1} \sum_{j=1}^{n-k} X_j X_{j+k}.
\]

This raises the following questions:
- Which $\hat{\phi}_k$ (which lags) should we use?
- How large should we choose $h_n$?
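As a concrete instance of such an estimator, here is a minimal sketch with Bartlett weights $w(k, n) = 1 - |k|/(h_n + 1)$; the weight choice and the bandwidth are assumptions of this sketch, not prescriptions from the talk.

```python
import numpy as np

def long_run_variance(x, h_n):
    """Lag-window estimator sigma2_hat = sum_{|k| <= h_n} w(k, n) phi_hat_k."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    sigma2 = np.mean(x * x)                   # phi_hat_0, weight 1
    for k in range(1, h_n + 1):
        phi_k = np.sum(x[:n - k] * x[k:]) / (n - k)
        w = 1.0 - k / (h_n + 1)               # Bartlett weight (illustrative choice)
        sigma2 += 2.0 * w * phi_k             # phi_k = phi_{-k} by stationarity
    return sigma2

# Example: AR(1) with theta = 0.5 and unit innovation variance; the true
# long-run variance is sigma_eps^2 / (1 - theta)^2 = 4.
rng = np.random.default_rng(2)
n, theta = 5000, 0.5
x = np.zeros(n)
for k in range(1, n):
    x[k] = theta * x[k - 1] + rng.standard_normal()
print(long_run_variance(x, h_n=20))  # should be close to 4
```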

Since $\phi_k = E[X_k X_0] \to 0$ as $k$ increases, both $V_d$ and $D_d$ can again be used to quantify $h_n$ and obtain consistent estimators for $\sigma^2$.

- It is possible to construct cases where $V_d$ is superior to $D_d$, ...
- and it is also possible to construct cases where $D_d$ is superior to $V_d$, ...
- however, often $D_d$ gives the better result.

Properties of $\mathcal{X}^2_d$, $V_d$ and $D_d$

If we want to use $V_d$ and $D_d$, we need to control them asymptotically (for large $n$ and $d = d_n$).

We expect that under some conditions
\[
\mathcal{X}^2_{d_n} \approx \chi^2(d_n), \qquad a_n \big( V_{d_n} - b_n \big) \xrightarrow{w} G, \qquad A_n D_{d_n} - B_n \xrightarrow{w} G,
\]
for appropriate sequences $a_n, b_n, A_n, B_n$, where $G$ is an extreme value distribution and $d_n = d$ is an increasing function of $n$.

However: establishing the above is highly nontrivial and requires (weak) dependence assumptions.

What can we say about the relation of $n$ and $d_n$ ($d_n = n^{\delta}$, $\delta > 0$)?

The problem: our setting

Suppose that
\[
E_n^{(d)} := \sqrt{n} \big( g^{(d)} - E(g^{(d)}) \big) \xrightarrow{w} Z_d,
\]
where $Z_d = (Z_1, \ldots, Z_d)^T$ is a Gaussian vector with appropriate covariance matrix. Then, schematically,
\[
E_n^{(d)} \xrightarrow{\;n\;} Z_d, \qquad \max_{1 \le h \le d} Z_h \xrightarrow{\;d\;} G, \qquad \max E_n^{(d_n)} \xrightarrow{\;?\,d_n\,?\;} G.
\]
What can we say about the existence and properties of such a sequence $d_n$, i.e. $\max E_n^{(d_n)} \xrightarrow{w} G$ as $n \to \infty$ (appropriately normalized)?

Existence of d_n

Theorem. Suppose that $E_n^{(d)} \xrightarrow{w} Z_d$ for all fixed $d \in \mathbb{N}$, and that $E(Z_i Z_j) = O\big( (\log |i - j|)^{-2-\delta} \big)$, $\delta > 0$. Then there exist increasing sequences $a_n, b_n, d_n$ such that
\[
a_n \big( \max E_n^{(d_n)} - b_n \big) = a_n \big( V_{(d_n)} - b_n \big) \xrightarrow{w} G.
\]

Drawbacks:
- It does not tell us anything about a possible growth rate of $d_n$, which might be as slow as $\log \log \log \ldots n$.
- The proof is not really constructive.

What to do?

What to do?

We need to control the quantity
\[
R_n := P\big( V_{(d_n)} \le u_n \big) - P\big( M_{(d_n)} \le u_n \big), \tag{3}
\]
where $M_{(d_n)} = \max_{1 \le h \le d_n} Z_h$. The claim then follows by using existing theory on Gaussian processes.

- Can we find an explicit bound for $R_n$ in terms of $n$ and $d_n$ (this gives us an explicit growth rate)?
- This looks very similar to Berry-Esseen type results.
- We need normal approximation results....

Normal approximation

There is a huge variety of normal approximation techniques, depending on the metric and the dependence assumptions:

- Berry-Esséen type results (Zolotarev, Götze, Bentkus, Senatov, ...)
- Strong approximation techniques (KMT, Zaitsev, Mason, Philipp, Berkes, ...)
- Martingale approximation and embedding methods (Strassen, Hall, Heyde, Bolthausen, ...)
- Stein's method ...

All these methods have advantages and disadvantages; there is no ultimate approach.

An example: AR(d_n) revisited....

Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process (or AR($\infty$)) with parameter $\Theta_{d_n} = (\theta_1, \ldots, \theta_{d_n})^T$, i.e.
\[
X_k = \theta_1 X_{k-1} + \ldots + \theta_{d_n} X_{k-d_n} + \epsilon_k,
\]
where $\{\epsilon_k\}_{k \in \mathbb{Z}}$ is an i.i.d. sequence. For $1 \le q \le d_n$, put $\phi_h = E(X_k X_{k+h})$, $k, h \in \mathbb{Z}$, $\Phi_q = (\phi_1, \ldots, \phi_q)^T$, and denote by $\hat{\phi}_{n,h} = \frac{1}{n} \sum_{i=h+1}^{n} X_i X_{i-h}$ the sample autocovariances. Let $\Gamma_q = (\phi_{i-j})_{1 \le i,j \le q}$ be the $q \times q$ dimensional covariance matrix.

Then $\Gamma_q \Theta_q = \Phi_q$. It is thus natural to consider
\[
\hat{\Gamma}_q^{-1} \hat{\Phi}_q = \hat{\Theta}_q \quad \text{and} \quad \hat{\sigma}^2(q) = \hat{\phi}_0 - \hat{\Theta}_q^T \hat{\Phi}_q, \qquad \sigma^2 = E(\epsilon_0^2),
\]
the so-called Yule-Walker equations.
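A minimal sketch of the Yule-Walker step $\hat{\Theta}_q = \hat{\Gamma}_q^{-1} \hat{\Phi}_q$ and $\hat{\sigma}^2(q) = \hat{\phi}_0 - \hat{\Theta}_q^T \hat{\Phi}_q$, exploiting the Toeplitz structure of $\Gamma_q$; function and variable names are illustrative.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, q):
    """Solve Gamma_q Theta_q = Phi_q using sample autocovariances."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    # phi_hat_h = (1/n) sum_{i=h+1}^{n} x_i x_{i-h}, for h = 0, ..., q
    phi = np.array([np.sum(x[h:] * x[:n - h]) / n for h in range(q + 1)])
    # Gamma_q is Toeplitz with first column (phi_0, ..., phi_{q-1})
    theta_hat = solve_toeplitz(phi[:q], phi[1:q + 1])
    sigma2_hat = phi[0] - theta_hat @ phi[1:q + 1]   # sigma2_hat(q)
    return theta_hat, sigma2_hat

# Example: AR(2) with theta = (0.5, -0.3) and unit innovation variance.
rng = np.random.default_rng(3)
n = 10_000
x = np.zeros(n)
for k in range(2, n):
    x[k] = 0.5 * x[k - 1] - 0.3 * x[k - 2] + rng.standard_normal()
print(yule_walker(x, q=2))  # roughly (0.5, -0.3) and 1.0
```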

A few facts/questions

- CLT (for fixed $q$): $\sqrt{n} \big( \hat{\Theta}_q - \Theta_q \big) \xrightarrow{w} \mathcal{N}\big( 0, \Gamma_q^{-1} \big)$.
- A priori, the true order $q_0$ is usually not known.
- Well-known estimators are information-based criteria such as AIC, BIC, SIC, ...
- What can we say about a simultaneous confidence band for $\Theta_{d_n}$ and the relation of $n$ and $d_n$?

Some assumptions

Assumption (B). $\{X_k\}_{k \in \mathbb{Z}}$ admits a causal representation $X_k = \sum_{i=0}^{\infty} \alpha_i \epsilon_{k-i}$ such that
- $\sup_n \Psi(m) = O(m^{-\vartheta})$, $\vartheta > 0$, where $\Psi(m) := \sum_{i=m}^{\infty} |\alpha_i|$,
- $\{\epsilon_k\}_{k \in \mathbb{Z}}$ is a mean-zero i.i.d. sequence of random variables such that $\|\epsilon_k\|_p < \infty$ for some $p > 4$ and $\|\epsilon_k\|_2^2 = \sigma^2 > 0$, $k \in \mathbb{Z}$,
- $\sup_n \sum_{i=1}^{\infty} |\theta_i| < \infty$ and $\theta_n = O\big( (\log n)^{-1} \big)$.

Theorem. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process satisfying Assumption (B). Suppose that $d_n \to \infty$ as $n$ increases, with $d_n = O(n^{\delta})$ such that
\[
0 < \delta < \min\{ 1/2, \ \vartheta p/2 \}, \qquad (1 - 2\vartheta)\delta < (p - 4)/p. \tag{4.1}
\]
If in addition $\inf_h \gamma_{h,h} > 0$, then for $z \in \mathbb{R}$
\[
P\Big( a_n^{-1} \Big( \sqrt{n} \max_{1 \le i \le d_n} \big( \hat{\gamma}_{i,i} \, \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i - \theta_i \big| - b_n \Big) \le z \Big) \to \exp(-e^{-z}),
\]
where $a_n = (2 \log d_n)^{-1/2}$ and $b_n = (2 \log d_n)^{1/2} - (8 \log d_n)^{-1/2} (\log \log d_n + 4\pi - 4)$.

Note: if $p$ (the number of moments) is sufficiently large, then essentially $d_n = O(\sqrt{n})$.
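A sketch of how this limit could be inverted into a simultaneous band: with the Gumbel $(1-\alpha)$-quantile $q_\alpha = -\log(-\log(1-\alpha))$, the band is $\hat{\theta}_i \pm (a_n q_\alpha + b_n) \sqrt{\hat{\gamma}_{i,i} \hat{\sigma}^2(d_n)/n}$. The code below is a direct transcription of this normalization; the precise source of the inputs $\hat{\gamma}_{i,i}$ (presumably diagonal variance terms from the Yule-Walker limit) and all names are assumptions of this sketch.

```python
import numpy as np

def gumbel_band(theta_hat, gamma_diag, sigma2_hat, n, alpha=0.05):
    """Simultaneous (1 - alpha) confidence band for (theta_1, ..., theta_{d_n}),
    inverting the Gumbel limit of the normalized maximum."""
    d_n = len(theta_hat)
    a_n = (2 * np.log(d_n)) ** (-0.5)
    b_n = (2 * np.log(d_n)) ** 0.5 - (8 * np.log(d_n)) ** (-0.5) * (
        np.log(np.log(d_n)) + 4 * np.pi - 4
    )
    q_alpha = -np.log(-np.log(1 - alpha))     # Gumbel (1 - alpha)-quantile
    # Invert: sqrt(n) |theta_hat_i - theta_i| / sqrt(gamma_ii sigma^2) <= a_n q_alpha + b_n
    half_width = (a_n * q_alpha + b_n) * np.sqrt(gamma_diag * sigma2_hat / n)
    return theta_hat - half_width, theta_hat + half_width
```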

Order estimation in AR(q)

We can use this result to construct a family of consistent order estimators. Put $(x)^+ = \max(0, x)$ and
\[
\Upsilon_{n,i} = a_n^{-1} \Big( \sqrt{n} \big( \hat{\gamma}_{i,i} \, \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i \big| - b_n \Big).
\]

Then, for a threshold sequence $z_n$,
\[
\hat{q}^{(1)}_{z_n} = \min\Big\{ q \in \mathbb{N} \ \Big|\ a_n^{-1} \Big( \sqrt{n} \max_{q+1 \le i \le d_n} \big( \hat{\gamma}_{i,i} \, \hat{\sigma}^2(d_n) \big)^{-1/2} \big| \hat{\theta}_i \big| - b_n \Big) \le z_n \Big\},
\]
\[
\hat{q}^{(2)}_{z_n} = \operatorname*{argmin}_{q \in \mathbb{N}} \Big\{ \max_{q+1 \le i \le d_n} \big( \Upsilon_{n,i} - z_n \big)^+ + \log(1 + q) \Big\}
\]
are consistent estimators (extensions: subset modelling!).
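A minimal sketch of the first estimator $\hat{q}^{(1)}_{z_n}$: return the smallest $q$ for which the normalized maximum over the tail coefficients $(\hat{\theta}_{q+1}, \ldots, \hat{\theta}_{d_n})$ falls below $z_n$; names are illustrative and the inputs are as in the band sketch above.

```python
import numpy as np

def q_hat_1(theta_hat, gamma_diag, sigma2_hat, n, z_n):
    """q_hat^(1)_{z_n}: smallest q such that the normalized maximum over the
    tail coefficients theta_hat_{q+1}, ..., theta_hat_{d_n} is <= z_n."""
    d_n = len(theta_hat)
    a_n = (2 * np.log(d_n)) ** (-0.5)
    b_n = (2 * np.log(d_n)) ** 0.5 - (8 * np.log(d_n)) ** (-0.5) * (
        np.log(np.log(d_n)) + 4 * np.pi - 4
    )
    # Upsilon_{n,i} for every coefficient, vectorized
    ups = (np.sqrt(n) * np.abs(theta_hat) / np.sqrt(gamma_diag * sigma2_hat) - b_n) / a_n
    for q in range(d_n):
        if np.max(ups[q:]) <= z_n:   # tail q+1, ..., d_n looks redundant
            return q
    return d_n
```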

Order estimation: a few facts about the estimators

- One can compute the asymptotic distribution of $\hat{q}^{(1)}_{z_n}$, $\hat{q}^{(2)}_{z_n}$.
- One can easily generalize them to a very large family of estimators (weight/penalty functions).
- $\hat{q}^{(1)}_{z_n}$, $\hat{q}^{(2)}_{z_n}$ turn out to be good preliminary estimators for AIC, BIC, SIC.
- They significantly outperform AIC, BIC, SIC in sparse models.

Numerical results

n      q      AIC   AIC*   BIC   BIC*   MIC   MIC*   q_y^(5)   q_x^(5)
500    < 5      1      1   177     75    15     15        86        52
       5        3      3     9     11     3      3        17        14
       6      730    713   805    874   892    867       865       849
       7      108    108     8      8    57     57         0         2
       > 7    158    175     1     32    33     58        32        83
1000   < 5      0      0     3      0     0      0         0         0
       5        0      0     0      0     0      0         0         0
       6      724    709   990    951   934    901       955       885
       7      103    101     7      9    47     44         5         7
       > 7    173    190     0     40    19     55        40       108

Table: Simulation of an AR(6) process with coefficients Θ_6 = (0.1, 0.3, 0.05, 0.2, 0.1, 0.2)^T, ε ~ N(0, 1), 1000 repetitions, d_n ∈ {13, 14}.
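For orientation, here is a sketch of how a single replication behind such a table could be generated, with the coefficients as printed in the caption above; the burn-in length is an assumption of this sketch, and the comparison with AIC/BIC/MIC is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = np.array([0.1, 0.3, 0.05, 0.2, 0.1, 0.2])   # Theta_6 from the table caption
n, q0 = 500, len(theta)

# Simulate the AR(6) recursion with standard normal innovations;
# discard a burn-in of 100 observations to approach stationarity.
x = np.zeros(n + 100)
for k in range(q0, len(x)):
    x[k] = theta @ x[k - q0:k][::-1] + rng.standard_normal()
x = x[100:]
```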

n      q      AIC   AIC*   BIC   BIC*   MIC   MIC*   q_y^(5)   q_x^(5)
125    < 5    719    699   998    854   839    787       854       747
       5       11     11     0      0     7      7         0        11
       6      168    181     2    124   107    145       124       184
       7       44     44     0      4    23     24         4         8
       > 7     58     65     0     18    24     37        18        50
250    < 5    290    276   960    437   550    396       438       321
       5        6      6     0      3     5      5         3         5
       6      491    488    39    513   376    494       513       573
       7       91     90     1      2    40     40         1         7
       > 7    122    140     0     45    29     65        45        94

Table: Simulation of an AR(6) process with coefficients Θ_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T, ε ~ N(0, 1), 1000 repetitions, d_n ∈ {10, 12}.

n      q      AIC   AIC*   BIC   BIC*   MIC   MIC*   q_y^(5)   q_x^(5)
500    < 5     21     21   761    102   164     85       102        56
       5        0      0     1      0     0      0         0         1
       6      663    655   234    871   736    796       874       863
       7      125    124     4      3    69     68         0        10
       > 7    191    200     0     24    31     51        24        70
1000   < 5      0      0   168      1     1      1         1         0
       5        0      0     0      0     0      0         0         0
       6      702    683   822    949   919    887       955       898
       7      121    119     9      9    52     52         3         9
       > 7    177    198     1     41    28     60        41        93

Table: Simulation of an AR(6) process with coefficients Θ_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T, ε ~ N(0, 1), 1000 repetitions, d_n ∈ {13, 14}.

n      q       AIC   AIC*   BIC   BIC*   MIC   MIC*   q_y^(5)   q_x^(5)
125    < 11    884    853  1000    920   963    910       920       861
       11        3      3     0      0     1      1         0         3
       12       68     94     0     71    25     70        71       114
       13       11     13     0      3     4      7         3         5
       > 13     34     37     0      6     7     12         6        17
250    < 11    509    421   999    555   792    530       555       424
       11        3      3     0      3     2      3         3         4
       12      340    419     1    421   170    416       421       514
       13       67     68     0      2    18     19         2         5
       > 13     81     89     0     19    18     32        19        53

Table: Simulation of an AR(12) process with nonzero coefficients θ_1 = 0.1, θ_3 = 0.4, θ_12 = 0.2, ε ~ N(0, 1), 1000 repetitions, d_n ∈ {20, 23}.

n      q       AIC   AIC*   BIC   BIC*   MIC   MIC*   q_y^(5)   q_x^(5)
500    < 11     77     58   983    125   402    115       125        78
       11        0      0     0      2     0      1         2         1
       12      663    678    17    858   532    808       858       870
       13      104    103     0      3    39     40         3         4
       > 13    156    161     0     12    27     36        12        47
1000   < 11      0      0   689      2    35      2         2         2
       11        0      0     0      0     0      0         0         0
       12      706    701   307    971   893    907       972       936
       13      124    123     2      2    54     53         1         3
       > 13    170    176     2     25    18     38        25        59

Table: Simulation of an AR(12) process with nonzero coefficients θ_1 = 0.1, θ_3 = 0.4, θ_12 = 0.2, ε ~ N(0, 1), 1000 repetitions, d_n ∈ {25, 28}.

Thank you for your patience!

Some references

I. Berkes and W. Philipp. Approximation theorems for independent and weakly dependent random vectors. Ann. Probab., 7(1):29-54, 1979.

T. Jiang. The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14(2):865-880, 2004.

M. Jirak. Simultaneous confidence bands for Yule-Walker estimators and order selection. Ann. Statist., 40(1):494-528, 2012.

M. Jirak. A Darling-Erdős type result for stationary ellipsoids. Stochastic Process. Appl., to appear.