NBER Summer Institute Minicourse, What's New in Econometrics: Time Series
Lecture 2, July 14, 2008
The Functional Central Limit Theorem and Testing for Time Varying Parameters
Outline
1. FCLT
2. Overview of TVP topics (Models, Testing, Estimation)
3. Testing problems
4. Tests
1. FCLT: The Functional Central Limit Theorem

Problem: Suppose ε_t ~ iid(0, σ²_ε) (or weakly correlated with long-run variance σ²_ε), x_t is a function of (ε₁, ε₂, ε₃, …, ε_t), say x_t = Σ_{i=1}^t ε_i, and we need to approximate the distribution of T^{-a} Σ_{t=1}^T x_t.

Solution: Notice x_t = Σ_{i=1}^t ε_i = x_{t-1} + ε_t. This suggests an approximation based on a normally distributed (CLT for the first equality) random walk (second equality). The tool used for the approximation is the Functional Central Limit Theorem.
Some familiar notions:

1. Convergence in distribution (or "weak convergence"): ξ_T, T = 1, 2, …, is a sequence of random variables. ξ_T →d ξ means that the probability distribution function (PDF) of ξ_T converges to the PDF of ξ. As a practical matter, this means that we can approximate the PDF of ξ_T using the PDF of ξ when T is large.

2. Central Limit Theorem: Let ε_t be a mds(0, σ²_ε) with 2+δ moments and ξ_T = T^{-1/2} Σ_{t=1}^T ε_t. Then ξ_T →d ξ ~ N(0, σ²_ε).

3. Continuous mapping theorem: Let g be a continuous function and ξ_T →d ξ; then g(ξ_T) →d g(ξ). (Example: if ξ_T is the usual t-statistic and ξ_T →d ξ ~ N(0,1), then ξ²_T →d ξ² ~ χ²₁.)

These ideas can be extended to random functions:
A Random Function: The Wiener Process, a continuous-time stochastic process (sometimes called Standard Brownian Motion) that will play the role of a standard normal in the relevant function space. Denote the process by W(s), defined on s ∈ [0,1], with the following properties:

1. W(0) = 0.
2. For any dates 0 ≤ t₁ < t₂ < … < t_k ≤ 1, the increments W(t₂) − W(t₁), W(t₃) − W(t₂), …, W(t_k) − W(t_{k−1}) are independent normally distributed random variables with W(t_i) − W(t_{i−1}) ~ N(0, t_i − t_{i−1}).
3. Realizations of W(s) are continuous w.p. 1.

From (1) and (2), note that W(1) ~ N(0,1).
Another Random Function: Suppose ε_t ~ iidN(0,1), t = 1, 2, …, T, and let ξ_T(s) denote the function that linearly interpolates between the points ξ_T(t/T) = T^{-1/2} Σ_{i=1}^t ε_i.

Can we use W to approximate the probability law of ξ_T(s) if T is large? More generally, we want to know whether the probability distribution of a random function can be well approximated by the probability law of another (perhaps simpler, maybe Gaussian) function when T is large. Formally, we want to study weak convergence on function spaces.

Useful references: Hall and Heyde (1980), Davidson (1994), Andrews (1994)
Suppose we limit our attention to continuous functions on s ∈ [0,1] (the space of such functions is denoted C[0,1]), and define the distance between two functions, say x and y, as d(x,y) = sup_{0≤s≤1} |x(s) − y(s)|.

Three key theorems (Hall and Heyde (1980) and Davidson (1994, Part VI)):
Important Theorem 1 (Hall and Heyde, Theorem A.2): Weak convergence of random functions on C[0,1]. Weak convergence (denoted ξ_T ⇒ ξ) follows from (i) and (ii), where

(i) Let 0 ≤ s₁ < s₂ < … < s_k ≤ 1 be a set of k points. Suppose that (ξ_T(s₁), ξ_T(s₂), …, ξ_T(s_k)) →d (ξ(s₁), ξ(s₂), …, ξ(s_k)) for any set of k points {s_i}.

(ii) The function ξ_T(s) is tight (or, more generally, satisfies stochastic equicontinuity, as discussed in Andrews (1994)), meaning
(a) For each ε > 0, Prob[sup_{|s−t|<δ} |ξ_T(s) − ξ_T(t)| > ε] → 0 as δ → 0, uniformly in T. (This says that the function ξ_T does not get too wild as T grows.)
(b) Prob[|ξ_T(0)| > δ] → 0 as δ → ∞, uniformly in T. (This says the function can't get too crazy at the origin as T grows.)
Important Theorem 2 (Hall and Heyde, Theorem A.3): Continuous Mapping Theorem. Let g: C[0,1] → ℝ be a continuous function and suppose ξ_T(·) ⇒ ξ(·). Then g(ξ_T) ⇒ g(ξ).
Important Theorem 3 (Hall and Heyde): Functional Central Limit Theorem. Suppose ε_t ~ mds with variance σ²_ε and bounded 2+δ moments for some δ > 0.

(a) Let ξ_T(s) denote the function that linearly interpolates between the points ξ_T(t/T) = T^{-1/2} Σ_{i=1}^t ε_i. Then ξ_T ⇒ σ_ε W, where W is a Wiener process (standard Brownian motion).

(b) The results can be extended to ξ_T(s) = T^{-1/2} Σ_{i=1}^{[sT]} ε_i, the step-function interpolation, where [·] is the less-than-or-equal-to integer function (so that [3.1] = 3, [3.0] = 3, [3.9999] = 3, and so forth). See Davidson Ch. 29 for extensions.
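The theorem can be checked by simulation. The sketch below (not part of the original slides; all variable names and parameter values are illustrative) builds the scaled partial-sum process ξ_T(t/T) = T^{-1/2} Σ_{i≤t} ε_i across many replications and verifies that ξ_T(1) behaves like N(0, σ²_ε), as the FCLT implies:

```python
import numpy as np

# Simulate the step-function process xi_T(s) = T^{-1/2} * sum_{i <= [sT]} eps_i
# across replications, and check that xi_T(1) is approximately N(0, sigma_eps^2).
rng = np.random.default_rng(0)
T, n_rep, sigma_eps = 500, 2000, 1.0

eps = rng.normal(0.0, sigma_eps, size=(n_rep, T))
xi_paths = eps.cumsum(axis=1) / np.sqrt(T)   # xi_T(t/T) for t = 1, ..., T

xi_at_1 = xi_paths[:, -1]                    # xi_T(1) across replications
print(xi_at_1.mean(), xi_at_1.var())         # should be near 0 and sigma_eps^2
```

Each row of `xi_paths` is one realization of the step-function approximation to σ_ε W(s); the mean and variance of the endpoint match the W(1) ~ N(0,1) property.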
An Example (1): Let x_t = Σ_{i=1}^t ε_i, where ε_i is mds(0, σ²_ε), and let ξ_T(s) = T^{-1/2} Σ_{i=1}^{[sT]} ε_i = T^{-1/2} x_{[sT]} be a step-function approximation of σ_ε W(s). Then

T^{-3/2} Σ_{t=1}^T x_t = T^{-1} Σ_{t=1}^T ξ_T(t/T) = ∫₀¹ ξ_T(s) ds ⇒ σ_ε ∫₀¹ W(s) ds.
Additional persistence: Suppose a_t = ε_t − θε_{t−1} = θ(L)ε_t, and x_t = x_{t−1} + a_t. Then

T^{-1/2} x_t = T^{-1/2} Σ_{i=1}^t a_i = T^{-1/2} Σ_{i=1}^t (ε_i − θε_{i−1}) = (1 − θ) T^{-1/2} Σ_{i=1}^t ε_i + θ T^{-1/2} (ε_t − ε_0).

But θ T^{-1/2}(ε_t − ε_0) is negligible, so that T^{-1/2} x_{[sT]} ⇒ (1 − θ) σ_ε W(s).

This generalizes: suppose a_t = θ(L)ε_t with Σ_{i=0}^∞ i|θ_i| < ∞ (so that the MA coefficients are "one-summable"); then T^{-1/2} x_{[sT]} ⇒ θ(1) σ_ε W(s).

Note: θ(1) σ_ε is the long-run standard deviation of a_t.
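The long-run scaling can also be verified numerically. The sketch below (illustrative; the seed, θ, and sample sizes are my own choices, not from the slides) simulates x_T = Σ a_t with MA(1) increments and checks that Var(T^{-1/2} x_T) is close to [θ(1)σ_ε]² = [(1 − θ)σ_ε]²:

```python
import numpy as np

# With a_t = eps_t - theta * eps_{t-1}, the slide's result says
# T^{-1/2} x_T should have variance close to ((1 - theta) * sigma_eps)^2.
rng = np.random.default_rng(7)
T, n_rep, theta, sigma_eps = 2000, 4000, 0.5, 1.0

eps = rng.normal(0.0, sigma_eps, size=(n_rep, T + 1))
a = eps[:, 1:] - theta * eps[:, :-1]          # MA(1) increments a_t
x_T = a.sum(axis=1)                           # x_T = sum_{t=1}^T a_t

print(x_T.var() / T)                          # should be near ((1-theta)*sigma_eps)^2 = 0.25
```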
What does this all mean? Suppose I want to approximate the 95th quantile of the distribution of, say, v_T = T^{-3/2} Σ_{t=1}^T x_t. Because v_T ⇒ v = σ_ε ∫₀¹ W(s) ds, I can use the 95th quantile of v as the approximator.

How do I find (or approximate) the 95th quantile of v? Use Monte Carlo draws of σ_ε N^{-3/2} Σ_{t=1}^N Σ_{i=1}^t z_i, where z_i ~ iidN(0,1) and N is very large.

This approximation works well when T is reasonably large, and does not require knowledge of the distribution of x.
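The Monte Carlo recipe just described can be sketched in a few lines (illustrative only; seed, N, and the number of draws are my own choices):

```python
import numpy as np

# Monte Carlo approximation of the 95th quantile of v = sigma_eps * int_0^1 W(s) ds,
# using draws of sigma_eps * N^{-3/2} * sum_{t=1}^N sum_{i=1}^t z_i, z_i ~ iid N(0,1).
rng = np.random.default_rng(1)
N, n_draws, sigma_eps = 1000, 5000, 1.0

z = rng.normal(size=(n_draws, N))
rw = z.cumsum(axis=1)                          # random walk: sum_{i=1}^t z_i
v_draws = sigma_eps * rw.sum(axis=1) / N**1.5  # one draw of v per replication

q95 = np.quantile(v_draws, 0.95)
print(q95)
```

As a check on the sketch: ∫₀¹ W(s) ds ~ N(0, 1/3), so with σ_ε = 1 the exact 95th quantile is 1.645/√3 ≈ 0.95, which the simulated quantile should approximate.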
2. Overview of TVP topics

Models:
Linear regression: y_t = x_t′β_t + ε_t
IV regression (linear GMM): E{z_t(y_t − x_t′β_t)} = 0
Nonlinear models (GMM, NLLS, stochastic volatility, …)

Simple model as leading case (variables are scalars): y_t = β_t + ε_t, where β_t is the local level ("mean") of y_t.
Time variation in β

Discrete breaks:
Single break: β_t = β for t ≤ τ, and β + δ for t > τ.
Two breaks: β_t = β for t ≤ τ₁, β + δ₁ for τ₁ < t ≤ τ₂, and β + δ₁ + δ₂ for t > τ₂.
Multiple breaks: …
Stochastic breaks / Markov switching (Hamilton (1989)):

2-regime model: β_t = β when s_t = 0, and β + δ when s_t = 1, where s_t follows a Markov process with P(s_t = i | s_{t−1} = j) = p_{ij}.

Multiple regimes: …
Other stochastic breaks: …
Continuous evolution:
Random walk/martingale: β_t = β_{t−1} + η_t
ARIMA model: β_t ~ ARIMA(p,d,q)

In the simple model y_t = β_t + ε_t, these are unobserved components models (Harvey (1989); Nerlove, Grether and Carvalho (1995)) with the familiar simple forecast function y_{t+1|t} = (1 − θ) Σ_{i=0}^∞ θ^i y_{t−i}.
Relationship between models:

(1) Discrete vs. continuous: not very important.
(a) β_t = β_{t−1} + η_t. If the distribution of η has a point mass at zero, this is a model with occasional discrete shifts in β.
(b) Elliott and Müller (2006): optimal tests are very similar if the number of breaks is large (say, 3 or more).
(c) Discrete breaks and long memory (fractional process for β): Diebold and Inoue (2001) and Davidson and Sibbertsen (2005)
(2) Strongly mean-reverting or not: can be more important.
β_t ~ ARMA with small AR roots, or β_t ~ recurrent Markov switching, vs. β_t ~ random walk (or AR with a large root), or β_t has breaks with little structure.
(a) The difference is important for forecasting.
(b) Not important for tracking (smoothing).

(3) Deterministic vs. stochastic: important for forecasting (we will return to this).
What does TVP mean? Suppose Y_t is a covariance stationary vector. Then subsets of Y are covariance stationary; if Y_t ~ ARMA, then subsets of Y are ~ ARMA (where the order depends on the subset chosen). Thus, finding TVP in univariate or bivariate models indicates TVP in larger models that include Y.

Time variation in conditional mean or variance?
o Φ(L)Y_t = ε_t, with Φ = Φ_t and/or Σ_ε = Σ_{ε,t}
o Suppose Γ(L)X_t = e_t, and Y is a subset of X. Then C(L)Y_t = Σ_{i=1}^{n_x} A_i(L) e_{i,t}. Changes in the relative variances of the e_{i,t} will induce changes in both Φ and Σ_ε in the marginal representation for Y. Thus, finding Φ-TVP in the Y model does not imply Γ-TVP in the X model (but it does imply Γ- and/or Σ_e-TVP).
Evidence on TVP in macro VARs:
o SW (1996): 5,700 bivariate VARs involving post-war U.S. macro series. Reject the null of constant VAR coefficients (Φ) in over 50% of cases using tests with size = 10%.
o Many others.

Volatility ("Great Moderation"): …
3. Testing Problems

Tests for a break. Model: y_t = β_t + ε_t, where ε_t ~ iid(0, σ²_ε), and
β_t = β for t ≤ τ, and β + δ for t > τ.

Null and alternative: H₀: δ = 0 vs. H_a: δ ≠ 0.

Tests for H₀ vs. H_a depend on whether τ is known or unknown.
Chow Tests (known break date)

Least squares estimator of δ: δ̂ = Ȳ₂ − Ȳ₁, where Ȳ₁ = τ^{-1} Σ_{t=1}^τ y_t and Ȳ₂ = (T − τ)^{-1} Σ_{t=τ+1}^T y_t.

Wald statistic: ξ_W = δ̂² / [σ̂²_ε (1/τ + 1/(T − τ))].

This follows from Ȳ₁ ~a N(β, σ²_ε/τ) and Ȳ₂ ~a N(β + δ, σ²_ε/(T − τ)), and the fact that they are independent.

Under H₀, ξ_W is distributed as a χ²₁ random variable in large (τ and T − τ) samples. Thus, critical values for the test can be determined from the χ²₁ distribution.
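The Chow statistic above is easy to compute directly. The sketch below (illustrative; the function name, seed, and break size are my own) implements ξ_W for a mean shift at a known date τ:

```python
import numpy as np

# Chow (known-break-date) Wald statistic for a mean shift at tau, following
# the slide: xi_W = delta_hat^2 / (sigma2_hat * (1/tau + 1/(T - tau))).
def chow_wald(y, tau):
    T = len(y)
    ybar1, ybar2 = y[:tau].mean(), y[tau:].mean()
    delta_hat = ybar2 - ybar1
    resid = np.concatenate([y[:tau] - ybar1, y[tau:] - ybar2])
    sigma2_hat = resid @ resid / (T - 2)
    return delta_hat**2 / (sigma2_hat * (1.0 / tau + 1.0 / (T - tau)))

rng = np.random.default_rng(2)
y = rng.normal(size=400)          # data under H0: no break
print(chow_wald(y, 200))          # approximately a chi-square(1) draw
```

Adding a large shift after the break date (e.g. `y[200:] += 5`) drives the statistic far above the χ²₁ critical values.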
Quandt Tests (Sup-Wald or QLR) (unknown break date)

Quandt (1960) suggested computing the Chow statistic for a large number of possible values of τ and using the largest of these as the test statistic.

QLR statistic: ξ_Q = max_{τ₁ ≤ τ ≤ τ₂} ξ_W(τ), where the Chow statistic ξ_W is now indexed by the break date.

The problem is then to find the distribution of ξ_Q under the null (it will not be χ²₁), so that the critical value for the test can be determined.
Let s = τ/T. Under the null δ = 0, and (now using s as the index) we can write ξ_W as

ξ_W,T(s) = ( [sT]^{-1} Σ_{t=1}^{[sT]} ε_t − [(1−s)T]^{-1} Σ_{t=[sT]+1}^T ε_t )² / { σ̂²_ε ( [sT]^{-1} + [(1−s)T]^{-1} ) }

⇒ [ W(s)/s − (W(1) − W(s))/(1 − s) ]² / ( 1/s + 1/(1 − s) ) = [ W(s) − s W(1) ]² / ( s(1 − s) ),

where σ̂²_ε = T^{-1} Σ_{t=1}^T ε̂²_t →p σ²_ε, and the last equality follows from multiplying the numerator and denominator by s²(1 − s)² and simplifying. Thus, using the FCLT (and the continuous mapping theorem), ξ_W,T(·) ⇒ ξ(·), where ξ(s) = [W(s) − sW(1)]² / (s(1 − s)).
Suppose that τ₁ is chosen as [λT] and τ₂ as [(1 − λ)T], where 0 < λ < 0.5. Then

ξ_Q = sup_{λ ≤ s ≤ 1−λ} ξ_W,T(s), and ξ_Q ⇒ sup_{λ ≤ s ≤ 1−λ} ξ(s).

It has become standard practice to use a value of λ = 0.15. Using this value of λ, the 1%, 5% and 10% critical values for the test are 12.16, 8.68 and 7.12. (These can be compared to the corresponding critical values of the χ²₁ distribution: 6.63, 3.84 and 2.71.)

The results have been derived here for the case of a single constant regressor. Extensions to the case of multiple (non-constant) regressors can be found in Andrews (1993). (Critical values for the test statistic are also given in Andrews (1993), with corrections in Andrews (2003), reprinted in Stock and Watson (2006).)
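Putting the pieces together, the QLR statistic with 15% trimming can be sketched as follows (illustrative; the function name, seed, and break size are my own, and the DGP includes a one-standard-deviation mean shift so the statistic should exceed the 5% critical value 8.68):

```python
import numpy as np

# QLR (sup-Wald) statistic: maximize the Chow statistic over break dates
# in the trimmed range [0.15*T, 0.85*T].
def qlr_stat(y, trim=0.15):
    T = len(y)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1 - trim) * T))
    stats = []
    for tau in range(lo, hi):
        ybar1, ybar2 = y[:tau].mean(), y[tau:].mean()
        resid = np.concatenate([y[:tau] - ybar1, y[tau:] - ybar2])
        s2 = resid @ resid / (T - 2)
        stats.append((ybar2 - ybar1)**2 / (s2 * (1.0 / tau + 1.0 / (T - tau))))
    return max(stats)

rng = np.random.default_rng(3)
y_break = rng.normal(size=300)
y_break[150:] += 1.0              # one-sd mean shift midway through the sample
print(qlr_stat(y_break))          # compare with the 5% critical value 8.68
```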
Optimal tests when the break date (τ) is unknown

The QLR test seems very sensible, but is there a more powerful procedure? Andrews and Ploberger (1994) develop optimal (most powerful) tests for this (and related) problems.

Recall the Neyman-Pearson (NP) lemma: consider two simple hypotheses H₀: Y ~ f₀(y) vs. H_a: Y ~ f_a(y); then the most powerful test rejects H₀ for large values of the likelihood ratio, LR = f_a(Y)/f₀(Y), the ratio of the densities evaluated at the realized value of the random variable.

Here, the likelihoods depend on the parameters δ, β, σ²_ε, and τ (where δ = 0 under H₀). (δ, β, σ²_ε) are easily handled in the NP framework. τ is more of a problem: it is unknown, and if H₀ is true, it is irrelevant (τ is unidentified under the null). Andrews and Ploberger (AP) attack this problem.
One way to motivate the AP approach: suppose τ is a random variable with a known distribution, say F_τ. Then the density of Y is a mixture: Y ~ f_a(y), where f_a(y) = E_τ[f_a(y | τ)]. The LR test (ignoring (δ, β, σ²_ε) for convenience) is then LR = E_τ[f_a(Y | τ)] / f₀(Y).

The interpretation of this test is (equivalently) that it is (i) the most powerful test for τ ~ F_τ, or (ii) the test that maximizes F_τ-weighted average power over the fixed values of τ.

This approach to dealing with nuisance parameters (here τ) that are unidentified under H₀ is now standard.
The specific form of the test depends on the weight function F_τ (AP suggest a uniform distribution on τ) and on how large δ is assumed to be under the alternative.

When δ is small, the test statistic turns out to be a simple average of ξ_W(τ) over all possible break dates. This statistic is sometimes called the Mean Wald statistic.

When δ is large, the test statistic turns out to be a simple average of exp(0.5 ξ_W(τ)) over all possible break dates. This statistic is sometimes called the Exponential Wald statistic.

Importantly, as it turns out, the AP exponential Wald statistic is typically dominated by the largest values of ξ_W(τ). This means the QLR statistic behaves very much like the exponential Wald statistic and is, in this sense, essentially an optimal test.
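The two AP averages can be sketched from the same sequence of Chow statistics (illustrative; function names, seed, and trimming are my own, and I report the exponential statistic on the log scale, one common formulation):

```python
import numpy as np

# Compute the sequence of Chow statistics xi_W(tau) over trimmed break dates,
# then form AP-style averages: Mean-Wald = mean of xi_W(tau), and
# Exp-Wald = log of the mean of exp(0.5 * xi_W(tau)).
def wald_sequence(y, trim=0.15):
    T = len(y)
    out = []
    for tau in range(int(trim * T), int((1 - trim) * T)):
        ybar1, ybar2 = y[:tau].mean(), y[tau:].mean()
        resid = np.concatenate([y[:tau] - ybar1, y[tau:] - ybar2])
        s2 = resid @ resid / (T - 2)
        out.append((ybar2 - ybar1)**2 / (s2 * (1.0 / tau + 1.0 / (T - tau))))
    return np.array(out)

rng = np.random.default_rng(4)
y = rng.normal(size=300)                      # data under H0
xi = wald_sequence(y)
mean_wald = xi.mean()
exp_wald = np.log(np.mean(np.exp(0.5 * xi)))
print(mean_wald, exp_wald)
```

Note how the exponential average is pulled toward 0.5 times the largest ξ_W(τ), which is the sense in which it behaves like the QLR statistic.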
Tests for martingale time variation

Write the model as y_t = β_t + ε_t with β_t = β_{t−1} + γ e_t, where e_t is iidN(0, σ²_ε) and is independent of ε_i for all t and i. (As a normalization, the variances of ε and e are assumed to be equal.)

For simplicity, suppose that β₀ = 0. (Non-zero values are handled by restricting tests to be invariant to adding a constant to the data.)

Let Y = (y₁, …, y_T)′, so that Y ~ N(0, σ²_ε Ω(γ)), where Ω(γ) = I_T + γ² A, with A = [a_ij] and a_ij = min(i, j).
From King (1980), the optimal test of H₀: γ = 0 vs. H_a: γ = γ_a can be constructed using the likelihood ratio statistic. The LR statistic is

LR = |Ω(γ_a)|^{-1/2} exp{ −(1/(2σ²_ε)) [Y′Ω(γ_a)^{-1}Y − Y′Y] },

so that the test rejects the null for large values of Y′Y − Y′Ω(γ_a)^{-1}Y.

Optimal tests require a choice of γ_a, which measures the amount of time variation under the alternative. A common way to choose γ_a is to use a value such that, when γ = γ_a, the test has a pre-specified power, often 50%. Generally, this test (called a point optimal or point optimal invariant test) has near-optimal power for a wide range of values of γ. (A good rule of thumb, from Stock and Watson (1998), is to set γ_a = 7/T.)
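The rejection statistic is straightforward to compute once Ω(γ_a) is formed. A minimal sketch (illustrative; seed and sample size are my own, and the data are generated under H₀):

```python
import numpy as np

# Point-optimal statistic from the slide: reject for large values of
# Y'Y - Y' Omega(gamma_a)^{-1} Y, with Omega(gamma) = I + gamma^2 * A,
# a_ij = min(i, j), and the rule-of-thumb gamma_a = 7/T.
rng = np.random.default_rng(5)
T = 200
gamma_a = 7.0 / T

idx = np.arange(1, T + 1)
A = np.minimum.outer(idx, idx)                # a_ij = min(i, j)
Omega = np.eye(T) + gamma_a**2 * A

y = rng.normal(size=T)                        # data generated under H0: gamma = 0
stat = y @ y - y @ np.linalg.solve(Omega, y)  # avoids forming Omega^{-1} explicitly
print(stat)
```

Because Ω(γ_a) − I is positive semidefinite, the statistic is always positive; critical values would come from simulating its null distribution.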
A well-known version of this test uses the local (γ_a small) approximation Ω(γ_a)^{-1} = [I + γ²_a A]^{-1} ≈ I − γ²_a A.

In this case, the test rejects for large values of ψ = Y′AY / Y′Y, which is a version of the locally best test of Nyblom (1989).

Because A = PP′, where P is a lower triangular matrix of 1s, the test statistic can be written as ψ = Q′Q / Y′Y, where Q = P′Y (so that q_t = Σ_{i=t}^T y_i). The statistic can then be written as

ψ = Σ_{t=1}^T (Σ_{i=t}^T y_i)² / Σ_{t=1}^T y²_t.
To derive the distribution of the statistic under the null, write (under the null) y_t = β₀ + ε_t = ε_t, so that T^{-1/2} Σ_{i=t}^T y_i = T^{-1/2} Σ_{i=t}^T ε_i ⇒ ξ(1) − ξ(s), where ξ(s) = σ_ε W(s). Thus, under H₀,

T^{-1} ψ ⇒ ∫₀¹ [W(1) − W(s)]² ds.

In most empirical applications, β₀ is non-zero and unknown, and in this case the test statistic is

ψ̂ = Σ_{t=1}^T (Σ_{i=t}^T ε̂_i)² / Σ_{t=1}^T ε̂²_t,

where ε̂_t = y_t − β̂ is the OLS residual. In this case, one can show that, under H₀,

T^{-1} ψ̂ ⇒ ∫₀¹ [W(s) − s W(1)]² ds.

(In Lecture 6, I'll denote ψ̂ by ξ_Nyblom because it is a version of the Nyblom test statistic.)
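The residual-based statistic ψ̂ is a two-line computation. A sketch (illustrative; function name and seed are my own), using backward partial sums of the demeaned data:

```python
import numpy as np

# Nyblom-type statistic psi-hat from the slide: backward partial sums of the
# OLS residuals (deviations from the sample mean), squared and summed, divided
# by the sum of squared residuals. T^{-1} * psi-hat converges in distribution
# to int_0^1 (W(s) - s*W(1))^2 ds under the null of no time variation.
def nyblom_psi(y):
    e = y - y.mean()                 # OLS residuals from a regression on a constant
    q = e[::-1].cumsum()[::-1]       # q_t = sum_{i=t}^T e_i
    return (q @ q) / (e @ e)

rng = np.random.default_rng(6)
y = rng.normal(size=500)             # data under H0: constant beta
print(nyblom_psi(y) / len(y))        # scaled statistic; typically well below 1 under H0
```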
Regressors

Example studied above: y_t = β_t + ε_t. With regressors: y_t = x_t′β_t + ε_t.

Heuristic with β_t = β_{t−1} + γη_t, β₀ = 0:

x_t y_t = x_t x_t′ β_t + x_t ε_t = Σ_xx β_t + x_t ε_t + (x_t x_t′ − Σ_xx) β_t = Σ_xx β_t + e_t + m_t β_t,

where e_t = x_t ε_t and m_t = x_t x_t′ − Σ_xx. The test statistic depends on

T^{-1/2} Σ_{i=1}^t x_i y_i = Σ_xx T^{-1/2} Σ_{i=1}^t β_i + T^{-1/2} Σ_{i=1}^t e_i + T^{-1/2} Σ_{i=1}^t m_i β_i.

With m_t well-behaved, the final term is negligible. Hansen (2000) studies the effect of changes in the x process on standard TVP tests.
HAC corrections: y_t = x_t′β_t + ε_t, where Var(x_t ε_t) ≠ σ²_ε Σ_XX.
Power Comparisons of Tests

1. Elliott-Müller (2006): discrete break DGP: y_t = μ_t + αx_t + ε_t, where μ_t follows a discrete break process. Tests for martingale variation (qLL, Ny) have power that is similar to tests for a discrete break (SupF = QLR, AP).
2. Stock-Watson (1998): martingale TVP DGP: y_t = β_t + ε_t. Tests for a discrete break (QLR, EW, MW) have power that is similar to tests for martingale variation (L, POI).
Summary
1. FCLT (tool)
2. Overview of TVP topics
   a. Persistent vs. mean-reverting TVP
3. Testing problems
4. Tests
   a. Discrete break and AP approach
   b. Little difference between tests for discrete break and persistent continuous variation. What do you conclude when the null of stability is rejected?