The Functional Central Limit Theorem and Testing for Time Varying Parameters


NBER Summer Institute Minicourse, What's New in Econometrics: Time Series. July 2008.

The Functional Central Limit Theorem and Testing for Time Varying Parameters (Lecture 2)

Outline
1. FCLT
2. Overview of TVP topics (Models, Testing, Estimation)
3. Testing problems
4. Tests

1. FCLT: The Functional Central Limit Theorem

Problem: Suppose ε_t ~ iid(0, σ_ε²) (or weakly correlated with long-run variance σ_ε²), x_t is a function of (ε_1, ε_2, ..., ε_t), say x_t = Σ_{i=1}^t ε_i, and we need to approximate the distribution of, say, T^{-3/2} Σ_{t=1}^T x_t.

Solution: Notice x_t = Σ_{i=1}^t ε_i = x_{t-1} + ε_t. This suggests an approximation based on a normally distributed (CLT for the first equality) random walk (second equality). The tool used for the approximation is the Functional Central Limit Theorem.

Some familiar notions:

1. Convergence in distribution, or weak convergence: ξ_T, T = 1, 2, ..., is a sequence of random variables. ξ_T →d ξ means that the probability distribution function of ξ_T converges to the probability distribution function of ξ. As a practical matter, this means that we can approximate the distribution of ξ_T using the distribution of ξ when T is large.

2. Central Limit Theorem: Let ε_t be a mds(0, σ_ε²) with 2+δ moments and ξ_T = T^{-1/2} Σ_{t=1}^T ε_t. Then ξ_T →d ξ ~ N(0, σ_ε²).

3. Continuous mapping theorem: Let g be a continuous function and ξ_T →d ξ; then g(ξ_T) →d g(ξ). (Example: ξ_T is the usual t-statistic and ξ_T →d ξ ~ N(0,1); then ξ_T² →d ξ² ~ χ₁².)

These ideas can be extended to random functions:

A Random Function: The Wiener process, a continuous-time stochastic process sometimes called standard Brownian motion, will play the role of a standard normal in the relevant function space. Denote the process by W(s), defined on s ∈ [0,1], with the following properties:

1. W(0) = 0.
2. For any dates 0 ≤ t₁ < t₂ < ... < t_k ≤ 1, the increments W(t₂) − W(t₁), W(t₃) − W(t₂), ..., W(t_k) − W(t_{k−1}) are independent normally distributed random variables with W(t_i) − W(t_{i−1}) ~ N(0, t_i − t_{i−1}).
3. Realizations of W(s) are continuous w.p. 1.

From (1) and (2), note that W(1) ~ N(0,1).
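The defining properties above are easy to check numerically. A minimal sketch (not from the lecture; NumPy assumed, and the grid size and path count are arbitrary choices): approximate W on a grid by cumulating independent Gaussian increments, then verify that W(1) has variance 1 and that non-overlapping increments are uncorrelated.

```python
import numpy as np

# Approximate Wiener process paths on the grid s = 1/n, ..., 1 by
# cumulating independent N(0, 1/n) increments, then check that
# W(1) has variance near 1 and that the increments W(1/2) and
# W(1) - W(1/2) are (approximately) uncorrelated.
rng = np.random.default_rng(0)
n_steps, n_paths = 500, 20_000
dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
W = dW.cumsum(axis=1)                          # column t-1 holds W(t/n)
var_W1 = W[:, -1].var()
half = n_steps // 2
cov_incr = np.cov(W[:, half - 1], W[:, -1] - W[:, half - 1])[0, 1]
print(var_W1, cov_incr)
```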

Another Random Function: Suppose ε_t ~ iidN(0,1), t = 1, ..., T, and let ξ_T(s) denote the function that linearly interpolates between the points ξ_T(t/T) = T^{-1/2} Σ_{i=1}^t ε_i.

Can we use W to approximate the probability law of ξ_T(s) if T is large? More generally, we want to know whether the probability distribution of a random function can be well approximated by the distribution of another (perhaps simpler, maybe Gaussian) function when T is large. Formally, we want to study weak convergence on function spaces.

Useful references: Hall and Heyde (1980), Davidson (1994), Andrews (1994).

Suppose we limit our attention to continuous functions on s ∈ [0,1] (the space of such functions is denoted C[0,1]), and we define the distance between two functions, say x and y, as d(x,y) = sup_{0≤s≤1} |x(s) − y(s)|. Three key theorems (Hall and Heyde (1980) and Davidson (1994, Part VI)):

Important Theorem 1 (Hall and Heyde, Theorem A.2): Weak convergence of random functions on C[0,1]. Weak convergence (denoted ξ_T ⇒ ξ) follows from (i) and (ii), where

(i) (ξ_T(s₁), ξ_T(s₂), ..., ξ_T(s_k)) →d (ξ(s₁), ξ(s₂), ..., ξ(s_k)) for any finite set of k points 0 ≤ s₁ < s₂ < ... < s_k ≤ 1.

(ii) The function ξ_T(s) is tight (or, more generally, satisfies stochastic equicontinuity as discussed in Andrews (1994)), meaning

(a) For each ε > 0, Prob[sup_{|s−t|<δ} |ξ_T(s) − ξ_T(t)| > ε] → 0 as δ → 0, uniformly in T. (This says that the function ξ_T does not get too wild as T grows.)

(b) Prob[|ξ_T(0)| > δ] → 0 as δ → ∞, uniformly in T. (This says the function can't get too crazy at the origin as T grows.)

Important Theorem 2 (Hall and Heyde, Theorem A.3): Continuous Mapping Theorem. Let g: C[0,1] → ℝ be a continuous function and suppose ξ_T(·) ⇒ ξ(·). Then g(ξ_T) ⇒ g(ξ).

Important Theorem 3 (Hall and Heyde): Functional Central Limit Theorem. Suppose ε_t ~ mds with variance σ_ε² and bounded 2+δ moments for some δ > 0.

(a) Let ξ_T(s) denote the function that linearly interpolates between the points ξ_T(t/T) = T^{-1/2} Σ_{i=1}^t ε_i. Then σ_ε^{-1} ξ_T ⇒ W, where W is a Wiener process (standard Brownian motion).

(b) The results can be extended to ξ_T(s) = T^{-1/2} Σ_{i=1}^{[sT]} ε_i, the step-function interpolation, where [·] is the greatest-integer (floor) function (so that [3.2] = 3, [3.001] = 3, [3.9999] = 3, and so forth). See Davidson Ch. 29 for extensions.

An Example: Let x_t = Σ_{i=1}^t ε_i, where ε_i is mds(0, σ_ε²), and let ξ_T(s) = T^{-1/2} Σ_{i=1}^{[sT]} ε_i = T^{-1/2} x_{[sT]} be a step-function approximation of σ_ε W(s). Then

T^{-3/2} Σ_{t=1}^T x_t = T^{-1} Σ_{t=1}^T (T^{-1/2} x_t) = ∫₀¹ ξ_T(s) ds ⇒ σ_ε ∫₀¹ W(s) ds.

Additional persistence: Suppose a_t = ε_t − θε_{t−1} = θ(L)ε_t, and x_t = x_{t−1} + a_t. Then

T^{-1/2} x_t = T^{-1/2} Σ_{i=1}^t a_i = T^{-1/2} Σ_{i=1}^t (ε_i − θε_{i−1}) = (1 − θ) T^{-1/2} Σ_{i=1}^t ε_i + θ T^{-1/2} (ε_t − ε_0)

But θ T^{-1/2}(ε_t − ε_0) is negligible, so that T^{-1/2} x_{[sT]} ⇒ (1 − θ)σ_ε W(s).

This generalizes: suppose a_t = θ(L)ε_t and Σ_{i=0}^∞ i|θ_i| < ∞ (so that the MA coefficients are "one-summable"); then T^{-1/2} x_{[sT]} ⇒ θ(1)σ_ε W(s).

Note: θ(1)σ_ε is the long-run standard deviation of a.
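A quick numerical check of this long-run scaling (a sketch, not from the lecture; the values of θ, T, and the replication count are arbitrary choices):

```python
import numpy as np

# Check the long-run scaling: with a_t = eps_t - theta*eps_{t-1} and
# x_t = x_{t-1} + a_t, the variance of T^{-1/2} x_T should be close to
# theta(1)^2 * sigma_eps^2 = (1 - theta)^2 (here sigma_eps = 1).
rng = np.random.default_rng(1)
theta, T, reps = 0.5, 500, 20_000
eps = rng.normal(size=(reps, T + 1))          # eps_0, eps_1, ..., eps_T
a = eps[:, 1:] - theta * eps[:, :-1]          # a_t = eps_t - theta*eps_{t-1}
xT = a.sum(axis=1)                            # x_T = sum of a_t
lr_var = xT.var() / T
print(lr_var)                                 # near (1 - 0.5)^2 = 0.25
```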

What does this all mean? Suppose I want to approximate the 95th quantile of the distribution of, say, v_T = T^{-3/2} Σ_{t=1}^T x_t. Because v_T ⇒ v = σ_ε ∫₀¹ W(s) ds, I can use the 95th quantile of v as the approximator.

How do I find (or approximate) the 95th quantile of v? Use Monte Carlo draws of σ_ε N^{-3/2} Σ_{t=1}^N Σ_{i=1}^t z_i, where z_i ~ iidN(0,1) and N is very large.

This approximation works well when T is reasonably large, and does not require knowledge of the distribution of x.
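A sketch of this Monte Carlo recipe (with σ_ε = 1; N and the replication count are arbitrary choices). Since ∫₀¹ W(s) ds ~ N(0, 1/3), the exact 95th quantile, 1.645/√3 ≈ 0.95, is available as a cross-check:

```python
import numpy as np

# Monte Carlo approximation of the 95th quantile of v = int_0^1 W(s) ds
# using draws of N^{-3/2} * sum_t sum_{i<=t} z_i with z_i ~ iid N(0,1).
rng = np.random.default_rng(2)
N, reps = 200, 50_000
z = rng.normal(size=(reps, N))
v = z.cumsum(axis=1).sum(axis=1) / N ** 1.5   # N^{-3/2} sum_t (partial sums)
q95 = np.quantile(v, 0.95)
print(q95)                                    # near 1.645/sqrt(3)
```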

2. Overview of TVP topics

Models:
Linear regression: y_t = x_t′β_t + ε_t
IV regression (linear GMM): E[z_t(y_t − x_t′β_t)] = 0
Nonlinear models (GMM, NLLS, stochastic volatility, ...)

Simple model as leading case (variables are scalars): y_t = β_t + ε_t, where β_t is the local level ("mean") of y_t.

Time variation in β

Discrete breaks:

Single break: β_t = β for t ≤ τ; β + δ for t > τ

Two breaks: β_t = β for t ≤ τ₁; β + δ₁ for τ₁ < t ≤ τ₂; β + δ₁ + δ₂ for t > τ₂

Multiple breaks: ...

Stochastic breaks / Markov switching (Hamilton (1989)):

Two-regime model: β_t = β when s_t = 0; β + δ when s_t = 1, where s_t follows a Markov process with P(s_t = i | s_{t−1} = j) = p_ij

Multiple regimes

Other stochastic breaks

Continuous evolution:

Random walk/martingale: β_t = β_{t−1} + η_t

ARIMA model: β_t ~ ARIMA(p,d,q)

In the simple model y_t = β_t + ε_t, these are unobserved components models (Harvey (1989); Nerlove, Grether and Carvalho (1995)) with the familiar simple forecast functions, e.g. y_{t+1|t} = (1 − θ) Σ_{i=0}^∞ θ^i y_{t−i}.
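The forecast function above is exponential smoothing, which can be computed with a one-line recursion. A small sketch (the function name and the initialization at the first observation are my own choices):

```python
import numpy as np

def exp_smooth_forecasts(y, theta):
    """One-step-ahead forecasts for the local level model via the
    exponential-smoothing recursion f_t = (1 - theta)*y_t + theta*f_{t-1},
    equivalent to the geometrically weighted sum of past observations."""
    f = np.empty(len(y))
    level = y[0]                     # initialize at the first observation
    for t, yt in enumerate(y):
        level = (1 - theta) * yt + theta * level
        f[t] = level
    return f

print(exp_smooth_forecasts(np.array([1.0, 2.0, 0.5, 1.5]), 0.6))
```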

Relationship between models:

(1) Discrete vs. continuous: not very important

(a) β_t = β_{t−1} + η_t. If the distribution of η has a point mass at zero, this is a model with occasional discrete shifts in β.
(b) Elliott and Müller (2006): optimal tests are very similar if the number of breaks is large (say, 3 or more).
(c) Discrete breaks and long memory (fractional process for β): Diebold and Inoue (2001) and Davidson and Sibbertsen (2005).

(2) Strongly mean-reverting or not: can be more important

β_t ~ ARMA with small AR roots, β_t ~ recurrent Markov switching vs. β_t ~ random walk (or AR with large root), β_t has breaks with little structure.

(a) The difference is important for forecasting.
(b) Not important for tracking (smoothing).

(3) Deterministic vs. stochastic: important for forecasting (we will return to this).

What does TVP mean? Suppose Y_t is a covariance stationary vector. Then subsets of Y are covariance stationary.
o If Y_t ~ ARMA, then subsets of Y are ~ ARMA (where the order depends on the subset chosen). Thus, finding TVP in univariate or bivariate models indicates TVP in larger models that include Y.

Time variation in the conditional mean or the variance?
o Φ(L)Y_t = ε_t, with Φ = Φ_t and/or Σ_ε = Σ_ε,t.
o Suppose Γ(L)X_t = e_t, and Y is a subset of X. Then C(L)Y_t = Σ_{i=1}^n A_i(L)e_{i,t}. Changes in the relative variances of the e_{i,t} will induce changes in both Φ and Σ_ε in the marginal representation for Y. Thus, finding Φ-TVP in the Y model does not imply Γ-TVP in the X model (but it does imply Γ- and/or Σ_e-TVP).

Evidence on TVP in macro VARs
o SW (1996): 5,700 bivariate VARs involving post-war U.S. macro series. Reject the null of constant VAR coefficients (Φ) in over 50% of cases using tests with size = 10%.
o Many others.

Volatility (Great Moderation)

3. Testing problems

Tests for a break. Model: y_t = β_t + ε_t, where ε_t ~ iid(0, σ_ε²), and

β_t = β for t ≤ τ; β + δ for t > τ

Null and alternative: H₀: δ = 0 vs. H_a: δ ≠ 0. Tests of H₀ vs. H_a depend on whether τ is known or unknown.

Chow tests (known break date)

Least squares estimator of δ: δ̂ = Ȳ₂ − Ȳ₁, where Ȳ₁ = τ^{-1} Σ_{t=1}^τ y_t and Ȳ₂ = (T − τ)^{-1} Σ_{t=τ+1}^T y_t.

Wald statistic: ξ_W = δ̂² / [σ̂_ε² (τ^{-1} + (T − τ)^{-1})]

This follows from Ȳ₁ ~a N(β, σ_ε²/τ) and Ȳ₂ ~a N(β + δ, σ_ε²/(T − τ)), and they are independent.

Under H₀, ξ_W is distributed as a χ₁² random variable in large (τ and T − τ) samples. Thus, critical values for the test can be determined from the χ₁² distribution.
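A sketch of the Chow statistic for this scalar local-level model, with a simulated check of its null rejection rate against the χ₁² critical value (the function name and the pooled variance estimator with T − 2 degrees of freedom are my own choices):

```python
import numpy as np

def chow_wald(y, tau):
    """Chow/Wald statistic for a known break at tau in the mean of y:
    xi_W = delta_hat^2 / (sigma2_hat * (1/tau + 1/(T - tau)))."""
    y1, y2 = y[:tau], y[tau:]
    delta_hat = y2.mean() - y1.mean()
    resid = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
    sigma2_hat = (resid ** 2).sum() / (len(y) - 2)   # pooled variance
    return delta_hat ** 2 / (sigma2_hat * (1 / len(y1) + 1 / len(y2)))

# Under H0 the statistic is approximately chi-squared(1); check the
# empirical size of the 5% test (critical value 3.84) by simulation.
rng = np.random.default_rng(3)
stats = np.array([chow_wald(rng.normal(size=200), 100) for _ in range(5_000)])
size_5pct = (stats > 3.84).mean()
print(size_5pct)                                     # near 0.05
```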

Quandt tests (Sup-Wald or QLR) (unknown break date)

Quandt (1960) suggested computing the Chow statistic for a large number of possible values of τ and using the largest of these as the test statistic.

QLR statistic: ξ_Q = max_{τ₁ ≤ τ ≤ τ₂} ξ_W(τ), where the Chow statistic ξ_W is now indexed by the break date.

The problem is then to find the distribution of ξ_Q under the null (it will not be χ₁²), so that the critical value for the test can be determined.

Let s = τ/T. Under the null δ = 0, and (now using s as the index) we can write ξ_W as

ξ_W,T(s) = ( [sT]^{-1} Σ_{t=1}^{[sT]} ε_t − [(1−s)T]^{-1} Σ_{t=[sT]+1}^T ε_t )² / ( σ̂_ε² ( [sT]^{-1} + [(1−s)T]^{-1} ) )

⇒ ( s^{-1} W(s) − (1−s)^{-1} [W(1) − W(s)] )² / ( s^{-1} + (1−s)^{-1} ) = [W(s) − sW(1)]² / [s(1−s)]

where σ̂_ε² = T^{-1} Σ_{t=1}^T ε_t² →p σ_ε², and the last equality follows from multiplying the numerator and denominator by s²(1−s)² and simplifying. Thus, using the FCLT and the continuous mapping theorem, ξ_W,T(·) ⇒ ξ(·), where ξ(s) = [W(s) − sW(1)]² / [s(1−s)].

Suppose that τ₁ is chosen as [λT] and τ₂ is chosen as [(1−λ)T], where 0 < λ < 0.5. Then

ξ_Q = sup_{λ ≤ s ≤ 1−λ} ξ_W,T(s), and ξ_Q ⇒ sup_{λ ≤ s ≤ 1−λ} ξ(s)

It has become standard practice to use a value of λ = 0.15. Using this value of λ, the 1%, 5% and 10% critical values for the test are 12.16, 8.68 and 7.12. (These can be compared to the corresponding critical values of the χ₁² distribution: 6.63, 3.84 and 2.71.)

The results have been derived here for the case of a single constant regressor. Extensions to the case of multiple (non-constant) regressors can be found in Andrews (1993). (Critical values for the test statistic are also given in Andrews (1993), with corrections in Andrews (2003), reprinted in Stock and Watson (2006).)
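These critical values can be reproduced by simulating the limiting process directly. A sketch (not from the lecture; the grid size and replication count are arbitrary choices):

```python
import numpy as np

# Simulate the limiting null distribution of the QLR statistic,
# sup over 0.15 <= s <= 0.85 of [W(s) - s*W(1)]^2 / (s*(1-s)),
# by discretizing W on a grid; the simulated 5% quantile should be
# near the tabulated value 8.68 (and the 10% quantile near 7.12).
rng = np.random.default_rng(4)
T, reps = 400, 10_000
s = np.arange(1, T + 1) / T
keep = (s >= 0.15) & (s <= 0.85)               # 15% trimming
W = rng.normal(0.0, np.sqrt(1.0 / T), size=(reps, T)).cumsum(axis=1)
bridge = W - W[:, [-1]] * s                    # W(s) - s*W(1)
qlr = (bridge[:, keep] ** 2 / (s[keep] * (1 - s[keep]))).max(axis=1)
cv_5pct = np.quantile(qlr, 0.95)
print(cv_5pct)
```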

Optimal tests when the break point (τ) is unknown

The QLR test seems very sensible, but is there a more powerful procedure? Andrews and Ploberger (1994) develop optimal (most powerful) tests for this (and related) problems.

Recall the Neyman-Pearson (NP) lemma: consider two simple hypotheses, H₀: Y ~ f₀(y) vs. H_a: Y ~ f_a(y); then the most powerful test rejects H₀ for large values of the likelihood ratio, LR = f_a(Y)/f₀(Y), the ratio of the densities evaluated at the realized value of the random variable.

Here the likelihoods depend on the parameters δ, β, σ_ε², and τ (where δ = 0 under H₀). (δ, β, σ_ε²) are easily handled in the NP framework. τ is more of a problem: it is unknown, and if H₀ is true, it is irrelevant (τ is unidentified under the null). Andrews and Ploberger (AP) attack this problem.

One way to motivate the AP approach: suppose τ is a random variable with a known distribution, say F_τ. Then the density of Y is a mixture: Y ~ f_a(y), where f_a(y) = E_τ[f_a(y|τ)]. The LR test (ignoring (δ, β, σ_ε²) for convenience) is then LR = E_τ[f_a(Y|τ)]/f₀(Y).

The interpretation of this test is (equivalently) that it is (i) the most powerful test for τ ~ F_τ, or (ii) that it maximizes F_τ-weighted power over fixed values of τ. This approach to dealing with nuisance parameters (here τ) that are unidentified under H₀ is now standard.

The specific form of the test depends on the weight function F_τ (AP suggest a uniform distribution on τ) and on how large δ is assumed to be under the alternative.

When δ is small, the test statistic turns out to be a simple average of ξ_W(τ) over all possible break dates. This statistic is sometimes called the Mean-Wald statistic.

When δ is large, the test statistic turns out to be a simple average of exp(0.5 ξ_W(τ)) over all possible break dates. This statistic is sometimes called the Exponential-Wald statistic.

Importantly, as it turns out, the AP Exponential-Wald statistic is typically dominated by the largest values of ξ_W(τ). This means the QLR statistic behaves very much like the Exponential-Wald statistic and is, in this sense, essentially an optimal test.
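A sketch of how the three statistics aggregate a given sequence of break-date Wald statistics ξ_W(τ) (the function name is my own; the Exponential-Wald statistic is reported here as the plain average of exp(0.5 ξ_W(τ)), as in the slide, though it is often reported as the log of this average):

```python
import numpy as np

def ap_statistics(wald_seq):
    """Aggregate break-date Wald statistics xi_W(tau), tau ranging over
    the trimmed sample, into (QLR = sup, Mean-Wald, Exponential-Wald)."""
    w = np.asarray(wald_seq, dtype=float)
    return w.max(), w.mean(), np.exp(0.5 * w).mean()

print(ap_statistics([1.0, 2.0, 3.0]))
```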

Tests for martingale time variation

Write the model as y_t = β_t + ε_t with β_t = β_{t−1} + γe_t, where e_t is iidN(0, σ_ε²) and is independent of ε_i for all t and i. (As a normalization, the variances of ε and e are assumed to be equal.) For simplicity, suppose that β₀ = 0. (Non-zero values are handled by restricting tests to be invariant to adding a constant to the data.)

Let Y = (y₁, ..., y_T)′, so that Y ~ N(0, σ_ε² Ω(γ)), where Ω(γ) = I + γ²A, with A = [a_ij] and a_ij = min(i,j).

From King (1980), the optimal test of H₀: γ = 0 vs. H_a: γ = γ_a can be constructed using the likelihood ratio statistic. The LR statistic is given by

LR = |Ω(γ_a)|^{-1/2} exp{ −(1/(2σ_ε²)) [Y′Ω(γ_a)^{-1}Y − Y′Y] }

so that the test rejects the null for small values of Y′Ω(γ_a)^{-1}Y / Y′Y.

Optimal tests require a choice of γ_a, which measures the amount of time variation under the alternative. A common way to choose γ_a is to use a value such that, when γ = γ_a, the test has a pre-specified power, often 50%. Generally, this test (called a point-optimal or point-optimal invariant test) has near-optimal power for a wide range of values of γ. (A good rule of thumb, from Stock and Watson (1998), is to set γ_a = 7/T.)

A well-known version of this test uses the local (γ_a small) approximation

Ω^{-1}(γ_a) = [I + γ_a²A]^{-1} ≈ I − γ_a²A.

In this case, the test rejects for large values of ψ = Y′AY / Y′Y, which is a version of the locally best test of Nyblom (1989). Because A = PP′, where P is a lower-triangular matrix of 1s, the test statistic can be written as ψ = Q′Q / Y′Y, where Q = P′Y (so that q_t = Σ_{i=t}^T y_i). The statistic can then be written as

ψ = Σ_{t=1}^T (Σ_{i=t}^T y_i)² / Σ_{t=1}^T y_t².

To derive the distribution of the statistic under the null, write (under the null) y_t = β₀ + ε_t. With β₀ = 0, T^{-1/2} Σ_{i=t}^T y_i = T^{-1/2} Σ_{i=t}^T ε_i ⇒ ξ(1) − ξ(s) for t = [sT], where ξ(s) = σ_ε W(s). Thus

T^{-1} ψ = [T^{-2} Σ_{t=1}^T (Σ_{i=t}^T y_i)²] / [T^{-1} Σ_{t=1}^T y_t²] ⇒ ∫₀¹ (W(1) − W(s))² ds.

In most empirical applications, β₀ is non-zero and unknown, and in this case the test statistic is

ψ̂ = Σ_{t=1}^T (Σ_{i=t}^T ε̂_i)² / Σ_{t=1}^T ε̂_t², where ε̂_t = y_t − β̂ is the OLS residual. In this case, one can show that

T^{-1} ψ̂ ⇒ ∫₀¹ (W(s) − sW(1))² ds.

(In Lecture 6, I'll denote ψ̂ by ξ_Nyblom because it is a version of the Nyblom test statistic.)
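A sketch of the demeaned statistic with a simulated check of its null distribution (the function name is my own; the tabulated 5% critical value of ∫(W(s) − sW(1))² ds is approximately 0.46):

```python
import numpy as np

def nyblom_stat(y):
    """Nyblom-type statistic T^{-1}*psi_hat: the squared reverse partial
    sums of the demeaned data, normalized by T times the residual
    sum of squares."""
    e = y - y.mean()
    T = len(y)
    q = e[::-1].cumsum()[::-1]          # q_t = sum of e_i for i >= t
    return (q ** 2).sum() / (T * (e ** 2).sum())

# Null distribution check by simulation: the 95% quantile of the
# statistic should be near the tabulated asymptotic value ~0.46.
rng = np.random.default_rng(5)
stats = np.array([nyblom_stat(rng.normal(size=300)) for _ in range(5_000)])
cv_5pct = np.quantile(stats, 0.95)
print(cv_5pct)
```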

Regressors

Example studied above: y_t = β_t + ε_t. With regressors: y_t = x_t′β_t + ε_t.

Heuristic (with β_t = β_{t−1} + γη_t, β₀ = 0):

x_t y_t = x_t x_t′ β_t + x_t ε_t = Σ_xx β_t + x_t ε_t + (x_t x_t′ − Σ_xx)β_t = Σ_xx β_t + e_t + m_t β_t

The test statistic depends on

T^{-1/2} Σ_{i=1}^t x_i y_i = Σ_xx T^{-1/2} Σ_{i=1}^t β_i + T^{-1/2} Σ_{i=1}^t e_i + T^{-1/2} Σ_{i=1}^t m_i β_i

With m_t well-behaved, the final term is negligible. Hansen (2000) studies the effect of changes in the x process on standard TVP tests.

HAC corrections: y_t = x_t′β_t + ε_t, where Var(x_t ε_t) ≠ σ_ε² Σ_xx, so the variance estimator entering the tests must be heteroskedasticity- and autocorrelation-consistent.

Power comparisons of tests

1. Elliott-Müller (2006): discrete-break DGP: y_t = μ_t + αx_t + ε_t, where μ_t follows a discrete break process. Tests for martingale variation (qLL, Nyblom) have power that is similar to that of tests for a discrete break (SupF = QLR, AP).

2. Stock-Watson (1998): martingale-TVP DGP: y_t = β_t + ε_t. Tests for a discrete break (QLR, EW, MW) have power that is similar to that of tests for martingale variation (L, POI).

Summary

1. FCLT (tool)
2. Overview of TVP topics
   a. Persistent vs. mean-reverting TVP
3. Testing problems
4. Tests
   a. Discrete breaks and the AP approach
   b. Little difference between tests for a discrete break and tests for persistent continuous variation

What do you conclude when the null of stability is rejected?