NASMES 2008, June 21, 2008, Carnegie Mellon U.
Cointegrating Regressions with Messy Regressors: Missingness, Mixed Frequency, and Measurement Error
J. Isaac Miller, University of Missouri

Messy Data Example I: Missing or Mixed Frequency Data
  Possible sources: data collection (e.g., unbalanced panels, historical data, different sources), temporal aggregation, etc.
  Regression/Interpolation Solutions: Friedman (1962), Chow and Lin (1971, 1976)
  Kalman Filter Solutions: Harvey and Pierse (1984), Kohn and Ansley (1986), Gomez and Maravall (1994), Harvey et al. (1998), Mariano and Murasawa (2003), Seong et al. (2007)
  Frequency Domain Solutions: Ghysels et al. (2004)
Messy Data Example II: Measurement Error
  Classical Measurement Error: Fuller (1979), Griliches (1986)
  Cointegration and Stationary Measurement Error: Fisher (1990)
  Non-classical Measurement Error: Bound et al. (2001)

Outline of the Presentation
  I. Motivating Example: Lerping Mixed Frequency Data
     Linear Interpolation, Messy-Data Noise
  II. Cointegrating Regression without Messy Data
     Cointegrating Regression, Basic Assumptions, Estimation
  III. Cointegrating Regression with Messy Data
     Messy-Data Noise Specification, Estimation
  IV. Lerp Revisited
     Small Sample Results

I. Motivating Example: Lerping Mixed Frequency Data
Mixed Frequency Setting
  For a time series of interest $(x_t)$, let the subseries of regularly observed data be $(x_{\tau_p})$ for $p = 1, \ldots, l$ and the subseries of unobserved data (for each interval $p$) be $(x_{\tau_p + j})$ for $j = 1, \ldots, m$.
  Specifically, there are $m < \infty$ missing observations in each of $l$ intervals. The sample size (at the highest frequency) is $n = l(m + 1)$.
  E.g., 10 years of quarterly data are observed, but monthly data are interpolated. Then $m = 2$, $l = 40$, and $n = 120$.
  E.g., 10 years of annual data are observed, but monthly data are interpolated. Then $m = 11$, $l = 10$, and $n = 120$.
  Subsequent theoretical results should hold for more complicated patterns of missingness or irregular frequencies (e.g., weekends and holidays in daily data).

[Figure: Interpolating an I(0) and an I(1) Series]

Linear Interpolation (Lerp) and Messy-Data Noise
  Interpolating the unobserved subseries $(x_{\tau_p + j})$ yields an interpolated series given by
    $$\tilde{x}_{\tau_{p-1}+j} = \left(1 - \frac{j}{m+1}\right) x_{\tau_{p-1}} + \frac{j}{m+1}\, x_{\tau_p}$$
  for $j = 1, \ldots, m$ and simply $(x_{\tau_p})$ otherwise.
  The messy-data noise (interpolation error, in this case) is given by
    $$z_{\tau_{p-1}+j} \equiv \tilde{x}_{\tau_{p-1}+j} - x_{\tau_{p-1}+j} = \frac{j}{m+1} \sum_{i=1}^{m+1} \Delta x_{\tau_{p-1}+i} - \sum_{i=1}^{j} \Delta x_{\tau_{p-1}+i}$$
  for $j = 1, \ldots, m$ and zero otherwise.
  Note the explicit nonstationary dependence in the messy-data noise.

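The lerp formula and the resulting noise can be checked numerically. The following is a minimal Python sketch, not the author's code. The function name `lerp`, the use of NumPy's `np.interp`, the random-walk example, and the convention that the array holds x_0, ..., x_n with observations at t = 0, m + 1, 2(m + 1), ... are all assumptions made for illustration.

```python
import numpy as np

def lerp(x, m):
    """Linearly interpolate a series observed only every (m + 1) periods.

    x holds x_0, x_1, ..., x_n with n = l * (m + 1) (assumed layout).
    Returns the interpolated series and the messy-data noise z = x_interp - x.
    """
    x = np.asarray(x, dtype=float)
    n = x.size - 1
    if n % (m + 1) != 0:
        raise ValueError("n must be a multiple of m + 1")
    t = np.arange(n + 1)
    observed = t[:: m + 1]                       # observation dates tau_0, ..., tau_l
    x_interp = np.interp(t, observed, x[observed])
    z = x_interp - x                             # zero at every observed date
    return x_interp, z

# Example: 10 years of monthly data observed quarterly (m = 2, l = 40).
rng = np.random.default_rng(0)
m, l = 2, 40
n = l * (m + 1)
x = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n))))  # random walk, x_0 = 0
x_interp, z = lerp(x, m)
```

Within each interval the noise depends on the increments of the I(1) series over that interval, which is the dependence the slide points to.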
II. Cointegrating Regression without Messy Data
Regression model
    $$y_t = \beta' x_t + v_t$$
  $(x_t)$: nonsingular $r$-dimensional I(1) series
  $(v_t)$: one-dimensional I(0) series of unobservable disturbances
  $(y_t, x_t')'$: cointegrated with a cointegrating vector $(1, -\beta')'$
Generalizations Explicitly Considered in the Paper
  The model may include additional stationary regressors, as long as these are orthogonal to the disturbances.
  The I(1) regressors may be cointegrated, as long as $\beta$ is not a cointegrating vector. E.g., all series may have a common stochastic trend. (Estimation will be slightly different in this case.)

Series and Covariance Notation
  Define the $\mathbb{R}^{1+r}$-dimensional vector $b_t \equiv (v_t, \Delta x_t')'$ with contemporaneous variance $\Sigma_{bb}$, long-run variance $\Omega_{bb}$, and one-sided long-run variance $\Delta_{bb}$, such that $\Omega_{bb} = \Delta_{bb} + \Delta_{bb}' - \Sigma_{bb}$.
Standard Assumptions
  Stationarity: $(b_t)$ is a mean-zero $\mathbb{R}^{1+r}$-valued series that is stationary and $\alpha$-mixing of size $-a$ with $a > 1$.
  Initial Condition: Either $x_0 = 0$ or $x_0 = O_p(1)$ and independent of $(v_t)$.
  Invariance Principle:
    $$B_n(s) \equiv \frac{1}{\sqrt{n}} \sum_{t=1}^{[ns]} b_t \to_d B(s) \equiv (V(s), Q(s)')'$$
  with variance $\Omega_{bb} > 0$.

Feasible Estimation Using a Canonical Cointegrating Regression (CCR)
  The estimation techniques FM-OLS and CCR, developed by Phillips and Hansen (1990) and Park (1992), respectively, were designed to handle such a model. CCR works as follows:
  [1] Estimate $\beta$ using least squares.
  [2] Estimate $\Omega$, $\Sigma$, and $\Delta$. Use observable $(\Delta x_t)$ and sample analog $(\hat{v}_t)$ of unobservable $(v_t)$ to get consistent estimates. Standard nonparametric techniques estimate all variances consistently.
  [3] Create $(x_t^*)$ and $(y_t^*)$. Let $\Delta_{bx} \equiv (\delta_{xv}, \Delta_{xx}')'$ denote the one-sided long-run covariance between $(b_t)$ and $(\Delta x_t)$. Let hats denote consistent estimates from [1], [2].
    $$x_t^* \equiv x_t - \hat{\Delta}_{bx}' \hat{\Sigma}_{bb}^{-1} \hat{b}_t$$
    $$y_t^* \equiv y_t - \hat{\beta}' \hat{\Delta}_{bx}' \hat{\Sigma}_{bb}^{-1} \hat{b}_t - \hat{\omega}_{vx} \hat{\Omega}_{xx}^{-1} \Delta x_t$$
  [4] Re-estimate $\beta$. Least squares on $y_t^* = \beta' x_t^* + v_t^*$ provides consistent and asymptotically normal estimates.

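The four CCR steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's implementation: the function names `bartlett_lrv` and `ccr`, the Bartlett weights 1 - k/(lags + 1), the fixed choice of 8 lags, the absence of an intercept, and the simulated data are all hypothetical choices made here.

```python
import numpy as np

def bartlett_lrv(b, lags):
    """Return (Sigma, Delta, Omega) for the rows b_t of b, Bartlett-weighted."""
    n = b.shape[0]
    sigma = b.T @ b / n                         # contemporaneous variance
    delta = sigma.copy()                        # one-sided long-run variance, k = 0 term
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)
        delta += w * (b[k:].T @ b[:-k]) / n     # sum over t of b_t b_{t-k}'
    omega = delta + delta.T - sigma             # two-sided long-run variance
    return sigma, delta, omega

def ccr(y, x, lags):
    """Sketch of a canonical cointegrating regression of y on x (no intercept)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    # [1] first-stage least squares
    beta_ls = np.linalg.lstsq(x, y, rcond=None)[0]
    v_hat = y - x @ beta_ls
    # [2] variance estimates from b_t = (v_t, dx_t')'
    dx = np.diff(x, axis=0)
    b = np.column_stack([v_hat[1:], dx])
    sigma, delta, omega = bartlett_lrv(b, lags)
    delta_bx = delta[:, 1:]                     # one-sided covariance of b_t with dx_t
    omega_vx, omega_xx = omega[0, 1:], omega[1:, 1:]
    # [3] transformed series x*_t and y*_t
    adj = b @ np.linalg.solve(sigma, delta_bx)  # row t is (Delta_bx' Sigma^{-1} b_t)'
    x_star = x[1:] - adj
    y_star = y[1:] - adj @ beta_ls - dx @ np.linalg.solve(omega_xx, omega_vx)
    # [4] second-stage least squares
    beta_ccr = np.linalg.lstsq(x_star, y_star, rcond=None)[0]
    return beta_ls, beta_ccr

# Hypothetical example: one I(1) regressor, beta = 1, endogenous errors.
rng = np.random.default_rng(1)
n = 2000
e = rng.standard_normal((n, 2))
x = np.cumsum(e[:, 0])
v = 0.5 * e[:, 0] + e[:, 1]
y = 1.0 * x + v
beta_ls, beta_ccr = ccr(y, x, lags=8)
```

The first observation is dropped in steps [2] to [4] because differencing x loses one period.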
III. A Cointegrating Regression with Messy Data
Messy Regressors
  Let $(x_t)$ be observed with noise such that
    $$\tilde{x}_t = x_t + z_t$$
  where $(\tilde{x}_t)$ is observed (or interpolated, e.g.) and $(z_t)$ is messy-data noise.
  Specifically, the observable series may be distorted by interpolation/imputation error, possibly nonstationary measurement error, error from temporal aggregation, etc.
  A feasible version of the original system is
    $$y_t = \beta' \tilde{x}_t + \tilde{v}_t$$
  where $\tilde{v}_t \equiv v_t - \beta' z_t$.
  Although $(y_t)$ is cointegrated with $(x_t)$ by assumption, $(y_t)$ is generally not cointegrated with $(\tilde{x}_t)$, since the messy-data noise may be explicitly nonstationary (from linear interpolation, e.g.).

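The step from the original regression to the feasible one is a single substitution. It is written out below, with a tilde marking the observed (messy) series and the corresponding error term; the tilde is a notational assumption made here for readability.

```latex
\begin{align*}
y_t &= \beta' x_t + v_t \\
    &= \beta' (\tilde{x}_t - z_t) + v_t \\
    &= \beta' \tilde{x}_t + \tilde{v}_t,
    \qquad \tilde{v}_t \equiv v_t - \beta' z_t .
\end{align*}
```

Because the feasible error term contains the noise $z_t$, any nonstationarity in the noise carries over to it.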
More Realistic Paradigm than Stationarity
  Near-epoch Dependence. A sequence $(z_t)$ is near-epoch dependent in $L_2$-norm ($L_2$-NED) of size $-\lambda$ on a stochastic sequence $(\eta_t)$ if
    $$\left\| z_t - E\left(z_t \mid \mathcal{F}_{t-K}^{t+K}\right) \right\|_2 \le d_t \nu_K$$
  where $\nu_K \to 0$ as $K \to \infty$ such that $\nu_K = O(K^{-\lambda-\varepsilon})$ for $\varepsilon > 0$, $d_t$ is a sequence of positive constants, and $\mathcal{F}_{t-K}^{t+K} \equiv \sigma(\eta_{t-K}, \ldots, \eta_{t+K})$ is the $\sigma$-field generated by $\eta_{t-K}, \ldots, \eta_{t+K}$.
  The definition and concept of NED sequences have origins in works by Ibragimov (1962), Billingsley (1968), McLeish (1975), and others.
  Some useful limit theory for NED sequences has been derived by Davidson (1994), Davidson and de Jong (1997), de Jong (1997), de Jong and Davidson (2000), and others.
  Under general assumptions, asymptotic results for cointegrating regressions (using a CCR, e.g.) still hold, even though the series are no longer cointegrated in the strict sense.

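Messy-Data Noise: First Alternative Set of Assumptions
  $(z_t)$ is a vector of $L_2$-NED sequences of size $-1$ on $(b_t)$, w.r.t. bounded constants $(d_t)$
  $\sup_t \| z_t \|_{4a/(a-1)} < \infty$
  $E z_t = 0$
  $E z_t z_{t-k}' \equiv \Sigma^t_{zz}(k) < \infty$ for all $t$, and $\Sigma_{zz}(k) \equiv n^{-1} \sum_{t=k+1}^{n} \Sigma^t_{zz}(k) < \infty$
  $E z_t b_{t-k}' \equiv \Sigma^t_{zb}(k) < \infty$ for all $t$, and $\Sigma_{zb}(k) \equiv n^{-1} \sum_{t=k+1}^{n} \Sigma^t_{zb}(k) < \infty$
  $E Z_n Z_n'(s) \to \Omega_{zz}(s)$ with $\Omega_{zz}(s) < \infty$, where $Z_n(s) \equiv n^{-1/2} \sum_{t=1}^{[ns]} z_t$ with long-run variance $\Omega_{zz}$
  $E Z_n Q_n'(s) \to \Omega_{zq}(s)$ with $\Omega_{zq} < \infty$
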
Messy-Data Noise: Second Alternative Set of Assumptions
  $(z_t)$ is a vector of $L_2$-NED sequences of size $-1$ on $(b_t)$, w.r.t. bounded constants $(d_t)$
  $\sup_t \| z_t \|_{2a/(a-1)} < \infty$
  $n^{-1} \sum_t z_t = o_p(1)$
  $n^{-1} \sum_t (z_t z_t' - \Sigma_{zz}) = o_p(1)$
  $n^{-1} \sum_t (z_t b_t' - \Sigma_{zb}) = o_p(1)$
  $n^{-1/2} \sum_t z_t = O_p(1)$
  $n^{-1} \sum_t x_t z_t' \to_d \int Q\, dZ'(s) + \Delta_{xz}$

Consistency of Least Squares
    $$n(\hat{\beta}_{LS} - \beta) \to_d \left( \int QQ' \right)^{-1} \left( \int Q\, d\left(V(s) - \beta' Z(s)\right) + \delta_{vx}' - \Delta_{xz} \beta \right)$$
  As in the conventional case studied by Phillips and Hansen (1990) and Park (1992), the least squares estimator of $\beta$ has rate-$n$ instead of root-$n$ convergence.
  Similarly, the limiting distribution contains nuisance parameters and is asymptotically biased and non-normal.
  The messy-data noise contributes additional nuisance parameters, bias, and non-normality.
  Superconsistency of the least squares estimator allows consistent estimation of the long-run and contemporaneous variances.

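Variance Estimation
  The kernel estimator and lag truncation parameter must satisfy
    $\lim_{n \to \infty} \left( l_n^{-1} + n^{-1} l_n \right) = 0$
    $\pi$ belongs to the function class defined by de Jong and Davidson (2000)
    $n^{-1/2} \sum_{k=0}^{n} \pi(k / l_n) = o(1)$
  E.g., Bartlett kernel with $l_n = o(n^{1/2})$ lags.
  Let $\hat{\tilde{b}}_t \equiv \hat{b}_t + \hat{D} z_t$ with $\hat{D} \equiv (-\hat{\beta}, 0)'$, and define
    $$\hat{\Omega}_{\tilde{b}\tilde{b}} \equiv \frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} \hat{\tilde{b}}_t \hat{\tilde{b}}_s'\, \pi\!\left( \frac{t - s}{l_n} \right),$$
    $$\hat{\Delta}_{\tilde{b}\tilde{b}} \equiv \frac{1}{n} \sum_{k=0}^{n} \pi\!\left( \frac{k}{l_n} \right) \sum_{t=k+1}^{n} \hat{\tilde{b}}_t \hat{\tilde{b}}_{t-k}',$$
    $$\hat{\Sigma}_{\tilde{b}\tilde{b}} \equiv \frac{1}{n} \sum_{t=1}^{n} \hat{\tilde{b}}_t \hat{\tilde{b}}_t'$$
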
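The three estimators can be computed directly from their definitions. The sketch below is hypothetical illustration code: the function names `bartlett` and `kernel_variances`, the lag choice of roughly n^(1/3) (one rate that is o(n^(1/2))), and the white-noise input are assumptions. It uses a symmetric Bartlett kernel and evaluates the long-run variance as the literal double sum.

```python
import numpy as np

def bartlett(u):
    """Bartlett kernel: 1 - |u| on [-1, 1], zero elsewhere."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def kernel_variances(b, lag):
    """Return (Omega, Delta, Sigma) estimates for the rows b_t of an (n, d) array."""
    b = np.asarray(b, dtype=float)
    n, d = b.shape
    t = np.arange(n)
    weights = bartlett((t[:, None] - t[None, :]) / lag)  # pi((t - s) / l_n)
    omega = b.T @ weights @ b / n                        # double sum over t and s
    delta = np.zeros((d, d))
    for k in range(n):
        w = bartlett(k / lag)
        if w == 0.0:                                     # kernel is zero from here on
            break
        delta += w * (b[k:].T @ b[: n - k]) / n          # sum over t of b_t b_{t-k}'
    sigma = b.T @ b / n
    return omega, delta, sigma

# Hypothetical input: white noise standing in for the estimated series.
rng = np.random.default_rng(2)
n = 200
b = rng.standard_normal((n, 3))
lag = int(n ** (1.0 / 3.0))
omega, delta, sigma = kernel_variances(b, lag)
```

With a symmetric kernel equal to one at zero, the double-sum estimate coincides with the sum of the one-sided estimate and its transpose minus the contemporaneous estimate, which gives a quick consistency check on an implementation.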
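Consistent Variance Estimation
  Under the above assumptions,
    $$\hat{\Sigma}_{\tilde{b}\tilde{b}} \to_p \Sigma_{\tilde{b}\tilde{b}} \equiv \Sigma_{bb} + \Sigma_{bz} D' + D \Sigma_{zb} + D \Sigma_{zz} D'$$
    $$\hat{\Delta}_{\tilde{b}\tilde{b}} \to_p \Delta_{\tilde{b}\tilde{b}} \equiv \Delta_{bb} + \Delta_{bz} D' + D \Delta_{zb} + D \Delta_{zz} D'$$
    $$\hat{\Omega}_{\tilde{b}\tilde{b}} \to_p \Omega_{\tilde{b}\tilde{b}} \equiv \Omega_{bb} + \Omega_{bz} D' + D \Omega_{zb} + D \Omega_{zz} D'$$
  Note that:
  The variance estimators no longer consistently estimate the variances and covariances of the original error term.
  Instead, they consistently estimate the variances that result from adding messy-data noise.
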
Feasible Estimation Using a CCR
  Estimation works exactly as in the conventional case, except with the possibility of an additional step at the beginning.
  [1] Messy-data generating step. For example, interpolation or imputation techniques generate messy data.
  [2] Estimate $\beta$ using least squares.
  [3] Estimate $\Omega$, $\Sigma$, and $\Delta$. Use observable $(\Delta \tilde{x}_t)$ and sample analog $(\hat{v}_t)$ of unobservable $(v_t)$ to get consistent estimates. Standard nonparametric techniques estimate all variances consistently.
  [4] Create $(x_t^*)$ and $(y_t^*)$. In using the same CCR technique, these are implicitly redefined by
    $$x_t^* \equiv \tilde{x}_t - \hat{\Delta}_{\tilde{b}\tilde{x}}' \hat{\Sigma}_{\tilde{b}\tilde{b}}^{-1} \hat{\tilde{b}}_t$$
    $$y_t^* \equiv y_t - \hat{\beta}' \hat{\Delta}_{\tilde{b}\tilde{x}}' \hat{\Sigma}_{\tilde{b}\tilde{b}}^{-1} \hat{\tilde{b}}_t - \hat{\omega}_{\tilde{v}\tilde{x}} \hat{\Omega}_{xx}^{-1} \Delta \tilde{x}_t$$
  [5] Re-estimate $\beta$.

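Asymptotic Normality of $\hat{\beta}_{CCR}$
  Define a Brownian motion
    $$V_Q(s) \equiv \left( V(s) - \beta' Z(s) \right) - \left( \omega_{vx} - \beta' \Omega_{zx} \right) \Omega_{xx}^{-1} Q(s)$$
  The long-run variance $\mathrm{lrvar}(V_Q(s))$ of this BM is
    $$\begin{pmatrix} 1 \\ -\beta \end{pmatrix}' \left[ \begin{pmatrix} \omega_{vv} & \omega_{vz} \\ \omega_{zv} & \Omega_{zz} \end{pmatrix} - \begin{pmatrix} \omega_{vx} \Omega_{xx}^{-1} \omega_{xv} & \omega_{vx} \Omega_{xx}^{-1} \Omega_{xz} \\ \Omega_{zx} \Omega_{xx}^{-1} \omega_{xv} & \Omega_{zx} \Omega_{xx}^{-1} \Omega_{xz} \end{pmatrix} \right] \begin{pmatrix} 1 \\ -\beta \end{pmatrix}$$
  The limiting distribution of $\hat{\beta}_{CCR}$ is
    $$n(\hat{\beta}_{CCR} - \beta) \to_d \left( \int QQ' \right)^{-1} \int Q\, dV_Q(s)$$
  or, equivalently,
    $$\sqrt{\mathrm{lrvar}(V_Q(s))} \left( \int QQ' \right)^{-1/2} N(0, I)$$
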
IV. Lerp Revisited
Proposition
  For messy-data noise generated by linear interpolation, the second set of alternative assumptions holds as $l \to \infty$.
Simulations
  10,000 simulations, $n = 240$ (e.g., 20 years of monthly data), $r = 1, \ldots, 16$ regressors.
  True parameters are $\beta = \iota_r$, $\Sigma = I_r$; $(b_t)$ is generated as a VAR(1) with autoregressive matrix $(1/2) I_r$.
  The first regressor is observed at a lower frequency. Specifically, $m = 2$ and $l = 80$ (e.g., 20 years of quarterly data).

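One replication of a design like this can be sketched as follows. This is a hypothetical reconstruction for illustration only: the function name `simulate_once`, the standard normal VAR innovations with identity covariance, the zero initial condition, r = 2, and the use of plain least squares (rather than a CCR) for the three estimates are assumptions not confirmed by the slides.

```python
import numpy as np

def simulate_once(n=240, m=2, r=2, rho=0.5, seed=0):
    """One draw: least squares with all, dropped, and interpolated observations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n + 1, 1 + r))
    b = np.zeros((n + 1, 1 + r))
    for t in range(1, n + 1):
        b[t] = rho * b[t - 1] + eps[t]        # VAR(1) with matrix rho * I, b_0 = 0
    v = b[:, 0]                               # I(0) disturbance
    x = np.cumsum(b[:, 1:], axis=0)           # I(1) regressors, x_0 = 0
    y = x @ np.ones(r) + v                    # beta is a vector of ones
    t = np.arange(n + 1)
    obs = t[:: m + 1]                         # dates at which regressor 1 is observed
    x_lerp = x.copy()
    x_lerp[:, 0] = np.interp(t, obs, x[obs, 0])   # lerp the first regressor only

    def ols(X, Y):
        return np.linalg.lstsq(X, Y, rcond=None)[0]

    estimates = {
        "all": ols(x, y),                     # infeasible benchmark
        "dropped": ols(x[obs], y[obs]),       # low-frequency dates only
        "interpolated": ols(x_lerp, y),       # full sample with messy regressor
    }
    return estimates, x, x_lerp, obs

estimates, x, x_lerp, obs = simulate_once()
```

Repeating this over many seeds and replacing `ols` with a CCR estimator would give a comparison in the spirit of the three cases (all, dropped, interpolated observations) in the slides.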
[Figure: Distributions of t-statistics for the First Four Regression Coefficients. Rows: number of regressors. Columns: regressor. Series: All Obs., Dropped Obs., Interpolated Obs.]

[Figure: Root Mean Squared Error (In Sample)]

[Figure: Root Mean Squared Error (Forecast m+1 Steps Ahead)]

Concluding Remarks
  When integrated or cointegrated data are messy in the sense that the messiness adds disturbances that are possibly nonstationary but not too strongly dependent (i.e., near-epoch dependent), some standard techniques for estimation and inference may be used.
  Such messiness does not detract from superconsistency of least squares, even though it adds asymptotic bias and non-normality.
  Techniques such as CCR and FM-OLS were designed to correct for these problems with stationary data.
  A CCR may be constructed that corrects for the bias and non-normality inherent in the cointegrated model and for the bias and non-normality that result from messy regressors.
  A CCR is much simpler to implement than the Kalman filter, e.g., and does not rely on strict distributional assumptions or lengthy optimization routines.