Regression-Based Mixed Frequency Granger Causality Tests

Eric Ghysels, Jonathan B. Hill, Kaiji Motegi

First Draft: October 1, 2013. This Draft: January 19, 2015.

Abstract

This paper presents a new mixed frequency Granger causality test that achieves high power even when the sample size is small and the ratio of sampling frequencies is large. We postulate multiple parsimonious regression models, where each model regresses a low frequency variable onto only one individual lag or lead of a high frequency variable. We then construct a max test statistic that selects the largest squared estimator among all parsimonious regression models. We show via Monte Carlo simulations that the max test is more powerful than existing mixed frequency Granger causality tests in small samples. In an empirical application, we compute the max test over rolling windows of US macroeconomic data. Our study reveals that a weekly interest rate spread caused quarterly real growth until about 2000, when evidence for such causality vanishes.

Keywords: Granger causality test, Local asymptotic power, Max test, Mixed data sampling (MIDAS), Sims test, Temporal aggregation.

Eric Ghysels: Department of Economics and Department of Finance, Kenan-Flagler Business School, University of North Carolina at Chapel Hill. E-mail: eghysels@unc.edu
Jonathan B. Hill: Department of Economics, University of North Carolina at Chapel Hill. E-mail: jbhill@unc.edu
Kaiji Motegi: Faculty of Political Science and Economics, Waseda University. E-mail: motegi@aoni.waseda.jp

1 Introduction

Time series are often sampled at different frequencies, and it is well known that temporal aggregation into the lowest frequency may hide or generate Granger's (1969) causality. Existing Granger causality tests typically ignore this issue since they are based on aggregating data to the common lowest frequency, possibly leading to spurious (non-)causality. See Zellner and Montmarquette (1971) and Amemiya and Wu (1972) for early contributions. This subject has subsequently been researched extensively: see, for example, Granger (1980), Granger (1988), Lütkepohl (1993), Granger (1995), Renault, Sekkat, and Szafarz (1998), Marcellino (1999), Breitung and Swanson (2002), and McCrorie and Chambers (2006), among others.

One of the most popular Granger causality tests is a Wald test based on multi-step ahead vector autoregression (VAR) models. This approach can handle causal chains among more than two variables. See Lütkepohl (1993), Dufour and Renault (1998), Dufour, Pelletier, and Renault (2006), and Hill (2007). Since standard VAR models are designed for single-frequency data, these tests often suffer from the adverse effects of temporal aggregation. In order to alleviate this problem, Ghysels, Hill, and Motegi (2013) develop a set of Granger causality tests that explicitly take advantage of data sampled at mixed frequencies. They extend Dufour, Pelletier, and Renault's (2006) VAR-based causality test using Ghysels' (2012) mixed frequency vector autoregressive (MF-VAR) models. MF-VAR models avoid temporal aggregation by using variables at the frequency at which they are reported.^1

^1 MIDAS, standing for Mi(xed) Da(ta) S(ampling), regression models have been put forward in recent work by Ghysels, Santa-Clara, and Valkanov (2004), Ghysels, Santa-Clara, and Valkanov (2006), and Andreou, Ghysels, and Kourtellos (2010). See Andreou, Ghysels, and Kourtellos (2011) and Armesto, Engemann, and Owyang (2010) for surveys. VAR models for mixed frequency data were independently introduced by Anderson, Deistler, Felsenstein, Funovits, Zadrozny, Eichler, Chen, and Zamani (2012), Ghysels (2012), and McCracken, Owyang, and Sekhposyan (2013). An early example of related ideas appears in Friedman (1962). Foroni, Ghysels, and Marcellino (2013) provide a survey of mixed frequency VAR models.

Ghysels, Hill, and Motegi's (2013) tests have low power when the ratio of sampling frequencies $m$ is large and the sample size is small (e.g. in mixed monthly and quarterly data, $m = 3$). An essential reason for the low power is that the dimension of MF-VAR models soars as $m$ increases, resulting in parameter proliferation. Imposing parametric restrictions might be one simple solution, but this leads to misspecification if the restrictions are wrong. It is thus desirable to establish a new test that imposes no parametric constraints and still achieves high power for large $m$ and small sample size.

The present paper proposes a regression-based mixed frequency Granger causality test that is in part based on Sims' (1972) two-sided regression model. We postulate multiple parsimonious regression models where the $j$-th model regresses a low frequency variable $x_L$ onto only the $j$-th lag or lead of a high frequency variable $x_H$. Our test statistic is the maximum among the squared estimators, properly scaled and weighted. While our max test statistic follows a non-standard asymptotic distribution under the null hypothesis of Granger non-causality, a simulated p-value is readily available through an arbitrary number of draws from the null distribution. The max test is thus straightforward to implement in practice.

Through local asymptotic power analysis and Monte Carlo simulations, we compare the max test based on mixed frequency data ("MF max test"), a Wald test based on mixed frequency data ("MF Wald test"), the max test based on low frequency data ("LF max test"), and a Wald test based on low frequency data ("LF Wald test"). It turns out that, relative to LF tests, MF tests are more robust against complex (but realistic) causal patterns in both local asymptotics and finite samples. The MF max test and the MF Wald test are roughly equally powerful in local asymptotics, but the former is clearly more powerful than the latter in finite samples. For Granger causality from $x_H$ to $x_L$, we prove the consistency of the MF max test. We also show by counter-examples that LF tests need not be consistent. For Granger causality from $x_L$ to $x_H$, proving the consistency of the MF max test remains an open question.

As an empirical application, we conduct a rolling window analysis on a weekly interest rate spread and quarterly real GDP growth in the US. The MF max test yields the intuitive result that the weekly spread caused real growth only until about the year 2000.

The remainder of the paper is organized as follows. Sections 2 and 3 present the max test statistic and derive its asymptotic properties for the two cases of testing for non-causality from high-to-low and low-to-high frequencies. In Section 4 we conduct local power analysis. In Section 5 we run Monte Carlo simulations, and Section 6 presents the empirical application. Section 7 concludes the paper. Proofs for all theorems as well as some theoretical details are provided in Technical Appendices, and tables and figures are collected at the end.

2 Max Test: High-to-Low Granger Causality

This paper focuses on a bivariate case where we have a high frequency variable $x_H$ and a low frequency variable $x_L$. The trivariate case involves causality chains in mixed frequency, which is far more complicated and detracts from the main thesis of dimension reduction. See Dufour and Renault (1998), Dufour, Pelletier, and Renault (2006), and Hill (2007).

We need to formulate a data generating process (DGP) governing these variables. For each low frequency time period $\tau_L \in \mathbb{Z}$, we have $m$ high frequency time periods, where $m$ is often called the ratio of sampling frequencies. We sequentially observe $\{x_H(\tau_L, 1), \dots, x_H(\tau_L, m), x_L(\tau_L)\}$ in period $\tau_L$. A simple example involves months and quarters, where $m = 3$: $x_H(\tau_L, 1)$ is the first monthly observation of $x_H$ in quarter $\tau_L$, $x_H(\tau_L, 2)$ is the second, and $x_H(\tau_L, 3)$ is the third. We then observe $x_L(\tau_L)$, the quarterly observation of $x_L$. The assumption that $x_L(\tau_L)$ is observed after $x_H(\tau_L, m)$ is merely a convention. See Figure F1 in Ghysels, Hill, and Motegi (2014) for a visual explanation of this notation.

Example 2.1 (Mixed Frequency Data in Economic Applications). A leading example of how a mixed frequency model is useful in macroeconomics concerns quarterly real GDP growth $x_L(\tau_L)$,

where existing studies of causal patterns use unemployment, oil prices, inflation, interest rates, etc., aggregated into quarters (see Hill (2007) for references). Consider monthly CPI inflation in quarter $\tau_L$, denoted $[x_H(\tau_L, 1), x_H(\tau_L, 2), x_H(\tau_L, 3)]'$. According to the Bureau of Economic Analysis, GDP is announced roughly one month after the quarter, with subsequent updates over the following two months (e.g. the 2014 first quarter advance estimate is due on April 30, 2014). By comparison, the monthly CPI is announced roughly three weeks after the month. Since CPI inflation is announced before GDP, $\{x_H(\tau_L, 1), x_H(\tau_L, 2), x_H(\tau_L, 3), x_L(\tau_L)\}$ is an appropriate order.

The ratio of sampling frequencies, $m$, depends on $\tau_L$ in some applications, including week vs. month, where $m$ is four or five. We leave that case to future work since a time-dependent $m$ complicates our statistical theory substantially.

We collect all observations in period $\tau_L$ to define a $K \times 1$ mixed frequency vector $X(\tau_L) = [x_H(\tau_L, 1), \dots, x_H(\tau_L, m), x_L(\tau_L)]'$. We have $K = m + 1$ since we are considering a bivariate case with one high and one low frequency variable, and a time-independent $m$. Define the $\sigma$-field $\mathcal{F}_{\tau_L} \equiv \sigma(X(\tau) : \tau \le \tau_L)$. Following Ghysels (2012) and Ghysels, Hill, and Motegi (2013), we assume that $E[X(\tau_L) \mid \mathcal{F}_{\tau_L - 1}]$ has a version that is almost surely linear in $\{X(\tau_L - 1), \dots, X(\tau_L - p)\}$ for some finite $p \ge 1$.

Assumption 2.1. The mixed frequency vector $X(\tau_L)$ is governed by an MF-VAR($p$) for some finite $p \ge 1$:

\[
\underbrace{\begin{bmatrix} x_H(\tau_L, 1) \\ \vdots \\ x_H(\tau_L, m) \\ x_L(\tau_L) \end{bmatrix}}_{= X(\tau_L)}
= \sum_{k=1}^{p}
\underbrace{\begin{bmatrix}
d_{11,k} & \cdots & d_{1m,k} & c_{(k-1)m+1} \\
\vdots & \ddots & \vdots & \vdots \\
d_{m1,k} & \cdots & d_{mm,k} & c_{km} \\
b_{km} & \cdots & b_{(k-1)m+1} & a_k
\end{bmatrix}}_{\equiv A_k}
\underbrace{\begin{bmatrix} x_H(\tau_L - k, 1) \\ \vdots \\ x_H(\tau_L - k, m) \\ x_L(\tau_L - k) \end{bmatrix}}_{= X(\tau_L - k)}
+ \underbrace{\begin{bmatrix} \epsilon_H(\tau_L, 1) \\ \vdots \\ \epsilon_H(\tau_L, m) \\ \epsilon_L(\tau_L) \end{bmatrix}}_{\equiv \epsilon(\tau_L)}
\tag{2.1}
\]

or compactly

\[
X(\tau_L) = \sum_{k=1}^{p} A_k X(\tau_L - k) + \epsilon(\tau_L).
\]

The error $\{\epsilon(\tau_L)\}$ is a strictly stationary martingale difference sequence (mds) with respect to increasing $\mathcal{F}_{\tau_L} \subset \mathcal{F}_{\tau_L + 1}$, with positive definite covariance matrix $\Omega \equiv E[\epsilon(\tau_L)\epsilon(\tau_L)']$.

Remark 2.1. The mds assumption allows for conditional heteroscedasticity of unknown form, including GARCH-type processes. By expanding the definition of the $\sigma$-fields $\mathcal{F}_{\tau_L}$, we can also easily allow for stochastic volatility errors. A constant term is omitted from (2.1) for simplicity, but can be easily added if desired. Thus, $X(\tau_L)$ is mean centered.

The coefficients $d$ govern the autoregressive property of $x_H$, while the coefficients $a$ govern the autoregressive property of $x_L$. The coefficients $b$ and $c$ are relevant for Granger causality, so we explain how they are labeled in (2.1). $b_1$ is the impact of the most recent past observation of $x_H$ (i.e. $x_H(\tau_L - 1, m)$) on $x_L(\tau_L)$,

$b_2$ is the impact of the second most recent past observation of $x_H$ (i.e. $x_H(\tau_L - 1, m - 1)$) on $x_L(\tau_L)$, and so on through $b_{pm}$. In general, $b_k$ represents the impact of $x_H$ on $x_L$ when the two observations are $k$ high frequency periods apart. Similarly, $c_1$ is the impact of $x_L(\tau_L - 1)$ on the nearest observation of $x_H$ (i.e. $x_H(\tau_L, 1)$), $c_2$ is the impact of $x_L(\tau_L - 1)$ on the second nearest observation of $x_H$ (i.e. $x_H(\tau_L, 2)$), $c_{m+1}$ is the impact of $x_L(\tau_L - 2)$ on the $(m+1)$-st nearest observation of $x_H$ (i.e. $x_H(\tau_L, 1)$), and so on. Finally, $c_{pm}$ is the impact of $x_L(\tau_L - p)$ on $x_H(\tau_L, m)$. In general, $c_k$ represents the impact of $x_L$ on $x_H$ when the two observations are $k$ high frequency periods apart.

Since $\{\epsilon(\tau_L)\}$ is not i.i.d., we must impose a weak dependence property in order to ensure standard asymptotics. In the following we assume $\epsilon(\tau_L)$ and $X(\tau_L)$ are stationary $\alpha$-mixing.

Assumption 2.2. All roots of the polynomial $\det(I_K - \sum_{k=1}^{p} A_k z^k) = 0$ lie outside the unit circle, where $\det(\cdot)$ is the determinant.

Assumption 2.3. $X(\tau_L)$ and $\epsilon(\tau_L)$ are $\alpha$-mixing: $\sum_{h=0}^{\infty} \alpha_{2^h} < \infty$.

Remark 2.2. Notice that $\Omega \equiv E[\epsilon(\tau_L)\epsilon(\tau_L)']$ allows the high frequency innovations $\epsilon_H(\tau_L, j)$ to have a different variance for each $j$. Thus, while Assumptions 2.1 and 2.2 imply that $\{x_H(\tau_L, j)\}_{\tau_L}$ is covariance stationary for each fixed $j \in \{1, \dots, m\}$, they do not imply covariance stationarity of the entire high frequency series $\{\{x_H(\tau_L, j)\}_{j=1}^{m}\}_{\tau_L}$.

We now discuss testing strategies for Granger causality between $x_H$ and $x_L$. Because there are fundamentally different challenges when testing for non-causality from high-to-low or low-to-high frequency, we restrict attention to the former in this section and treat the latter in Section 3.

In order to study high-to-low frequency causation, we first pick the last row of the entire system (2.1):

\[
x_L(\tau_L) = \sum_{k=1}^{p} a_k x_L(\tau_L - k) + \sum_{j=1}^{pm} b_j x_H(\tau_L - 1, m + 1 - j) + \epsilon_L(\tau_L),
\qquad \epsilon_L(\tau_L) \sim \text{mds}(0, \sigma_L^2), \quad \sigma_L^2 > 0.
\tag{2.2}
\]

The index $j \in \{1, \dots, pm\}$ is in high frequency terms, and the second argument $m + 1 - j$ of $x_H(\tau_L - 1, m + 1 - j)$ goes below 1 when $j > m$. Allowing any integer value in the second argument of $x_H$, including values smaller than 1 or larger than $m$, does not cause any confusion and simplifies the analytical arguments below. Simply note that $x_H(\tau_L, 0)$ is understood as $x_H(\tau_L - 1, m)$; $x_H(\tau_L, -1)$ is understood as $x_H(\tau_L - 1, m - 1)$; and $x_H(\tau_L, m + 1)$ is understood as $x_H(\tau_L + 1, 1)$. More generally, we can interchangeably write $x_H(\tau_L - i, j) = x_H(\tau_L, j - im)$ for $j = 1, \dots, m$ and $i \ge 0$. Complete details on these mixed frequency notation conventions are given in Appendix A.

Now define $X_L(\tau_L - 1) = [x_L(\tau_L - 1), \dots, x_L(\tau_L - p)]'$, $X_H(\tau_L - 1) = [x_H(\tau_L - 1, m + 1 - 1), \dots, x_H(\tau_L - 1, m + 1 - pm)]'$, $a = [a_1, \dots, a_p]'$, and $b = [b_1, \dots, b_{pm}]'$. Then, (2.2) can be

compactly rewritten as:

\[
x_L(\tau_L) = X_L(\tau_L - 1)' a + X_H(\tau_L - 1)' b + \epsilon_L(\tau_L).
\tag{2.3}
\]

It is evident from (2.1)-(2.3) that high-to-low Granger causality is associated with the coefficient $b$. Based on the classic theory of Dufour and Renault (1998) and the mixed frequency extension made by Ghysels, Hill, and Motegi (2013), we know that $x_H$ does not Granger cause $x_L$ given the mixed frequency information set $\mathcal{F}_{\tau_L} = \sigma(X(\tau) : \tau \le \tau_L)$ if and only if $b = 0_{pm \times 1}$. In other words, DGP (2.2) reduces to a pure AR($p$) process under non-causality.

We are interested in testing the non-causality hypothesis $H_0 : b = 0_{pm \times 1}$. We want a test statistic that is consistent (i.e. power is one asymptotically against any deviation from non-causality), achieves high power in local asymptotics and finite samples, and does not produce size distortions in small samples. Section 2.1 discusses the mixed frequency approach, which works on high frequency observations of $x_H$, while Section 2.2 discusses the conventional low frequency approach, which works on an aggregated $x_H$. It turns out that only the former allows us to construct a consistent test.

2.1 High-to-Low Granger Causality: Mixed Frequency Approach

Before presenting our own test, it is helpful to review the existing mixed frequency Granger causality test proposed by Ghysels, Hill, and Motegi (2013). They work with a naïve regression model that regresses $x_L$ onto $q$ low frequency lags of $x_L$ and $h$ high frequency lags of $x_H$:

\[
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_k x_L(\tau_L - k) + \sum_{j=1}^{h} \beta_j x_H(\tau_L - 1, m + 1 - j) + u_L(\tau_L)
\tag{2.4}
\]

for $\tau_L = 1, \dots, T_L$. Ghysels, Hill, and Motegi (2013) estimate the parameters in (2.4) by least squares and then test $H_0 : \beta_1 = \cdots = \beta_h = 0$ via a Wald test. Model (2.4) contains DGP (2.2) as a special case when $q \ge p$ and $h \ge pm$. Hence the Wald test is trivially consistent if $q \ge p$ and $h \ge pm$. See also Ghysels, Hill, and Motegi (2013).

A potential problem here is that $pm$, the true lag order of $x_H$, may be quite large in some applications even when $p$ is fairly small. Consider a week vs. quarter case, for instance: the MF-VAR lag order $p$ is in terms of quarters and $m = 13$ approximately. We thus have $pm = 39$ when $p = 3$, $pm = 52$ when $p = 4$, etc. Therefore, including sufficiently many high frequency lags $h \ge pm$ likely results in size distortions when the sample size $T_L$ is small and $m$ is large. Size distortions may be corrected by a bootstrap, but then finite sample power may become quite low. Conversely, we may use a small number of lags $h < pm$ to ensure the Wald statistic is well characterized by its chi-square limit distribution and therefore improve size. This obviously comes at the cost of inconsistency, since power cannot approach unity when there exists Granger causality involving lags beyond $h$.

A main contribution of this paper is to resolve this trade-off by combining multiple parsimonious regression models:

\[
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_{k,j} x_L(\tau_L - k) + \beta_j x_H(\tau_L - 1, m + 1 - j) + u_{L,j}(\tau_L), \qquad j = 1, \dots, h.
\tag{2.5}
\]

Model $j$ is compactly rewritten as

\[
x_L(\tau_L) = \left[ X_L^{(q)}(\tau_L - 1)', \; x_H(\tau_L - 1, m + 1 - j) \right]
\begin{bmatrix} \alpha_{1,j} \\ \vdots \\ \alpha_{q,j} \\ \beta_j \end{bmatrix}
+ u_{L,j}(\tau_L)
= x_j(\tau_L - 1)' \theta_j + u_{L,j}(\tau_L), \qquad j = 1, \dots, h,
\tag{2.6}
\]

say, where $X_L^{(q)}(\tau_L - 1) \equiv [x_L(\tau_L - 1), \dots, x_L(\tau_L - q)]'$. Observe that model $j$ contains $q$ low frequency autoregressive lags of $x_L$ as well as only the $j$-th high frequency lag of $x_H$. The number of parameters in model $j$ is thus $q + 1$, which tends to be much smaller than the number of parameters in the naïve regression model (2.4), $q + h$. This feature alleviates size distortions for large $m$ and small $T_L$.

In order for each parsimonious regression model to be correctly specified under the null hypothesis of high-to-low non-causality, we need to assume that the autoregressive part of (2.5) has enough lags: $q \ge p$. We impose the same assumption on the naïve regression model (2.4) in order to focus on the causality component, and not the autoregressive component.

Assumption 2.4. The number of autoregressive lags $q$ included in the naïve regression model (2.4) and in each parsimonious regression model (2.5) is larger than or equal to the true autoregressive lag order $p$ in (2.2).

The parsimonious regression models obviously reveal non-causality from high-to-low frequency, since $\beta_j = 0$ for each $j$ in (2.4) implies $\beta_j = 0$ in each $j$-th equation in (2.5). The subtler challenge is showing that (2.5) reveals causation. We first describe how to combine all $h$ parsimonious models to get a test statistic for testing non-causality.

Consider estimating the parsimonious regression models (2.5) by least squares. Since we are assuming that $q \ge p$, each model is correctly specified under the null hypothesis of high-to-low non-causality. Hence, if there is no causation from high-to-low frequency, the least squares estimators satisfy $\hat{\beta}_j \xrightarrow{p} 0$, hence $\max_{1 \le j \le h} \{\hat{\beta}_j^2\} \xrightarrow{p} 0$. Using this property, we propose a max test statistic:

\[
\hat{T} \equiv \max_{1 \le j \le h} \left( \sqrt{T_L} \, w_{T_L,j} \, \hat{\beta}_j \right)^2,
\tag{2.7}
\]

where $\{w_{T_L,j} : j = 1, \dots, h\}$ is a sequence of $\sigma(X(\tau_L - k) : k \ge 1)$-measurable, $L_2$-bounded, non-negative scalar weights with non-random mean-squared-error limits $\{w_j\}$. As a standardization, we assume $\sum_{j=1}^{h} w_{T_L,j} = 1$ without loss of generality. Let $W_{T_L,h}$ be an $h \times h$ diagonal matrix

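whose diagonal elements are $w_{T_L,1}, \dots, w_{T_L,h}$. Similarly, let $W_h$ be an $h \times h$ diagonal matrix whose diagonal elements are $w_1, \dots, w_h$. When we have no prior information about the weighting structure, a trivial choice of $w_{T_L,j}$ is the non-random flat weight $1/h$. We can consider any other weighting structure by choosing the desired $\{w_{T_L,1}, \dots, w_{T_L,h}\}$ (cf. Andrews and Ploberger (1994)).

2.1.1 Asymptotics under Non-Causality from High-to-Low Frequency

We derive the asymptotic distribution of $\hat{T}$ under $H_0 : b = 0_{pm \times 1}$. Stack all parameters across the $h$ models (2.6) and write $\theta \equiv [\theta_1', \dots, \theta_h']'$, and construct a selection matrix $R$ such that $\beta \equiv [\beta_1, \dots, \beta_h]' = R\theta$. Specifically, $R$ is an $h \times (q+1)h$ matrix with $R_{j,(q+1)j} = 1$ for $j = 1, \dots, h$, and all other elements are zero.

Under Assumptions 2.1-2.4, it is not hard to prove the asymptotic normality of $\hat{\theta}$ and hence of $\hat{\beta}$. A simple weak convergence argument then suffices for the max test statistic.

Theorem 2.1. Let Assumptions 2.1-2.4 hold. Under $H_0 : b = 0_{pm \times 1}$, we have that $\hat{T} \xrightarrow{d} \max_{1 \le j \le h} N_j^2$ as $T_L \to \infty$, where $N \equiv [N_1, \dots, N_h]'$ is distributed $N(0_{h \times 1}, V)$ with positive definite covariance matrix

\[
V \equiv \sigma_L^2 \, W_h R S R' W_h \in \mathbb{R}^{h \times h},
\tag{2.8}
\]

where $\sigma_L^2 \equiv E[\epsilon_L^2(\tau_L)]$, and

\[
S \equiv \begin{bmatrix} \Sigma_{1,1} & \cdots & \Sigma_{1,h} \\ \vdots & \ddots & \vdots \\ \Sigma_{h,1} & \cdots & \Sigma_{h,h} \end{bmatrix} \in \mathbb{R}^{(q+1)h \times (q+1)h},
\qquad
\Sigma_{j,i} \equiv \Gamma_{j,j}^{-1} \Gamma_{j,i} \Gamma_{i,i}^{-1} \in \mathbb{R}^{(q+1) \times (q+1)},
\]
\[
\Gamma_{j,i} \equiv E\left[ x_j(\tau_L - 1) x_i(\tau_L - 1)' \right] \in \mathbb{R}^{(q+1) \times (q+1)},
\qquad
R \equiv \begin{bmatrix} 0_{1 \times q} & 1 & \cdots & 0_{1 \times q} & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_{1 \times q} & 0 & \cdots & 0_{1 \times q} & 1 \end{bmatrix} \in \mathbb{R}^{h \times (q+1)h}.
\tag{2.9}
\]

Proof. See Appendix B.

2.1.2 Simulated p-value

The mixed frequency max test statistic $\hat{T}$ has a non-standard limit distribution under $H_0$ that can easily be simulated to compute an approximate p-value. Let $\hat{V}_{T_L}$ be a consistent estimator of $V$ (see below), and draw $R$ samples $N^{(1)}, \dots, N^{(R)}$ independently from $N(0_{h \times 1}, \hat{V}_{T_L})$. Now compute artificial test statistics $\hat{T}^{(r)} \equiv \max_{1 \le j \le h} (N_j^{(r)})^2$. The asymptotic p-value approximation is $\hat{p} = (1/R) \sum_{r=1}^{R} I(\hat{T}^{(r)} > \hat{T})$. Since the $N^{(r)}$ are i.i.d. and $R$ can be made arbitrarily large, by the Glivenko-Cantelli Theorem $\hat{p}$ can be made arbitrarily close to $P(\hat{T}^{(1)} > \hat{T})$.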
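The steps from the parsimonious regressions (2.5) through the simulated p-value can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are taken to be mean-centered (no intercept), the weights are the flat choice $w_{T_L,j} = 1/h$, $\hat{V}_{T_L}$ is the plug-in sample analogue of (2.8)-(2.9) with $\sigma_L^2$ estimated from an AR($q$) fit, and the function names `build_regressors` and `mf_max_test`, the array layout, and the toy data are all hypothetical.

```python
import numpy as np


def build_regressors(x_low, x_high, q, h):
    """Dependent variable and the h regressor matrices x_j of (2.6).

    x_low  : shape (T,),   x_low[tau] = x_L(tau)
    x_high : shape (T, m), x_high[tau, i] = x_H(tau, i + 1)
    Assumes q >= 1.
    """
    T, m = x_high.shape
    flat = x_high.reshape(-1)            # high frequency series in time order
    start = max(q, -(-h // m))           # periods lost to lags (ceil(h / m))
    taus = np.arange(start, T)
    y = x_low[taus]
    ar = np.column_stack([x_low[taus - k] for k in range(1, q + 1)])
    # x_H(tau - 1, m + 1 - j) sits at 0-based position tau * m - j in `flat`.
    regs = [np.column_stack([ar, flat[taus * m - j]]) for j in range(1, h + 1)]
    return y, ar, regs


def mf_max_test(x_low, x_high, q, h, n_draws=5000, seed=0):
    """Max test statistic (2.7) with flat weights and a simulated p-value."""
    y, ar, regs = build_regressors(x_low, x_high, q, h)
    n = y.size
    w = np.full(h, 1.0 / h)              # assumption: flat weights 1/h
    # beta_hat_j is the last least squares coefficient of model j.
    beta = np.array([np.linalg.lstsq(X, y, rcond=None)[0][-1] for X in regs])
    stat = np.max((np.sqrt(n) * w * beta) ** 2)

    # sigma_L^2 under H0: residual variance from an AR(q) fit for x_L.
    resid = y - ar @ np.linalg.lstsq(ar, y, rcond=None)[0]
    sigma2 = resid @ resid / n

    # (j, i) entry of R S R' is the bottom-right entry of
    # Gamma_jj^{-1} Gamma_ji Gamma_ii^{-1}, using sample moments.
    gam_inv = [np.linalg.inv(X.T @ X / n) for X in regs]
    rsr = np.empty((h, h))
    for j in range(h):
        for i in range(h):
            gam_ji = regs[j].T @ regs[i] / n
            rsr[j, i] = (gam_inv[j] @ gam_ji @ gam_inv[i])[-1, -1]
    V = sigma2 * np.outer(w, w) * rsr    # W R S R' W for diagonal W

    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(h), V, size=n_draws)
    p_value = np.mean(np.max(draws ** 2, axis=1) > stat)
    return stat, p_value


# Toy data (hypothetical): m = 3, AR(1) for x_L with and without causality.
rng = np.random.default_rng(1)
T, m = 400, 3
xh = rng.standard_normal((T, m))
eps = rng.standard_normal(T)
xl_null, xl_alt = np.zeros(T), np.zeros(T)
for t in range(1, T):
    xl_null[t] = 0.3 * xl_null[t - 1] + eps[t]
    xl_alt[t] = 0.3 * xl_alt[t - 1] + 0.8 * xh[t - 1, m - 1] + eps[t]

stat0, p0 = mf_max_test(xl_null, xh, q=1, h=3)
stat1, p1 = mf_max_test(xl_alt, xh, q=1, h=3)
print(f"no causality:   stat={stat0:.3f}, p={p0:.3f}")
print(f"with causality: stat={stat1:.3f}, p={p1:.3f}")
```

Only the last coefficient of each regression enters the statistic, so the loop over $j$ never needs the full stacked $\hat{\theta}$; the double loop for $RSR'$ is $O(h^2)$ small matrix products, which is negligible for the lag counts discussed in the text.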

Define the max test limit distribution under $H_0$: $F^0(c) \equiv P(\max_{1 \le j \le h} (N_j^{(r)})^2 \le c)$. Thus, the asymptotic p-value is $\bar{F}^0(\hat{T}) \equiv 1 - F^0(\hat{T}) = P(\max_{1 \le j \le h} (N_j^{(r)})^2 \ge \hat{T})$. By an argument identical to Theorem 2 in Hansen (1996), we have the following link between $P(\hat{T}^{(1)} > \hat{T})$ and the asymptotic p-value for $\hat{T}$.

Theorem 2.2. Under Assumptions 2.1-2.4 it follows that $P(\hat{T}^{(1)} > \hat{T}) = \bar{F}^0(\hat{T}) + o_p(1)$. Therefore $\hat{p} = \bar{F}^0(\hat{T}) + o_p(1)$.

Proof. See Appendix C.

A consistent estimator of $V$ in (2.8) is easily obtained. Simply note that $W_{T_L,h} \xrightarrow{p} W_h$ by assumption, and $\hat{\Gamma}_{j,i} \equiv (1/T_L) \sum_{\tau_L = 1}^{T_L} x_j(\tau_L - 1) x_i(\tau_L - 1)' \xrightarrow{p} \Gamma_{j,i}$ under the maintained assumptions. Consistent estimators $\hat{\Sigma}_{j,i} \xrightarrow{p} \Sigma_{j,i}$ and $\hat{S} \xrightarrow{p} S$ can then be obtained directly from (2.9). Next, a consistent estimator $\hat{\sigma}_L^2$ of $\sigma_L^2 \equiv E[\epsilon_L^2(\tau_L)]$ can be obtained by computing residuals $\hat{\epsilon}_L(\tau_L)$ from (2.2). Notice, though, that we only require consistency for the true $\sigma_L^2$ under $H_0$, since power only requires an estimator with a constant finite probability limit. As a bonus, estimating $\sigma_L^2$ under $H_0$ can be done simply by fitting an AR($q$) model for $x_L$ and computing the sample variance of the residuals.

2.1.3 Identification of Null and Alternative Hypotheses

We now derive a sufficient condition for identifying both the null and alternative hypotheses. In particular, we show that as long as the number of high frequency lags $h$ used across the parsimonious regression models (2.5) is at least as large as the dimension $pm$ of $b$, the parsimonious regression parameters identify the null and alternative hypotheses, in the sense that $b = 0_{pm \times 1}$ if and only if $\beta^* = 0_{h \times 1}$, where $\beta^*$ is the probability limit of $\hat{\beta}$ defined below. Under the condition $h \ge pm$, the max test statistic $\hat{T}$ then has its intended limit properties under either hypothesis.

If there is Granger causality, then the $\hat{\beta}_j$ estimated in the parsimonious models (2.5) are in general inconsistent estimators of the true $\beta_j$ in model (2.4) due to omitted regressors. The next result shows that the least squares first order equations for (2.5) identify so-called pseudo-true values $\beta^* = [\beta_1^*, \dots, \beta_h^*]'$, which are identically the probability limits of $\hat{\beta}_j$. Specifically, $\beta^*$ is a function of the underlying parameters $a$, $b$, and $\sigma_L^2$ as well as population moments of $x_H$ and $x_L$.

Stack all parameters from (2.6) and write $\theta = [\theta_1', \dots, \theta_h']'$, and let $\hat{\theta}$ be the least squares estimator.

Theorem 2.3. Let Assumptions 2.1, 2.2, and 2.4 hold. Then $\hat{\theta} \xrightarrow{p} \theta^* \equiv [\theta_1^{*\prime}, \dots, \theta_h^{*\prime}]'$, the

unique pseudo-true value of $\theta$ that satisfies

\[
\theta_j^* \equiv
\begin{bmatrix} \alpha_{1,j}^* \\ \vdots \\ \alpha_{p,j}^* \\ \alpha_{p+1,j}^* \\ \vdots \\ \alpha_{q,j}^* \\ \beta_j^* \end{bmatrix}
=
\begin{bmatrix} a_1 \\ \vdots \\ a_p \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}
+ \underbrace{\left[ E\left[ x_j(\tau_L - 1) x_j(\tau_L - 1)' \right] \right]^{-1}}_{= \Gamma_{j,j}^{-1} : \; (q+1) \times (q+1)}
\underbrace{E\left[ x_j(\tau_L - 1) X_H(\tau_L - 1)' \right]}_{\equiv C_j : \; (q+1) \times pm} b,
\tag{2.10}
\]

where $x_j(\tau_L - 1)$ is the vector of all regressors in each parsimonious regression model (cf. (2.6)), while $X_H(\tau_L - 1)$ is the vector of $pm$ high frequency lags of $x_H$ (cf. (2.3)). Therefore $\hat{\beta} \xrightarrow{p} \beta^* = R\theta^*$, the unique pseudo-true value of $\beta$.

Proof. See Appendix D.

Remark 2.3. Although tedious, the population covariance terms $\Gamma_{j,j}$ and $C_j$ can be characterized by the underlying parameters $a$, $b$, and $\sigma_L^2$. See the local asymptotic power analysis in Section 4.

Theorem 2.3 provides useful insights on the relationship between the underlying coefficient $b$ and the pseudo-true value $\beta^*$ for the parameter $\beta$. First, as noted in the discussion leading to Theorem 2.1, it is clear that $\beta^* = 0_{h \times 1}$ whenever there is non-causality (i.e. $b = 0_{pm \times 1}$), regardless of the relative magnitude of $h$ and $pm$. Second, as the next result proves, $b = 0_{pm \times 1}$ whenever $\beta^* = 0_{h \times 1}$, provided $h \ge pm$.

Theorem 2.4. Let Assumptions 2.1, 2.2, and 2.4 hold, and assume $h \ge pm$. Then $\beta^* = 0_{h \times 1}$ implies $b = 0_{pm \times 1}$, hence $\beta^* = 0_{h \times 1}$ if and only if $b = 0_{pm \times 1}$.

Proof. See Appendix E.

Theorems 2.1 and 2.4 imply that the max test statistic has its intended limit properties under either hypothesis. Assume the weight limits satisfy $w_j > 0$ for all $j = 1, \dots, h$ so that we have a non-trivial result under the alternative. In view of Theorem 2.4, we also assume $h$ is sufficiently large to allow the parsimonious regression models to identify the hypotheses.

Assumption 2.5. Let $h \ge pm$.

We first tackle the limit when $H_0$ is false. This will allow us to argue definitively that the limit distribution of Theorem 2.1 holds if and only if $H_0$ is true. The max test statistic construction (2.7) with non-trivial weights $w_j > 0$ for all $j = 1, \dots, h$ indicates that $\hat{T} \xrightarrow{p} \infty$ if and only if $\beta^* \ne 0_{h \times 1}$, and by Theorems 2.3 and 2.4, $\hat{\beta} \xrightarrow{p} \beta^* \ne 0_{h \times 1}$ under a general alternative hypothesis $H_1 : b \ne 0_{pm \times 1}$, given $h \ge pm$. This proves consistency of the mixed frequency max test.
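The divergence claim can be spelled out in one line. The following is a sketch that uses only the definition (2.7), the probability limits of Theorem 2.3, and the assumed weight limits $w_j > 0$; it is not a substitute for the formal proof referenced in the paper.

```latex
% Pick any j with \beta_j^* \neq 0; one exists under H_1 when h \ge pm (Theorem 2.4).
\[
\hat{T} \;\ge\; \bigl( \sqrt{T_L}\, w_{T_L,j}\, \hat{\beta}_j \bigr)^2
      \;=\; T_L \cdot w_{T_L,j}^2\, \hat{\beta}_j^2 ,
\qquad
w_{T_L,j}^2\, \hat{\beta}_j^2 \;\xrightarrow{p}\; w_j^2\, (\beta_j^*)^2 \;>\; 0 .
\]
% The lower bound is T_L times a term with a strictly positive probability limit,
% hence \hat{T} \xrightarrow{p} \infty as T_L \to \infty.
```

Conversely, if $\beta^* = 0_{h \times 1}$, no such $j$ exists and the lower bound carries no information, which is why the choice of $h$ matters for power.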

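Theorem 2.5. Let Assumptions 2.1-2.5 hold, and assume $w_j > 0$ for all $j = 1, \dots, h$. Then $\hat{T} \xrightarrow{p} \infty$ if and only if $H_1 : b \ne 0_{pm \times 1}$ is true.

As an immediate consequence of the limit distribution Theorem 2.1, identification Theorem 2.4, and consistency Theorem 2.5, the limiting null distribution arises if and only if $H_0$ is true.

Corollary 2.6. Let Assumptions 2.1-2.5 hold, and assume $w_j > 0$ for all $j = 1, \dots, h$. Then $\hat{T} \xrightarrow{d} \max_{1 \le j \le h} N_j^2$ as $T_L \to \infty$ if and only if $H_0 : b = 0_{pm \times 1}$ is true.

If we choose $h < pm$, then it is possible for asymptotic power to be less than unity, as the following example reveals.

Example 2.2 (Inconsistency due to Small $h$). Consider a simple DGP with $m = 2$ and $p = 1$:

\[
\underbrace{\begin{bmatrix} x_H(\tau_L, 1) \\ x_H(\tau_L, 2) \\ x_L(\tau_L) \end{bmatrix}}_{= X(\tau_L)}
=
\underbrace{\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -1/\rho & 1 & 0 \end{bmatrix}}_{= A_1}
\underbrace{\begin{bmatrix} x_H(\tau_L - 1, 1) \\ x_H(\tau_L - 1, 2) \\ x_L(\tau_L - 1) \end{bmatrix}}_{= X(\tau_L - 1)}
+ \underbrace{\begin{bmatrix} \epsilon_H(\tau_L, 1) \\ \epsilon_H(\tau_L, 2) \\ \epsilon_L(\tau_L) \end{bmatrix}}_{= \epsilon(\tau_L)},
\]
\[
\epsilon(\tau_L) \sim \text{mds}(0_{3 \times 1}, \Omega), \qquad
\Omega = \begin{bmatrix} 1 & \rho & 0 \\ \rho & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \rho \ne 0, \quad |\rho| < 1.
\tag{2.11}
\]

We will show that choosing $(q, h) = (1, 1)$ provides no power beyond the nominal size, while choosing $(q, h) = (1, 2)$ provides asymptotic power of 1.

We first compute $\Gamma_{1,1} = E[x_1(\tau_L - 1) x_1(\tau_L - 1)']$, where $x_1(\tau_L - 1) = [x_L(\tau_L - 1), x_H(\tau_L - 1, 2)]'$ as defined in (2.6) and (2.10). It follows that^2

\[
\Gamma_{1,1} = \begin{bmatrix} 1/\rho^2 & 0 \\ 0 & 1 \end{bmatrix}.
\]

^2 DGP (2.11) implies that $E[x_H(\tau_L - 1, 1)^2] = E[x_H(\tau_L - 1, 2)^2] = 1$, $E[x_L(\tau_L - 1)^2] = 1/\rho^2$, and $E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] = E[(-\frac{1}{\rho} x_H(\tau_L - 2, 1) + x_H(\tau_L - 2, 2) + \epsilon_L(\tau_L - 1)) x_H(\tau_L - 1, 2)] = 0$. Using these results, we have that
\[
\Gamma_{1,1} \equiv E\left[ x_1(\tau_L - 1) x_1(\tau_L - 1)' \right]
= \begin{bmatrix} E[x_L(\tau_L - 1)^2] & E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] \\ E[x_H(\tau_L - 1, 2) x_L(\tau_L - 1)] & E[x_H(\tau_L - 1, 2)^2] \end{bmatrix}
= \begin{bmatrix} 1/\rho^2 & 0 \\ 0 & 1 \end{bmatrix}.
\]

Next we consider $\Gamma_{2,2} = E[x_2(\tau_L - 1) x_2(\tau_L - 1)']$, where $x_2(\tau_L - 1) = [x_L(\tau_L - 1), x_H(\tau_L - 1, 1)]'$. It is easy to show that $\Gamma_{2,2} = \Gamma_{1,1}$ and hence

\[
\Gamma_{1,1}^{-1} = \Gamma_{2,2}^{-1} = \begin{bmatrix} \rho^2 & 0 \\ 0 & 1 \end{bmatrix}.
\]

We next compute $C_1 \equiv E[x_1(\tau_L - 1) X_H(\tau_L - 1)']$, where $X_H(\tau_L - 1) = [x_H(\tau_L - 1, 2), x_H(\tau_L - 1, 1)]'$ as defined in (2.3). It is evident that

\[
C_1 \equiv E\left[ x_1(\tau_L - 1) X_H(\tau_L - 1)' \right]
= \begin{bmatrix} E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] & E[x_L(\tau_L - 1) x_H(\tau_L - 1, 1)] \\ E[x_H(\tau_L - 1, 2) x_H(\tau_L - 1, 2)] & E[x_H(\tau_L - 1, 2) x_H(\tau_L - 1, 1)] \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 1 & \rho \end{bmatrix}
\]

and similarly

\[
C_2 \equiv E\left[ x_2(\tau_L - 1) X_H(\tau_L - 1)' \right]
= \begin{bmatrix} E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] & E[x_L(\tau_L - 1) x_H(\tau_L - 1, 1)] \\ E[x_H(\tau_L - 1, 1) x_H(\tau_L - 1, 2)] & E[x_H(\tau_L - 1, 1) x_H(\tau_L - 1, 1)] \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ \rho & 1 \end{bmatrix}.
\]

In view of (2.10), with $a_1 = 0$ and $b = [b_1, b_2]' = [1, -1/\rho]'$, we get that

\[
\begin{bmatrix} \alpha_{1,1}^* \\ \beta_1^* \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \rho^2 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 1 & \rho \end{bmatrix}
\begin{bmatrix} 1 \\ -1/\rho \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

and

\[
\begin{bmatrix} \alpha_{1,2}^* \\ \beta_2^* \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \rho^2 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ \rho & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ -1/\rho \end{bmatrix}
= \begin{bmatrix} 0 \\ \rho - 1/\rho \end{bmatrix}.
\]

Observe that $\beta_1^* = 0$ and $\beta_2^* = \rho - 1/\rho \ne 0$ since $|\rho| < 1$. Therefore, if we choose $h = 1$, the mixed frequency max test statistic $\hat{T}$ converges to the asymptotic null distribution of Theorem 2.1 even though causality exists, resulting in no power beyond the nominal size. However, if we choose $h = 2$ and assign a positive weight $w_2 > 0$ to $\hat{\beta}_2$, then $\hat{T} \xrightarrow{p} \infty$ and power approaches 1.

Assume $\rho > 0$ for simplicity. The simple explanation behind the lack of power is that the positive impact of $x_H(\tau_L - 1, 2)$ on $x_L(\tau_L)$, the negative impact of $x_H(\tau_L - 1, 1)$ on $x_L(\tau_L)$, and the positive autocorrelation of $x_H$ all offset each other to make the pseudo-true $\beta_1^* = 0$.

2.2 High-to-Low Granger Causality: Low Frequency Approach

The mixed frequency max test based on model (2.5) is consistent as long as $h \ge pm$, and the mixed frequency Wald test based on model (2.4) is trivially consistent given Assumption 2.1. Both tests operate on data sampled at their observed frequencies. If, instead, we worked on an aggregated $x_H$, then neither test would be consistent no matter how many low frequency lags of $x_H$ we included. In order to verify this point, we formulate a Wald statistic based on a low frequency version of the naïve regression model and a max test statistic based on a low frequency version of the parsimonious regression models.

We introduce the linear aggregation scheme

\[
x_H(\tau_L) = \sum_{j=1}^{m} \delta_j x_H(\tau_L, j), \qquad \text{where } \delta_j \ge 0 \text{ for all } j = 1, \dots, m \text{ and } \sum_{j=1}^{m} \delta_j = 1.
\]

The linear aggregation scheme is sufficiently general for most economic applications since it includes flow sampling (i.e. $\delta_j = 1/m$ for $j = 1, \dots, m$) and stock sampling (i.e. $\delta_j = I(j = m)$ for $j = 1, \dots, m$) as special cases. Note that $\delta_j$ is not a parameter to estimate; it is a fixed quantity that determines the aggregation scheme. We start with a low frequency naïve regression model and then move on to parsimonious regression models.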
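The linear aggregation scheme is a weighted average within each low frequency period, so it reduces to a single matrix-vector product. The sketch below illustrates flow and stock sampling on a tiny made-up array; the function name `aggregate` and the `(T, m)` array layout are assumptions chosen for this illustration.

```python
import numpy as np


def aggregate(x_high, delta):
    """Aggregate x_high of shape (T, m) with fixed weights delta of shape (m,).

    Row tau of x_high holds x_H(tau, 1), ..., x_H(tau, m). The weights must be
    non-negative and sum to one, as in the linear aggregation scheme.
    """
    delta = np.asarray(delta, dtype=float)
    if np.any(delta < 0) or not np.isclose(delta.sum(), 1.0):
        raise ValueError("delta must be non-negative and sum to one")
    return x_high @ delta


m = 3
flow = np.full(m, 1.0 / m)      # flow sampling: delta_j = 1/m
stock = np.eye(m)[-1]           # stock sampling: delta_j = I(j = m)

# Two low frequency periods of made-up high frequency data.
x_high = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 9.0]])

flow_agg = aggregate(x_high, flow)     # within-period averages
stock_agg = aggregate(x_high, stock)   # last observation of each period
print(flow_agg, stock_agg)
```

Because the weights are fixed rather than estimated, nothing about the aggregation depends on the data, which is what allows causal effects with the "wrong" relative magnitudes to cancel after aggregation.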

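2.2.1 Low Frequency Naïve Regression

The low frequency naïve regression model is:

$$
\begin{aligned}
x_L(\tau_L) &= \sum_{k=1}^{q} \alpha_k x_L(\tau_L - k) + \sum_{j=1}^{h} \beta_j \underline{x}_H(\tau_L - j) + u_L(\tau_L) \\
&= [X_L^{(q)}(\tau_L - 1)', \underbrace{\underline{x}_H(\tau_L - 1), \dots, \underline{x}_H(\tau_L - h)}_{\underline{X}_H(\tau_L - 1)'}]\, [\alpha_1, \dots, \alpha_q, \beta_1, \dots, \beta_h]' + u_L(\tau_L) \\
&= \underbrace{[X_L^{(q)}(\tau_L - 1)', \underline{X}_H(\tau_L - 1)']}_{\underline{x}(\tau_L - 1)'}\, \theta^{(LF)} + u_L(\tau_L). \qquad (2.12)
\end{aligned}
$$

Notice that $\underline{X}_H(\tau_L - 1)$ is an $h \times 1$ vector stacking the aggregated $\underline{x}_H$, and $\underline{x}(\tau_L - 1)$ is a $(q + h) \times 1$ vector of all regressors. The superscript LF in $\theta^{(LF)}$ emphasizes that we are working with a low frequency model here.$^3$ We impose Assumption 2.4 such that $q \geq p$ in order to focus on testing causality. Since (2.1) governs the data generating process, the pseudo-true value for $\theta^{(LF)}$, denoted $\theta^{(LF)*}$, can be derived easily:

$$
\theta^{(LF)*} \equiv [\alpha_1^*, \dots, \alpha_p^*, \alpha_{p+1}^*, \dots, \alpha_q^*, \beta^{*\prime}]' = [a_1, \dots, a_p, 0_{1 \times (q-p)}, 0_{1 \times h}]' + \underbrace{\left[E[\underline{x}(\tau_L - 1)\underline{x}(\tau_L - 1)']\right]^{-1}}_{\Gamma^{-1}:\ (q+h) \times (q+h)} \underbrace{E[\underline{x}(\tau_L - 1) X_H(\tau_L - 1)']}_{C:\ (q+h) \times pm}\, b, \qquad (2.13)
$$

where $\beta^* = [\beta_1^*, \dots, \beta_h^*]'$. The derivation of (2.13) is omitted since it is similar to the proof of Theorem 2.3.

$^3$ LF should also be put on the $\alpha$'s, $\beta$'s, and $u_L(\tau_L)$ since they are generally different from the parameters and error term in the mixed frequency naïve regression model (2.4). We refrain from doing that for the sake of notational brevity.

A low frequency Wald statistic $W_{LF}$ is simply a classic Wald statistic with respect to the hypothesis $\beta \equiv [\beta_1, \dots, \beta_h]' = 0_{h \times 1}$. Consistency requires that $W_{LF} \xrightarrow{p} \infty$ whenever, in model (2.3), there is high-to-low causality $b \neq 0_{pm \times 1}$. We present a counter-example where high-to-low Granger causality exists, such that $b \neq 0_{pm \times 1}$, yet $\beta^* = 0_{h \times 1}$ in (2.13).

Example 2.3 (Inconsistency of Low Frequency Wald Test) Consider an even simpler DGP than (2.11) with $m = 2$ and $p = 1$:

$$
\underbrace{\begin{bmatrix} x_H(\tau_L, 1) \\ x_H(\tau_L, 2) \\ x_L(\tau_L) \end{bmatrix}}_{= X(\tau_L)}
= \underbrace{\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ b_2 & b_1 & 0 \end{bmatrix}}_{= A_1}
\underbrace{\begin{bmatrix} x_H(\tau_L - 1, 1) \\ x_H(\tau_L - 1, 2) \\ x_L(\tau_L - 1) \end{bmatrix}}_{= X(\tau_L - 1)}
+ \underbrace{\begin{bmatrix} \epsilon_H(\tau_L, 1) \\ \epsilon_H(\tau_L, 2) \\ \epsilon_L(\tau_L) \end{bmatrix}}_{= \epsilon(\tau_L)},
\quad \epsilon(\tau_L) \sim \text{mds}(0_{3 \times 1}, I_3), \quad b \neq 0_{2 \times 1}. \qquad (2.14)
$$

The linear aggregation scheme when $m = 2$ is $\underline{x}_H(\tau_L) = \delta_1 x_H(\tau_L, 1) + (1 - \delta_1) x_H(\tau_L, 2)$. We want to show that $\beta^* = 0_{h \times 1}$ can occur regardless of the linear aggregation scheme (i.e. for any $\delta_1$) and of the number $h$ of low frequency lags of the aggregated $\underline{x}_H$. We require a key population moment in (2.13):

$$
C \equiv E[\underline{x}(\tau_L - 1) X_H(\tau_L - 1)'] =
\begin{bmatrix}
E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] & E[x_L(\tau_L - 1) x_H(\tau_L - 1, 1)] \\
E[\underline{x}_H(\tau_L - 1) x_H(\tau_L - 1, 2)] & E[\underline{x}_H(\tau_L - 1) x_H(\tau_L - 1, 1)] \\
E[\underline{x}_H(\tau_L - 2) x_H(\tau_L - 1, 2)] & E[\underline{x}_H(\tau_L - 2) x_H(\tau_L - 1, 1)] \\
\vdots & \vdots \\
E[\underline{x}_H(\tau_L - h) x_H(\tau_L - 1, 2)] & E[\underline{x}_H(\tau_L - h) x_H(\tau_L - 1, 1)]
\end{bmatrix}.
$$

Given (2.14), it follows that:$^4$

$$
C = \begin{bmatrix} 0 & 0 \\ 1 - \delta_1 & \delta_1 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{bmatrix}.
$$

Given $\delta_1$ we can find a $b \neq 0_{2 \times 1}$ such that $Cb = 0_{(h+1) \times 1}$. Simply let $b_1 = 0$ and $b_2 \neq 0$ if $\delta_1 = 0$; let $b_1 \neq 0$ and $b_2 = -b_1 (1 - \delta_1)/\delta_1$ if $\delta_1 \in (0, 1)$; or let $b_1 \neq 0$ and $b_2 = 0$ if $\delta_1 = 1$. In each of these three cases $Cb = 0_{(h+1) \times 1}$, and thus $\beta^* = 0_{h \times 1}$ in view of (2.13). Intuitively, the impact of $x_H(\tau_L - 1, 1)$ on $x_L(\tau_L)$ and the impact of $x_H(\tau_L - 1, 2)$ on $x_L(\tau_L)$ are inversely proportional to the aggregation weights. Hence, the high-to-low causal effects offset each other after temporal aggregation.

$^4$ Equation (2.14) immediately implies that $x_L(\tau_L - 1) = b_1 \epsilon_H(\tau_L - 2, 2) + b_2 \epsilon_H(\tau_L - 2, 1) + \epsilon_L(\tau_L - 1)$, and therefore $E[x_L(\tau_L - 1) x_H(\tau_L - 1, 2)] = b_1 E[\epsilon_H(\tau_L - 2, 2) \epsilon_H(\tau_L - 1, 2)] + b_2 E[\epsilon_H(\tau_L - 2, 1) \epsilon_H(\tau_L - 1, 2)] + E[\epsilon_L(\tau_L - 1) \epsilon_H(\tau_L - 1, 2)] = 0$. Similarly, $E[x_L(\tau_L - 1) x_H(\tau_L - 1, 1)] = 0$. In addition, assuming a general linear aggregation scheme, $E[\underline{x}_H(\tau_L - j) x_H(\tau_L - 1, 2)] = E[(\delta_1 x_H(\tau_L - j, 1) + (1 - \delta_1) x_H(\tau_L - j, 2)) x_H(\tau_L - 1, 2)] = (1 - \delta_1) I(j = 1)$. Similarly, $E[\underline{x}_H(\tau_L - j) x_H(\tau_L - 1, 1)] = \delta_1 I(j = 1)$. Thus, the second row of $C$ is $[1 - \delta_1, \delta_1]$ and all other rows are zeros.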
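The offsetting effect in Example 2.3 can be checked numerically. The following is a minimal simulation sketch, not part of the paper: it assumes Gaussian errors, the illustrative values $\delta_1 = 0.5$ and $b_1 = 1$ (so $b_2 = -1$), and $q = h = 1$ in the low frequency regression. The function name `simulate_example` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_example(T, delta1, b1, seed=0):
    """Simulate DGP (2.14) with the offsetting choice b2 = -b1 (1 - delta1) / delta1.

    Assumes standard normal errors and 0 < delta1 < 1 (illustrative choices).
    """
    if not 0.0 < delta1 < 1.0:
        raise ValueError("this sketch covers the interior case 0 < delta1 < 1")
    rng = np.random.default_rng(seed)
    b2 = -b1 * (1.0 - delta1) / delta1
    # Columns: x_H(tau_L, 1), x_H(tau_L, 2); both are white noise under (2.14).
    x_h = rng.standard_normal((T, 2))
    x_l = rng.standard_normal(T)  # starts as eps_L
    # x_L(tau_L) = b2 x_H(tau_L - 1, 1) + b1 x_H(tau_L - 1, 2) + eps_L(tau_L)
    x_l[1:] += b2 * x_h[:-1, 0] + b1 * x_h[:-1, 1]
    # Linear aggregation of the high frequency variable.
    x_h_agg = delta1 * x_h[:, 0] + (1.0 - delta1) * x_h[:, 1]
    return x_l, x_h, x_h_agg, np.array([b1, b2])


x_l, x_h, x_h_agg, b = simulate_example(T=20000, delta1=0.5, b1=1.0)
y = x_l[1:]

# Low frequency regression: x_L on its own lag and one lag of aggregated x_H.
X_lf = np.column_stack([x_l[:-1], x_h_agg[:-1]])
beta_lf = np.linalg.lstsq(X_lf, y, rcond=None)[0][1]

# Mixed frequency regression: x_L on its own lag, x_H(tau_L-1, 2), x_H(tau_L-1, 1).
X_mf = np.column_stack([x_l[:-1], x_h[:-1, 1], x_h[:-1, 0]])
b_hat = np.linalg.lstsq(X_mf, y, rcond=None)[0][1:]

print("LF coefficient on aggregated x_H:", beta_lf)
print("MF coefficients [b1, b2]:", b_hat)
```

In this sketch the low frequency coefficient is close to zero even though $b \neq 0$, while the mixed frequency regression recovers both high frequency coefficients.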

2.2.2 Low Frequency Parsimonious Regression

Now consider regressing $x_L$ onto its own low frequency lags and only one low frequency lag of the aggregated $\underline{x}_H$:

$$
\begin{aligned}
x_L(\tau_L) &= \sum_{k=1}^{q} \alpha_{k,j} x_L(\tau_L - k) + \beta_j \underline{x}_H(\tau_L - j) + u_{L,j}(\tau_L) \\
&= \underbrace{[X_L^{(q)}(\tau_L - 1)', \underline{x}_H(\tau_L - j)]}_{\underline{x}_j(\tau_L - 1)'}\, \underbrace{[\alpha_{1,j}, \dots, \alpha_{q,j}, \beta_j]'}_{\theta_j^{(LF)}} + u_{L,j}(\tau_L), \quad j = 1, \dots, h. \qquad (2.15)
\end{aligned}
$$

We still impose the Assumption 2.4 requirement $q \geq p$. Since we are assuming the same DGP (2.1) as in the mixed frequency case, the pseudo-true value for $\theta_j^{(LF)}$, denoted $\theta_j^{(LF)*}$, can be easily derived by replacing $x_j(\tau_L - 1)$ with $\underline{x}_j(\tau_L - 1)$ in (2.10):

$$
\theta_j^{(LF)*} \equiv [\alpha_{1,j}^*, \dots, \alpha_{p,j}^*, \alpha_{p+1,j}^*, \dots, \alpha_{q,j}^*, \beta_j^*]' = [a_1, \dots, a_p, 0, \dots, 0, 0]' + \underbrace{\left[E[\underline{x}_j(\tau_L - 1)\underline{x}_j(\tau_L - 1)']\right]^{-1}}_{\Gamma_{j,j}^{-1}:\ (q+1) \times (q+1)} \underbrace{E[\underline{x}_j(\tau_L - 1) X_H(\tau_L - 1)']}_{C_j:\ (q+1) \times pm}\, b. \qquad (2.16)
$$

The low frequency max test statistic is constructed in the same way as (2.7):

$$
\hat{T}^{(LF)} \equiv \max_{1 \leq j \leq h} \left( \sqrt{T_L}\, w_{T_L,j}\, \hat{\beta}_j \right)^2.
$$

The limit distribution of $\hat{T}^{(LF)}$ under $H_0: b = 0_{pm \times 1}$ has the same structure as the limit distribution in Theorem 2.1, the difference being that $x_j(\tau_L - 1)$ there should be replaced with $\underline{x}_j(\tau_L - 1)$. Therefore, the Gaussian limit distribution covariance $V \equiv \sigma_L^2 W_h R S R' W_h \in \mathbb{R}^{h \times h}$ in (2.8) is now defined with a different $S$ based on $\underline{x}_j(\tau_L - 1)$. In the spirit of Example 2.3, we can easily show that $\hat{T}^{(LF)}$ is inconsistent: asymptotic power is not one against all deviations from the null hypothesis.

3 Max Test: Low-to-High Granger Causality

We now consider testing for Granger causality from $x_L$ to $x_H$, both in mixed and low frequency settings. In view of model (2.1), the null hypothesis is now $H_0: c = 0_{pm \times 1}$.
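The construction of the low frequency max statistic from the $h$ parsimonious regressions (2.15) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `lf_max_statistic` is hypothetical, the equal weights $w_{T_L,j} = 1/h$ are an assumption made only for the example, and computing a p-value from the limit distribution is not covered.

```python
import numpy as np


def lf_max_statistic(x_l, x_h_agg, q, h, weights=None):
    """Max statistic over h parsimonious low frequency regressions.

    Model j regresses x_L on q own lags and the j-th lag of aggregated x_H.
    `weights` defaults to 1/h for every j (an assumption for illustration).
    """
    T = len(x_l)
    start = max(q, h)  # first usable observation
    y = x_l[start:]
    T_eff = len(y)
    if weights is None:
        weights = np.full(h, 1.0 / h)
    # Columns x_L(tau_L - 1), ..., x_L(tau_L - q), shared by all models.
    own_lags = np.column_stack([x_l[start - k:T - k] for k in range(1, q + 1)])
    beta_hat = np.empty(h)
    for j in range(1, h + 1):
        X = np.column_stack([own_lags, x_h_agg[start - j:T - j]])
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        beta_hat[j - 1] = coef[-1]  # least squares estimate of beta_j
    stat = np.max((np.sqrt(T_eff) * weights * beta_hat) ** 2)
    return stat, beta_hat


# Toy data: x_L loads on the second lag of the aggregated series only.
rng = np.random.default_rng(1)
T = 2000
x_h_agg = rng.standard_normal(T)
x_l = rng.standard_normal(T)
x_l[2:] += 0.5 * x_h_agg[:-2]

stat, beta_hat = lf_max_statistic(x_l, x_h_agg, q=1, h=3)
print("beta_hat:", beta_hat)
print("max statistic:", stat)
```

In the toy data only the second parsimonious model carries signal, so the maximum is attained at $j = 2$.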

3.1 Mixed Frequency Approach

Consider the same MF-VAR(p) data generating process (2.1) as before.

3.1.1 Naïve Regression: Wald Test

One possible way of testing for low-to-high causality (i.e. causality from $x_L$ to $x_H$) is a Wald test based on the naïve regression model below, which is a natural extension of Sims' (1972) two-sided regression model to the mixed frequency framework:

$$
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_k x_L(\tau_L - k) + \sum_{j=1}^{h} \beta_j x_H(\tau_L - 1, m + 1 - j) + \sum_{j=1}^{r} \gamma_j x_H(\tau_L + 1, j) + u_L(\tau_L). \qquad (3.17)
$$

Model (3.17) regresses $x_L$ onto $q$ low frequency lags of $x_L$, $h$ high frequency lags of $x_H$, and $r$ high frequency leads of $x_H$. If we estimate (3.17) by least squares and implement a Wald test with respect to the hypothesis $\gamma = [\gamma_1, \dots, \gamma_r]' = 0_{r \times 1}$, then under the null hypothesis of low-to-high non-causality $H_0: c = 0_{pm \times 1}$, and under Assumptions 2.1-2.4, the Wald statistic has a $\chi_r^2$ limit distribution, as long as $q \geq p$ and $h \geq pm$ (cf. Sims (1972)). The latter conditions ensure the two-sided regression contains the DGP under the null hypothesis: see Remark 3.1 below. Similarly, under the preceding conditions the Wald statistic is consistent against a general alternative $H_1: c \neq 0_{pm \times 1}$.

Notice that $q + h$ may be quite large, and in general parameter proliferation renders the chi-square distribution a poor approximation of the true small sample distribution of the Wald statistic. Since an asymptotic test may lead to size distortions, in the simulation study of Section 5 we bootstrap the p-values for sharper empirical size.

3.1.2 Parsimonious Regressions: Max Test

We first propose parsimonious regression models:

$$
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_{k,j} x_L(\tau_L - k) + \sum_{k=1}^{h} \beta_{k,j} x_H(\tau_L - 1, m + 1 - k) + \gamma_j x_H(\tau_L + 1, j) + u_{L,j}(\tau_L), \quad j = 1, \dots, r. \qquad (3.18)
$$

Model $j$ regresses $x_L$ onto $q$ low frequency lags of $x_L$, $h$ high frequency lags of $x_H$, and only the $j$-th high frequency lead of $x_H$. We can write the parsimonious regression models (3.18) in matrix form. Let $n = q + h + 1$, the number of regressors in each model. Define $n \times 1$ vectors

$$
y_j(\tau_L - 1) = [x_L(\tau_L - 1), \dots, x_L(\tau_L - q), x_H(\tau_L - 1, m + 1 - 1), \dots, x_H(\tau_L - 1, m + 1 - h), x_H(\tau_L + 1, j)]'
$$

and $\phi_j = [\alpha_{1,j}, \dots, \alpha_{q,j}, \beta_{1,j}, \dots, \beta_{h,j}, \gamma_j]'$. Here $y_j(\tau_L - 1)$ is a vector of all regressors while $\phi_j$ is a vector of all parameters in model $j$. Using this notation, model $j$ can be rewritten as $x_L(\tau_L) = y_j(\tau_L - 1)' \phi_j + u_{L,j}(\tau_L)$.

Let $\hat{\gamma}_j$ be the least squares estimator of $\gamma_j$ from model $j$, and stack the estimators into $\hat{\gamma} = [\hat{\gamma}_1, \dots, \hat{\gamma}_r]'$. Under the null hypothesis of low-to-high non-causality, $\gamma = 0_{r \times 1}$. Using this property, we formulate a mixed frequency max test statistic for low-to-high causality:

$$
\hat{U} \equiv \max_{1 \leq j \leq r} \left( \sqrt{T_L}\, w_{T_L,j}\, \hat{\gamma}_j \right)^2, \qquad (3.19)
$$

where $w_{T_L} = [w_{T_L,1}, \dots, w_{T_L,r}]'$ is a weighting scheme as before. Using Assumptions 2.1-2.5, the asymptotic null distribution of $\hat{U}$ can be derived in the same way as in Theorem 2.1. The proof is therefore omitted.

Theorem 3.1 Let Assumptions 2.1-2.5 hold. Under $H_0: c = 0_{pm \times 1}$ we have that $\hat{U} \xrightarrow{d} \max_{1 \leq j \leq r} \tilde{N}_j^2$ as $T_L \to \infty$, where $\tilde{N} \equiv [\tilde{N}_1, \dots, \tilde{N}_r]'$ is distributed $N(0_{r \times 1}, \tilde{V})$ with positive definite covariance matrix

$$
\tilde{V} \equiv \sigma_L^2 W_r \tilde{R} \tilde{S} \tilde{R}' W_r \in \mathbb{R}^{r \times r},
$$

where $\sigma_L^2 \equiv E[\epsilon_L^2(\tau_L)]$; $\tilde{S}$ is defined in the same way as $S$ in (2.9), replacing the regressors $x_j(\tau_L - 1)$ with $y_j(\tau_L - 1)$; and the selection matrix $\tilde{R}$ is an $r$-by-$(q + h + 1)r$ matrix that picks $[\gamma_1, \dots, \gamma_r]'$ out of $[\phi_1', \dots, \phi_r']'$.

Remark 3.1 Notice that for the limit distribution under $H_0: c = 0_{pm \times 1}$, we now require Assumption 2.5, such that the number of high frequency lags $h$ in models (3.17) and (3.18) is at least as large as the true lag length $pm$. We did not require $h \geq pm$ in order to derive the high-to-low frequency max test limit distribution in Theorem 2.1, precisely because for any $h \geq 1$ the coefficients $\beta$ are identically 0 under the null of no causality from high-to-low frequency. We only imposed $h \geq pm$ to deduce by Corollary 2.6 that the limit distribution applies if and only if no causation from high-to-low frequency is true. We require $h \geq pm$ in Theorem 3.1, however, in order to ensure that the true DGP is contained in the parsimonious regressions under the null hypothesis of no causation from low-to-high frequency: under no causation it follows that $\gamma = 0_{r \times 1}$, hence $\hat{\gamma} \xrightarrow{p} 0$, only if the model is otherwise correctly specified vis-à-vis model (2.1).

Remark 3.2 Consistency of the low-to-high max test is an open question, and is not yet resolved by our methods.

3.1.3 MIDAS Polynomials in the Max Test

The parsimonious regression models (3.18) have fewer parameters than the naïve model (3.17), but they may still have many parameters. Since $h$ and $r$ may still be large relative to the sample size, there may still be size distortions in the max test. Our simulation study reveals this: in general a comparatively large low frequency sample size is needed for max test empirical size to be very close to the nominal level.$^5$

$^5$ In our study where $m = 12$ we find $T_L \in \{40, 80\}$ is not large enough, but $T_L \geq 120$ is large enough for sharp max test empirical size. If the low frequency is years, such that there are $m = 12$ high frequency months, then

$T_L = 120$ years is obviously too large for practical applications in macroeconomics and finance, outside of deep historical studies. If the low frequency is quarters, such that the high frequency is approximately $m = 12$ weeks, then $T_L = 120$ quarters is 30 years, which is clearly more practical.

Since we are interested in low-to-high causality, we may impose a MIDAS polynomial on the high-to-low causality part for further dimension reduction, and keep the low-to-high causality part unrestricted. See Ghysels, Santa-Clara, and Valkanov (2004), Ghysels, Santa-Clara, and Valkanov (2006), Ghysels, Sinko, and Valkanov (2007), and Andreou, Ghysels, and Kourtellos (2010). The model now becomes:

$$
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_k x_L(\tau_L - k) + \sum_{k=1}^{h} \omega_k(\pi) x_H(\tau_L - 1, k) + \gamma_j x_H(\tau_L + 1, j) + u_L(\tau_L), \quad j = 1, \dots, r, \qquad (3.20)
$$

where $\omega_k(\pi)$ represents a MIDAS polynomial with a small-dimensional parameter vector $\pi \in \mathbb{R}^s$. Using a MIDAS polynomial with small $s$ is a common technique to reduce the number of parameters. There are a variety of possible MIDAS polynomials $\omega_k(\pi)$ in the literature (see Technical Appendix A of Ghysels (2012)). In our simulation study we use the Almon polynomial with dimension $s$, namely $\omega_k(\pi) = \sum_{l=1}^{s} \pi_l k^l$. Notice that the model is linear in $\pi$, allowing for least squares as opposed to nonlinear least squares:

$$
\sum_{k=1}^{h_{MF}} \omega_k(\pi) x_H(\tau_L - 1, k) = \sum_{k=1}^{h_{MF}} \left( \sum_{l=1}^{s} \pi_l k^l \right) x_H(\tau_L - 1, k),
$$

which equals the $1 \times h_{MF}$ row vector of high frequency lags, times an $h_{MF} \times s$ matrix whose $k$-th row collects the polynomial terms multiplying $\pi_1, \dots, \pi_s$ in $\omega_k(\pi)$, times $\pi = [\pi_1, \dots, \pi_s]'$.

Another important characteristic of the Almon polynomial is that it allows both negative and positive values in general (e.g. $\omega_k(\pi) \geq 0$ for $k < 3$ and $\omega_k(\pi) < 0$ for $k \geq 4$, etc.). Many other MIDAS polynomials, like the beta probability density or exponential Almon, assume a single sign for all lags (e.g. $\omega_k(\pi) \geq 0$ for all $k$).

MIDAS regressions, of course, may be misspecified. Thus, the least squares estimator of $\gamma$ may not be consistent for 0 under the null, but rather may be consistent for some non-zero pseudo-true value identified by the resulting first order moment conditions. This is precisely the case in our simulation study in Section 5.2. Nevertheless, we show that a model with misspecified MIDAS polynomials leads to a dramatic improvement in empirical size, even though the max test statistic for that model does not have its intended null limit distribution. We also show that size distortions vanish with a large enough sample size (cf. Footnote 5).

Another option for improving max test empirical size in small samples is to use a bootstrap procedure for p-value computation. We find that using a wild bootstrap, similar to Gonçalves and Kilian's (2004) bootstrap, does not alleviate size distortions. We therefore leave as an open question how best to improve max test empirical size in a way that leads to valid asymptotic inference.

3.2 Low Frequency Approach

Aggregate the high frequency variable via $\underline{x}_H(\tau_L) = \sum_{j=1}^{m} \delta_j x_H(\tau_L, j)$, and consider a low frequency counterpart to the parsimonious regression models (3.18):

$$
x_L(\tau_L) = \sum_{k=1}^{q} \alpha_{k,j} x_L(\tau_L - k) + \sum_{k=1}^{h_{LF}} \beta_{k,j} \underline{x}_H(\tau_L - k) + \gamma_j \underline{x}_H(\tau_L + j) + u_{L,j}(\tau_L), \quad j = 1, \dots, r_{LF}. \qquad (3.21)
$$

The subscript LF is put on $h$ and $r$ in order to emphasize that they signify the number of low frequency lags and leads of the aggregated $\underline{x}_H$, respectively. The subscript will be dropped when there is no confusion. We estimate the parsimonious model (3.21) by least squares, and then use a low frequency max test statistic as in (3.19):

$$
\hat{U}^{(LF)} \equiv \max_{1 \leq j \leq r_{LF}} \left( \sqrt{T_L}\, w_{T_L,j}\, \hat{\gamma}_j \right)^2.
$$

Deriving the limit distribution of $\hat{U}^{(LF)}$ under $H_0: x_L \nrightarrow x_H$ requires an extra assumption beyond the maintained ones. Under the null hypothesis of low-to-high non-causality, a correctly specified two-sided MF regression reduces to (2.2). In general, each low frequency parsimonious regression model (3.21) does not contain (2.2) as a special case. The true high-to-low causal pattern based on the non-aggregated $x_H(\tau_L, i)$, namely $\sum_{l=1}^{pm} b_l x_H(\tau_L - 1, m + 1 - l)$, may not be fully captured by the low frequency lags of the aggregated $\underline{x}_H$, namely $\sum_{k=1}^{h_{LF}} \beta_{k,j} \underline{x}_H(\tau_L - k)$, no matter what the lag length $h_{LF}$ is. See Examples 3.1 and 3.2 below.

In order to find a condition that ensures each low frequency parsimonious regression model contains (2.2) as a special case, we elaborate on the relationship between the two summation terms $\sum_{l=1}^{pm} b_l x_H(\tau_L - 1, m + 1 - l)$ and $\sum_{k=1}^{h_{LF}} \beta_{k,j} \underline{x}_H(\tau_L - k)$. In the following we write $\beta_k$ instead of $\beta_{k,j}$, since it is irrelevant which $j$-th lead term of $\underline{x}_H$ is included in the model. Observe that the aggregated high frequency term is:

$$
\begin{aligned}
\sum_{k=1}^{h_{LF}} \beta_k \underline{x}_H(\tau_L - k) &= \sum_{k=1}^{h_{LF}} \beta_k \sum_{l=1}^{m} \delta_l x_H(\tau_L - k, l) \\
&= \beta_1 \delta_m x_H(\tau_L - 1, m + 1 - 1) + \cdots + \beta_1 \delta_1 x_H(\tau_L - 1, m + 1 - m) \\
&\quad + \cdots + \beta_{h_{LF}} \delta_m x_H(\tau_L - h_{LF}, m + 1 - 1) + \cdots + \beta_{h_{LF}} \delta_1 x_H(\tau_L - h_{LF}, m + 1 - m) \\
&= \sum_{l=1}^{h_{LF} m} \beta_{\lceil l/m \rceil}\, \delta_{\lceil l/m \rceil m + 1 - l}\, x_H(\tau_L - 1, m + 1 - l), \qquad (3.22)
\end{aligned}
$$

where $\lceil x \rceil$ is the smallest integer not smaller than $x$. The last equality exploits the notational convention that the second argument of $x_H$ can go below 1 (see Appendix A). Now compare the last term in (3.22) with the true high-to-low causal pattern $\sum_{l=1}^{pm} b_l x_H(\tau_L - 1, m + 1 - l)$.
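The index bookkeeping in (3.22) can be made concrete with a short sketch. It maps low frequency coefficients $\beta_1, \dots, \beta_{h_{LF}}$ and aggregation weights $\delta_1, \dots, \delta_m$ to the implied coefficients on the high frequency lags $x_H(\tau_L - 1, m + 1 - l)$. The function name `implied_hf_coefficients` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def implied_hf_coefficients(beta, delta):
    """Coefficient on x_H(tau_L - 1, m + 1 - l), l = 1, ..., h_LF * m, as in (3.22).

    beta  : low frequency coefficients beta_1, ..., beta_{h_LF}
    delta : aggregation weights delta_1, ..., delta_m
    """
    m = len(delta)
    h_lf = len(beta)
    coefs = np.empty(h_lf * m)
    for l in range(1, h_lf * m + 1):
        k = (l + m - 1) // m      # ceil(l / m): which low frequency lag
        d_idx = k * m + 1 - l     # 1-based index into delta
        coefs[l - 1] = beta[k - 1] * delta[d_idx - 1]
    return coefs


# Flow sampling with m = 3: every high frequency lag inside a low
# frequency period receives the same implied coefficient.
flow = implied_hf_coefficients(beta=[0.6, -0.3], delta=[1 / 3, 1 / 3, 1 / 3])
print(flow)

# Stock sampling with m = 3 (all weight on the last high frequency
# observation): only lags l = 1 and l = 4 can be nonzero.
stock = implied_hf_coefficients(beta=[0.0, 0.7], delta=[0.0, 0.0, 1.0])
print(stock)
```

Comparing the returned vector with a candidate $b$ shows directly which high frequency causal patterns a given aggregation scheme can reproduce.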

The following assumption gives a sufficient condition on the true $b$ for the parsimonious regressions with aggregated $\underline{x}_H$ to contain the true DGP under the null.

Assumption 3.1 Fix the linear aggregation scheme $\delta = [\delta_1, \dots, \delta_m]'$, and fix the true causal pattern from $x_H$ to $x_L$: $b = [b_1, \dots, b_{pm}]'$. There exists $\beta^* = [\beta_1^*, \dots, \beta_p^*]'$ such that $b_l = \beta^*_{\lceil l/m \rceil}\, \delta_{\lceil l/m \rceil m + 1 - l}$ for all $l \in \{1, \dots, pm\}$, provided a sufficiently large lag length $h_{LF} \geq p$ is chosen.

Remark 3.3 Assumption 3.1 ensures that there exists a pseudo-true $\beta^*$ such that $\sum_{k=1}^{h_{LF}} \beta_k^* \underline{x}_H(\tau_L - k) = \sum_{l=1}^{pm} b_l x_H(\tau_L - 1, m + 1 - l)$, in which case each parsimonious regression model (3.21) is correctly specified under $H_0: x_L \nrightarrow x_H$. Thus, for a given aggregation scheme $\delta$, it assumes that the DGP itself, specifically $b = [b_1, \dots, b_{pm}]'$, allows for identification of the DGP under low-to-high non-causality using an aggregated high frequency variable $\underline{x}_H$.

Remark 3.4 Assumption 3.1 is effectively a low frequency version of Assumption 2.5, with the deeper implication of which DGPs can be aggregated and still retain identification of the underlying causal patterns. If there is non-causality from $x_H$ to $x_L$ (i.e. $b = 0_{pm \times 1}$), then Assumption 3.1 is trivially satisfied by choosing any $h_{LF} \in \mathbb{N}$ and letting $\beta_k^* = 0$ for all $k \in \{1, \dots, h_{LF}\}$. If there is causality (i.e. $b \neq 0_{pm \times 1}$), then Assumption 3.1 is a relatively stringent assumption on the DGP.

The following examples show that some DGPs cannot satisfy Assumption 3.1, and in particular that a low frequency test may not be able to reveal whether there is low-to-high frequency causation.

Example 3.1 (Lagged Causality with Stock Sampling) Assume $m = 3$ and $p = 2$, and consider lagged causality $b_l = b \times I(l = 4)$ for $l \in \{1, \dots, 6\}$ with $b \neq 0$. This means that only $x_H(\tau_L - 2, 3)$ has a nonzero coefficient $b$, and all other terms $x_H(\tau_L - 1, 3)$, $x_H(\tau_L - 1, 2)$, $x_H(\tau_L - 1, 1)$, $x_H(\tau_L - 2, 2)$, $x_H(\tau_L - 2, 1)$ have no impact on $x_L(\tau_L)$. This can be thought of as an example of delayed information transmission, or seasonality. We will show that such a causal pattern can be captured by the low frequency parsimonious regression models if and only if the linear aggregation scheme is stock sampling.

We first show that the true causal pattern can be captured under stock sampling. Since stock sampling is represented as $\delta_l = I(l = 3)$ for $l \in \{1, 2, 3\}$, the summation term included in each parsimonious regression model, $\sum_{k=1}^{h_{LF}} \beta_k \underline{x}_H(\tau_L - k)$, can be rewritten as $\sum_{k=1}^{h_{LF}} \beta_k x_H(\tau_L - k, 3)$. Thus, we can simply choose $h_{LF} = 2$, $\beta_1^* = 0$, and $\beta_2^* = b$ to replicate the true causal pattern.

Next we show that the true causal pattern cannot be captured under any other linear aggregation scheme. Assume $\delta_l \geq 0$ for all $l \in \{1, 2, 3\}$, $\delta_3 < 1$, and $\sum_{l=1}^{3} \delta_l = 1$. This aggregation scheme excludes stock sampling, but allows for any other. Since $\delta_3 < 1$, at least one of $\delta_1$ and $\delta_2$ must be positive. Assume $\delta_1 > 0$ without loss of generality. Assumption 3.1 requires $b_4 = \beta_2^* \delta_3$ and $b_6 = \beta_2^* \delta_1$, but the true causal pattern implies $b_4 = b \neq 0$ and $b_6 = 0$. Since $\delta_1 > 0$, there does not exist any $\beta_2^*$ that satisfies these four equalities simultaneously.

Thus, for a given non-stock sampling scheme, this DGP does not allow a low frequency test to identify whether there is low-to-high causality, since the true high-to-low causal pattern cannot be duplicated.

Example 3.2 (Flow Sampling) Under flow sampling $\delta_l = 1/m$ for all $l \in \{1, \dots, m\}$. Thus, Assumption 3.1 requires that $b_l = \beta^*_{\lceil l/m \rceil}/m$, hence $b_1 = \cdots = b_m$, $b_{m+1} = \cdots = b_{2m}$, and so on. In other words, Assumption 3.1 holds only when all $m$ high frequency lags of $x_H$ in each low frequency period have an identical coefficient. This may be an unrealistic assumption for many macroeconomic time series, since we often have both positive and negative signs, lagged causality, or decaying causality.

It is straightforward to derive the asymptotic distribution of the low frequency max test statistic $\hat{U}^{(LF)}$ under $H_0: x_L \nrightarrow x_H$. Write $h$ and $r$ instead of $h_{LF}$ and $r_{LF}$ for brevity. Define an $n \times 1$ vector of all regressors in model $j$:

$$
\underline{y}_j(\tau_L - 1) \equiv [x_L(\tau_L - 1), \dots, x_L(\tau_L - q), \underline{x}_H(\tau_L - 1), \dots, \underline{x}_H(\tau_L - h), \underline{x}_H(\tau_L + j)]'.
$$

Under the maintained assumptions and Assumption 3.1, the asymptotic distribution of $\hat{U}^{(LF)}$ under $H_0: x_L \nrightarrow x_H$ is identical to that in Theorem 3.1, except that we replace the regressors $y_j(\tau_L - 1)$ with $\underline{y}_j(\tau_L - 1)$.

4 Local Asymptotic Power Analysis

In Sections 2 and 3.1 we discuss four different tests for high-to-low and low-to-high causality: mixed and low frequency max tests, and mixed and low frequency Wald tests. The results of Section 2 characterize the asymptotic global power properties of these four tests in the high-to-low causality case.$^6$ The MF max test and the MF Wald test are both consistent against any deviation from non-causality, as long as the selected number of high frequency lags $h$ is larger than or equal to the true lag order $pm$. Conversely, the LF tests are sensitive to the chosen aggregation scheme: for some DGPs and aggregation schemes power is trivial, hence these tests are not generally consistent.

In this section we study the local power properties of each test for the high-to-low case. Indeed, as Examples 3.1 and 3.2 show, the LF tests have asymptotic power of one in some cases, depending on the aggregation scheme and DGP. An advantage of the low frequency approach is that we often have fewer parameters than in the mixed frequency approach. Hence, in some cases the low frequency approach in fact has higher local power than the mixed frequency approach does.

$^6$ Asymptotic and local power in the low-to-high case remains unresolved, and is therefore not discussed here.

Impose Assumptions 2.1-2.4, and consider DGP (2.1). The high-to-low non-causality null hypothesis is $H_0: b = 0_{pm \times 1}$, hence the local alternative hypothesis is $H_1^L: b = (1/\sqrt{T_L})\nu$,
