Class: Trend-Cycle Decomposition. Macroeconometrics, Spring 2011. Jacek Suda, BdF and PSE. June 1, 2011
Outline: 1 Unobserved Components Approach 2 Beveridge-Nelson Decomposition 3 Spectral Analysis
Detrending We need a stationary series: y_t = X_t β + ε_t. Granger and Newbold (1974, JoE, "Spurious Regressions in Econometrics"): if y_t and X_t are independent random walks (true β = 0), then β̂_OLS is a non-degenerate random variable and the t-statistic for β = 0 is typically large: the spurious regression phenomenon. Taking differences instead of levels (so we get stationary series) brings larger standard errors, so we may fail to reject hypotheses. Detrending still allows us to analyze levels, and sometimes we are interested in the trend alone.
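The Granger-Newbold phenomenon is easy to see by simulation. The sketch below (illustrative, with unit-variance Gaussian increments; all names are my own) regresses one random walk on an independent one and records how often the conventional 5% t-test rejects β = 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def spurious_tstat(T, rng):
    """Regress one pure random walk on an independent one; return the t-stat on beta."""
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)                     # OLS error variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

tstats = np.array([spurious_tstat(500, rng) for _ in range(500)])
share = np.mean(np.abs(tstats) > 1.96)               # a nominal 5% test rejects far too often
print(f"share of |t| > 1.96: {share:.2f}")
```

Despite β = 0 by construction, the rejection rate is far above the nominal 5%.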
Trend/Cycle Observable series y_t: y_t = τ_t + c_t, where τ_t is the trend and c_t is the transitory component, I(0). If the trend contains a stochastic component, a random walk τ_t = µ + τ_{t-1} + η_t, then applying the HP filter produces a spurious cycle. We have two unobserved components, and if we can model the cycle we can try to use unobserved-components estimation.
Unobserved Components Approach Watson (1986, JME), Clark (1987, QJE), Morley, Nelson, Zivot (2003, ReStat). Approach: a parametric model for c_t. Model (structural):
y_t = τ_t + c_t
τ_t = µ + τ_{t-1} + η_t, η_t ~ iid N(0, σ²_η)
φ(L) c_t = ε_t, ε_t ~ iid N(0, σ²_ε), cov(η_t, ε_t) = σ_εη
Problem: Identification We have 1 observable series and 2 unobserved components, so we need identification assumptions. Identification: if c_t = ε_t or c_t = φ c_{t-1} + ε_t, then σ_εη is not identified from the data: infinitely many values of σ_εη produce the same autocovariance generating function for the observed series. That does not mean all values of σ_εη are equivalent, however: setting it to zero imposes a restriction on the autocovariance generating function of first differences.
Example: AR(1)
y_t = τ_t + c_t
τ_t = µ + τ_{t-1} + η_t
c_t = φ c_{t-1} + ε_t
Structural model: 5 parameters: µ, σ²_η, σ²_ε, φ, σ_εη. How many parameters can be identified from the data? Reduced form: first-difference the equation: (1 - L) y_t = (1 - L) τ_t + (1 - L) c_t, so Δy_t = µ + η_t + (1 - L)(1 - φL)^{-1} ε_t.
Example: AR(1) Multiply both sides by (1 - φL): (1 - φL) Δy_t = (1 - φ)µ + η_t - φ η_{t-1} + ε_t - ε_{t-1}, with c = (1 - φ)µ. The shocks are unobserved, but the right-hand side is a sum of two iid series and their lags: η_t + ε_t + (-φ) η_{t-1} + (-1) ε_{t-1}. The sum of two white-noise processes is white noise, with the same moments as an MA(1). So this model is observationally equivalent to Δy_t = c + φ Δy_{t-1} + e_t + θ e_{t-1}, an ARMA(1,1) with 4 parameters: c, φ, θ, σ²_e; that is how many we can estimate. We have 5 structural parameters but only 4 identified, so estimation so far assumes one of the parameters is fixed.
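In the simplest case c_t = ε_t (a white-noise cycle), Δy_t = µ + η_t + ε_t - ε_{t-1} is an MA(1) with ρ_1 = -σ²_ε/(σ²_η + 2σ²_ε) and ρ_j = 0 for j ≥ 2. A simulation sketch (all parameter values illustrative) checks these reduced-form implications:

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu = 200_000, 0.2
eta = rng.standard_normal(T)       # trend shocks, sigma_eta = 1 (illustrative)
eps = rng.standard_normal(T)       # cycle shocks, sigma_eps = 1

y = np.cumsum(mu + eta) + eps      # y_t = tau_t + c_t with c_t = eps_t
dy = np.diff(y)                    # = mu + eta_t + eps_t - eps_{t-1}

def acf(x, k):
    """Sample autocorrelation at lag k."""
    x = x - x.mean()
    return (x[k:] @ x[:-k]) / (x @ x)

rho1, rho2 = acf(dy, 1), acf(dy, 2)
print(rho1, rho2)                  # theory: rho1 = -1/3, rho2 = 0 for these sigmas
```

With σ_η = σ_ε = 1, the sample autocorrelations of Δy land near -1/3 at lag 1 and near 0 at lag 2, as the MA(1) equivalence predicts.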
Estimation Assume σ_εη = 0 (Watson, Harvey, Clark): the shocks that drive transitory movements are uncorrelated with those that drive long-run behavior. With this assumption the model can be estimated: 1 Find a (functional) match between the observed/estimated reduced-form parameters and the structural ones, or 2 Cast the model in state-space form and estimate via the Kalman filter:
State-Space Form Observation equation: y_t = [1 1] β_t = H β_t, with state vector β_t = [τ_t, c_t]'. State equation: β_t = µ̃ + F β_{t-1} + v_t, where µ̃ = [µ, 0]', F = [[1, 0], [0, φ]], v_t = [η_t, ε_t]' ~ N(0, Q), and Q = [[σ²_η, 0], [0, σ²_ε]].
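A minimal Kalman filter for this state-space form might look as follows. This is a sketch, not production code: the diffuse initialization for the trend and the start at the first observation are common but arbitrary choices, and all names are my own.

```python
import numpy as np

def uc_kalman_filter(y, mu, phi, sig_eta, sig_eps):
    """Kalman filter for y_t = H b_t, b_t = d + F b_{t-1} + v_t, b_t = (tau_t, c_t)'."""
    H = np.array([1.0, 1.0])
    F = np.array([[1.0, 0.0], [0.0, phi]])
    Q = np.diag([sig_eta**2, sig_eps**2])
    d = np.array([mu, 0.0])
    b = np.array([y[0], 0.0])                        # crude start at the first observation
    P = np.diag([1e6, sig_eps**2 / (1 - phi**2)])    # diffuse trend, stationary cycle
    states, loglik = np.zeros((len(y), 2)), 0.0
    for t in range(len(y)):
        b_pred = d + F @ b                           # state prediction
        P_pred = F @ P @ F.T + Q
        v = y[t] - H @ b_pred                        # forecast error
        f = H @ P_pred @ H                           # its variance (no measurement noise)
        K = P_pred @ H / f                           # Kalman gain
        b = b_pred + K * v                           # filtered state b_{t|t}
        P = P_pred - np.outer(K, H) @ P_pred
        loglik += -0.5 * (np.log(2 * np.pi * f) + v**2 / f)
        states[t] = b
    return states, loglik

# Simulated example (all parameter values illustrative)
rng = np.random.default_rng(2)
T, mu, phi = 500, 0.3, 0.7
tau = np.cumsum(mu + rng.standard_normal(T))         # random-walk trend
c = np.zeros(T)
for t in range(1, T):
    c[t] = phi * c[t - 1] + 0.5 * rng.standard_normal()
states, ll = uc_kalman_filter(tau + c, mu, phi, 1.0, 0.5)
tau_filt = states[:, 0]                              # filtered trend tau_{t|t}
```

In practice the parameters would be estimated by maximizing the log-likelihood that the filter returns; here they are fixed at their true values for illustration.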
Kalman Filter: Results The Kalman filter does not care how we came up with the state-space form. KF delivers filtered estimates τ_{t|t}, c_{t|t} and smoothed estimates τ_{t|T}, c_{t|T}. We say τ_t and c_t are uncorrelated with each other by assumption, yet corr(η_{t|t}, ε_{t|t}) = -1 even though we assume corr(η_t, ε_t) = 0. Compare the classical approach, where corr(x_t, ε̂_t) = 0 by construction even though the true relationship is corr(x_t, ε_t) ≠ 0: these are estimates of a correlation rather than the sample correlation of estimates. Identification: if we estimate the model without restricting σ_εη, the optimizer (e.g. in GAUSS) will not converge, as there are many values of σ_εη over which the likelihood does not decrease.
Morley, Nelson and Zivot (2003) RW + AR(2) makes the model identified. Why? An AR(1) cycle is not observationally distinguishable from a RW component in the reduced form, but an AR(2) cycle has features that cannot be proxied by a RW. Morley, Nelson and Zivot (2003): σ_εη is identified for c_t ~ ARMA(p, q) with p ≥ q + 2.
Example: AR(2) Model:
y_t = τ_t + c_t
τ_t = µ + τ_{t-1} + η_t
c_t = φ_1 c_{t-1} + φ_2 c_{t-2} + ε_t
6 parameters: µ, φ_1, φ_2, σ²_η, σ²_ε, σ_εη. Pre-multiplying both sides by (1 - L): Δy_t = (1 - L) τ_t + (1 - L) c_t = µ + η_t + (1 - L)(1 - φ_1 L - φ_2 L²)^{-1} ε_t, so (1 - φ_1 L - φ_2 L²) Δy_t = (1 - φ_1 - φ_2)µ + η_t - φ_1 η_{t-1} - φ_2 η_{t-2} + ε_t - ε_{t-1}. The model is observationally equivalent to an ARMA(2,2): Δy_t ~ ARMA(2, 2) with 6 parameters: c, φ_1, φ_2, θ_1, θ_2, σ²_e.
Results We can map the ARMA(2,2) parameters to our structural model, or run the Kalman filter with Q = [[σ²_η, σ_εη], [σ_εη, σ²_ε]]. For US real GDP, setting σ_εη = 0 can be rejected: ρ_εη ≈ -0.9, and τ_t is volatile. A structural model with an AR(3) cycle has 7 structural parameters but is observationally equivalent to a reduced-form ARMA(3,3) with 8 parameters: overidentification. Not such a big problem; ρ_εη < 0 still holds.
Trend in UC Model τ_{t|t} is equivalent to the Beveridge-Nelson trend for the ARMA(2,2). From the Kalman filter:
τ_{t|t} = E[τ_t | Ω_t] = lim_{M→∞} E[τ_t + c_{t+M} | Ω_t]
(expected cycles far in the future, given current information, are zero)
= lim_{M→∞} E[τ_t + Σ_{j=1}^M η_{t+j} + c_{t+M} | Ω_t]
= lim_{M→∞} E[τ_t + Mµ + Σ_{j=1}^M η_{t+j} + c_{t+M} - Mµ | Ω_t]
= lim_{M→∞} E[y_{t+M} - Mµ | Ω_t],
the Beveridge-Nelson trend. BN: expectations about where the series will be in the future. The two trends differ for the AR(1) case: the restricted UC model (σ_ηε = 0).
Beveridge-Nelson Decomposition The BN trend is the long-run conditional forecast (minus the deterministic trend). Let y_t ~ I(1) and express it as y_t = TD_t + TS_t + c_t, where TD_t is the deterministic part of the trend, TS_t is the stochastic trend, and c_t is the stochastic cycle. Then T_t = TD_t + TS_t is the trend, and z_t = TS_t + c_t is the stochastic component comprising both the stochastic trend and the stochastic cycle, with Δz_t ~ I(0).
Beveridge-Nelson Decomposition Since Δz_t is covariance-stationary, it has a Wold representation: Δz_t = Ψ*(L) e_t, Ψ*(L) = Σ_{k=0}^∞ Ψ*_k L^k, Ψ*_0 = 1, e_t ~ iid. Result: Ψ*(L) = Ψ*(1) + (1 - L) Ψ̃(L), where Ψ*(1) = Σ_{k=0}^∞ Ψ*_k is the long-run impact of a forecast error on y, and Ψ̃(L) = Σ_{j=0}^∞ Ψ̃_j L^j with Ψ̃_j = -Σ_{k=j+1}^∞ Ψ*_k, so that (1 - L) Ψ̃(L) measures the transitory impact of forecast errors.
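The identity Ψ*(L) = Ψ*(1) + (1 - L) Ψ̃(L) can be checked numerically. The sketch below uses AR(1) Wold weights Ψ*_k = φ^k; the truncation level K and the evaluation point are illustrative choices:

```python
import numpy as np

phi, K = 0.6, 200                      # AR(1) Wold weights, truncated at K terms
psi = phi ** np.arange(K)              # Psi*_k = phi^k
psi_tilde = -np.array([psi[j + 1:].sum() for j in range(K)])  # Psi~_j = -sum_{k>j} Psi*_k

z = 0.3                                # any point inside the unit circle
lhs = np.polynomial.polynomial.polyval(z, psi)                 # Psi*(z)
rhs = psi.sum() + (1 - z) * np.polynomial.polynomial.polyval(z, psi_tilde)
print(lhs, rhs, 1 / (1 - phi * z))    # all three should agree
```

For the finite truncated sequence the identity is exact, and both sides match the closed form 1/(1 - φz) up to a negligible truncation error.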
Beveridge-Nelson Decomposition Then z_t = z_{t-1} + Ψ*(L) e_t, i.e. z_t is like a random walk with Wold-form innovations:
z_t = z_0 + Ψ*(L) Σ_{j=1}^t e_j = z_0 + Ψ*(1) Σ_{j=1}^t e_j + (1 - L) Ψ̃(L) Σ_{j=1}^t e_j,
so y_t = y_0 + µt + Ψ*(1) Σ_{j=1}^t e_j + (1 - L) Ψ̃(L) Σ_{j=1}^t e_j, and
TD_t = y_0 + µt, TS_t = Ψ*(1) Σ_{j=1}^t e_j, c_t = (1 - L) Ψ̃(L) Σ_{j=1}^t e_j = Ψ̃(L) e_t (up to initial conditions, by telescoping).
Example: MA(1) Δy_t = µ + e_t + θ e_{t-1}, e_t ~ iid, so Δy_t = µ + Ψ*(L) e_t with Ψ*(L) = 1 + θL. Beveridge-Nelson decomposition: Ψ*(L) = Ψ*(1) + (1 - L) Ψ̃(L).
Example: MA(1) For the MA(1), Ψ*(1) = 1 + θ, and (1 - L) Ψ̃(L) with Ψ̃(L) = Σ_j Ψ̃_j L^j, Ψ̃_j = -Σ_{k=j+1}^∞ Ψ*_k:
Ψ̃_0 = -(Ψ*_1 + Ψ*_2 + Ψ*_3 + ...) = -θ
Ψ̃_1 = -(Ψ*_2 + Ψ*_3 + Ψ*_4 + ...) = 0, and Ψ̃_j = 0 for all j ≥ 1.
BN decomposition: y_t = y_0 + µt + (1 + θ) Σ_{j=1}^t e_j - θ e_t, with BN trend = y_0 + µt + (1 + θ) Σ_{j=1}^t e_j, and -θ e_t the transitory part, the BN cycle. Note that trend and cycle innovations are perfectly correlated (here corr = -1 for θ > 0) for all such models, not just the AR(1).
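These formulas are easy to verify by simulation. For the MA(1), the long-horizon forecast gives E_t[y_{t+M}] = y_t + Mµ + θ e_t for M ≥ 1, so the BN trend is y_t + θ e_t, trend plus cycle recovers y_t, and the trend increments are µ + (1 + θ) e_t. A sketch (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T, mu, theta = 1000, 0.2, 0.5
e = rng.standard_normal(T + 1)                 # e_0, ..., e_T
dy = mu + e[1:] + theta * e[:-1]               # MA(1) growth rates
y = np.cumsum(dy)

# BN trend = long-run forecast net of drift: y_t + theta*e_t
bn_trend = y + theta * e[1:]
bn_cycle = -theta * e[1:]

ok_sum = np.allclose(bn_trend + bn_cycle, y)                       # trend + cycle = y
ok_inc = np.allclose(np.diff(bn_trend), mu + (1 + theta) * e[2:])  # trend shocks (1+theta)e_t
print(ok_sum, ok_inc)
```

Both checks confirm the closed-form decomposition: the trend absorbs (1 + θ) of each shock and the cycle is the purely transitory -θ e_t.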
Example: AR(1) (Δy_t - µ) = φ(Δy_{t-1} - µ) + e_t, so
E_t[Δy_{t+1} - µ] = φ(Δy_t - µ)
E_t[Δy_{t+2} - µ] = φ²(Δy_t - µ)
E_t[Δy_{t+j} - µ] = φ^j (Δy_t - µ).
To calculate a forecast of how far from trend growth you will be, all you need is how far away you are today.
Example: AR(1) Sum them up: Σ_{j=1}^J E_t[Δy_{t+j} - µ] = (φ¹ + φ² + φ³ + ... + φ^J)(Δy_t - µ). Then
lim_{J→∞} Σ_{j=1}^J E_t[Δy_{t+j} - µ] = φ/(1 - φ) (Δy_t - µ).
Beveridge-Nelson decomposition: BN trend_t = y_t + φ/(1 - φ)(Δy_t - µ), BN cycle_t = -φ/(1 - φ)(Δy_t - µ).
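The geometric-sum formula can be checked against a brute-force sum of forecast deviations (parameter values and the truncation horizon J are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T, mu, phi = 2000, 0.2, 0.5
dy = np.empty(T)
dy[0] = mu
for t in range(1, T):                       # AR(1) growth rates
    dy[t] = mu + phi * (dy[t - 1] - mu) + rng.standard_normal()
y = np.cumsum(dy)

bn_trend = y + phi / (1 - phi) * (dy - mu)  # closed form

# Brute force: add up forecast deviations E_t[dy_{t+j} - mu] = phi^j (dy_t - mu)
J = 200
bn_trend_bf = y + sum(phi**j * (dy - mu) for j in range(1, J + 1))
gap = np.max(np.abs(bn_trend - bn_trend_bf))
print(gap)                                  # truncation error only
```

The two series agree up to the (tiny) truncation error φ^J, confirming that the BN trend here is just the current level plus φ/(1 - φ) times today's growth deviation.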
Remarks BN decompositions vary: different forecasting models give different BN decompositions. E[τ_t | Ω_t] is the estimate of the true trend in the unobserved-components model. The trend follows a random walk under both interpretations, and the variability of this RW is the same under both. The BN trend is an estimate of the true trend (via the Kalman filter). BN is applicable to any forecasting model, linear or non-linear, and BN avoids spurious cycles (unlike HP and Baxter-King).
Time Domain Wold form: Y_t = µ + Σ_{j=0}^∞ Ψ_j ε_{t-j}, ε_t ~ WN. The {Y_t} process can be decomposed into a linear combination of shocks (errors). It is the time domain because we see Y_t as a function of past (in time) realizations of shocks.
Frequency Domain For a covariance-stationary process, Y_t = µ + ∫_0^π α(ω) cos(ωt) dω + ∫_0^π δ(ω) sin(ωt) dω. It is a weighted average (in continuous frequency) of periodic cycles (sin and cos); ω determines the period: how frequent the cycles are.
[Figure: plots of cos(0.5x), cos(2x), and cos(4x) for x in [0, 2π]: the higher the frequency ω, the more cycles fit in the same interval.]
Autocovariance Generating Function g_Y(z) = Σ_{j=-∞}^∞ γ_j z^j, where z is a complex scalar and γ_j = E[(Y_t - µ)(Y_{t-j} - µ)] is the j-th autocovariance. It exists if the sequence of autocovariances {γ_j}_{j=-∞}^∞ is absolutely summable. It is a function of all autocovariances of a covariance-stationary process, and for a covariance-stationary process it is finite.
Population Spectrum S_Y(ω) = (1/2π) g_Y(e^{-iω}) = (1/2π) Σ_{j=-∞}^∞ γ_j e^{-iωj}, where ω is a real scalar. Since e^{-iωj} = cos(ωj) - i sin(ωj) and γ_j = γ_{-j},
S_Y(ω) = (1/2π) [γ_0 + 2 Σ_{j=1}^∞ γ_j cos(ωj)].
S_Y(ω) captures how much of the variation over time is due to variability (cycles) in cos and sin at different frequencies.
Spectral Representation Theorem For a covariance-stationary process, Y_t = µ + ∫_0^π [α(ω) cos(ωt) + δ(ω) sin(ωt)] dω, and for any frequencies 0 < ω_1 < ω_2 < ... < ω_n < π,
∫_{ω_1}^{ω_2} α(ω) dω is uncorrelated with ∫_{ω_3}^{ω_4} α(ω) dω, and
∫_{ω_1}^{ω_2} δ(ω) dω is uncorrelated with ∫_{ω_3}^{ω_4} δ(ω) dω.
This decomposes a covariance-stationary series into orthogonal components due to cycles at different frequencies.
Example: White Noise Y_t ~ WN. [Figure: flat spectrum S_Y(ω) = σ²/2π on [0, π].] Flat spectrum: 2 × area = σ² = var(Y_t). This defines a white noise process: there is equal weight on cycles at every frequency, so variation is divided equally across cycles of different frequencies. In general, 2 × the area under the spectral density over [0, π] equals the variance.
Example: AR(1), MA(1), ARMA(1,1) [Figure: spectral densities of an MA(1) with θ = 0.5 and an AR(1) with φ = 0.5 over ω in [0, 2π].] Low ω corresponds to low frequency and long cycles. The area under the curve depicts how much variability corresponds to a given frequency of fluctuations. Height is just as important as the shape of S_Y(ω).
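The AR(1) spectrum can be verified numerically: the closed form S(ω) = σ²/(2π(1 - 2φ cos ω + φ²)) should match the autocovariance-sum formula, and its integral over [-π, π] should equal γ_0 = σ²/(1 - φ²). A sketch (illustrative parameters and truncation):

```python
import numpy as np

phi, sigma2 = 0.5, 1.0
omega = np.linspace(-np.pi, np.pi, 100_001)

# Closed-form AR(1) spectrum
S = sigma2 / (2 * np.pi * (1 - 2 * phi * np.cos(omega) + phi**2))

# Same thing built from the autocovariances gamma_j = phi^j * gamma_0
gamma0 = sigma2 / (1 - phi**2)
S_acf = (gamma0 + 2 * sum(gamma0 * phi**j * np.cos(j * omega)
                          for j in range(1, 200))) / (2 * np.pi)

area = np.sum(S) * (omega[1] - omega[0])   # Riemann sum of the integral over [-pi, pi]
print(np.max(np.abs(S - S_acf)), area, gamma0)
```

The two constructions coincide, and the area under the spectrum recovers the process variance, illustrating the "area = variance" property from the white-noise slide.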
Over a short horizon I cannot see as much variability as over a longer one. For AR(1): var(Y_t | Ω_{t-1}) = σ², var(Y_t) = σ²/(1 - φ²), so var(Y_t) > var(Y_t | Ω_{t-1}). The spectrum is defined only for covariance-stationary processes. For an AR process there is more variation at lower frequencies than at higher ones. E.g. AR(2): some frequencies account for a lot of the variation in the process; variation in the process can be driven by middle frequencies. A peak in the spectral density could be evidence that business cycles (RBC) are indeed cycles.
Randomness Cycles at frequency zero (not really a cycle): how much of the movement is due to shocks that never recur cyclically. The spectral density at frequency zero tells us about the persistence of the series (the long-run variance). Suppose X_t is log GDP and Y_t is GDP growth, Y_t = ΔX_t. Then S_Y(0) = S_{ΔX}(0) measures the extent to which a shock to X has a permanent effect on X and is not just a transitory cycle. If X_t (the level) is covariance-stationary, then S_{ΔX}(0) = 0: no mass at frequency zero because there is no permanent movement in it.
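For AR(1) growth rates, γ_j = φ^j γ_0 with γ_0 = σ²/(1 - φ²), and the spectrum at frequency zero has the closed form σ²/(2π(1 - φ)²): the (2π-scaled) long-run variance. A numerical sketch (illustrative values, truncated sum):

```python
import numpy as np

phi, sigma2 = 0.7, 1.0                       # AR(1) growth process, illustrative values
gamma0 = sigma2 / (1 - phi**2)
gammas = gamma0 * phi ** np.arange(1, 500)   # gamma_j = phi^j * gamma_0

S0 = (gamma0 + 2 * gammas.sum()) / (2 * np.pi)   # S_Y(0) = (1/2pi) * sum of all gammas
closed = sigma2 / (2 * np.pi * (1 - phi)**2)     # long-run variance / 2pi
print(S0, closed)
```

The more persistent the growth process (φ closer to 1), the larger S_Y(0): shocks accumulate into ever more permanent movements in the level.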
Unit Root If S_{ΔX}(0) ≠ 0 then X_t is not covariance stationary. For a unit root in X_t, S_X(0) = ∞: an accumulation of shocks that never dies out. You can calculate the sample spectrum, but for a non-stationary process we will not get ∞, as we would for the population, but a finite number (we can see this as a downward bias).
Filtering It is easy to apply a filter to data when you think about the spectral representation of the process. Example: GDP has an important seasonal component (1-4 quarters). The long-run variability might be swamped by short-term seasonal variation: the variation in the series might be due to seasonality, while we are interested in relatively lower frequencies. Regressing C on Y might produce wacky results, as they might be driven purely by seasonal behavior.
Filtering Filtering removes or isolates movements in a covariance-stationary series at different horizons, e.g. removing seasonality. A filter is a set of weights applied to different frequencies: Y_t = h(L) X_t, S_Y(ω) = |h(e^{-iω})|² S_X(ω), where h(L) is the filter. E.g. h(L) = 1 - L^12: a seasonal filter for monthly data. Note: do not apply spectral analysis to integrated time series (not covariance stationary). If a filter is applied to a non-stationary series which is then differenced, the filter is distorted: we may get spurious cycles, cycles where there are none.
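The squared gain |h(e^{-iω})|² of the seasonal filter h(L) = 1 - L^12 works out to 2 - 2 cos(12ω), which vanishes exactly at the seasonal frequencies ω = 2πk/12. A sketch (function name is my own):

```python
import numpy as np

def gain_sq(omega, lag=12):
    """Squared gain of the seasonal-difference filter h(L) = 1 - L^lag."""
    return np.abs(1 - np.exp(-1j * lag * omega)) ** 2   # = 2 - 2*cos(lag*omega)

seasonal = 2 * np.pi * np.arange(7) / 12   # w = 2*pi*k/12, k = 0, ..., 6
print(gain_sq(seasonal))                   # ~0: these frequencies are removed
print(gain_sq(np.pi / 12))                 # nonzero: frequencies in between pass through
```

Multiplying S_X(ω) by this gain kills all mass at the seasonal frequencies (and at frequency zero, since h(1) = 0), which is exactly what seasonal differencing does in the time domain.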