CONJUGATE DUMMY OBSERVATION PRIORS FOR VARs


ECO 513 Fall 2005 C. Sims

Date: May 9, 2006. © 2006 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained for personal use or distributed free.

1. THE GENERAL IDEA

As is documented elsewhere (Sims, revised 1996, 2000), there is a strong tendency for estimated time series models, when the estimates are based on a flat prior and on likelihood conditional on initial observations, to attribute unreasonable explanatory power to initial conditions. Since classical estimation methods (like OLS in simple unrestricted VARs) are close to flat-prior Bayesian methods, they share this tendency. The pathology is reflected, in classical distribution theory, in the small-sample bias toward stationarity in estimated models.

The problem with flat-prior estimates is that in these models a flat prior gives considerable credence to the possibility that in our sample the initial conditions are very far from the model's steady state. If this were true, it would imply that the data contain an initial "transient" that allows the model to be estimated with much higher precision in the sample at hand than would be possible in a typical sample taken from data after the end of the current sample period. While this may be a realistic possibility for some applications, in most economic applications it is not.

One solution to this problem is to use the unconditional likelihood. That is, derive the mapping from the model's parameters to the unconditional distribution of the initial conditions, and combine that pdf with the usual likelihood function in doing inference. This approach requires iterative nonlinear optimization methods, but it is quite feasible. Its main disadvantage is that, because there is no unconditional distribution of the initial conditions if the model implies non-stationarity, it applies only when we are certain in advance that the model should imply stationarity, and this is seldom the case in economic applications.

Another approach is to introduce a prior that captures our belief that it is implausible for initial transients to explain a large part of observed long-run variation. That is what we explain how to do here.

2. MECHANICS

The basic idea of dummy observation priors for VARs is the same as that for dummy observation priors in ordinary regression models. Intuitively, one adds extra "data" to the sample that express prior beliefs about the parameters. The prior takes the form of the likelihood function for the dummy observations. The difference from single-equation regression models is that the added observations are used in all the equations of the system, not just in one.

Circumstances can arise where one has beliefs about parameters in some equations but not others, or has different beliefs about different equations. In fact, the "Minnesota Prior" that has been widely used in VAR forecasting models and is automated in the RATS program has this character. It might seem we could handle such a prior just by creating a separate set of dummy observations for each equation, and indeed this is the way RATS and most of the research literature that uses priors has implemented them. However, once the dummy observations differ across equations, the seemingly unrelated regressions algebra that justifies equation-by-equation estimation no longer applies. So to maintain internal consistency in likelihood calculations, it is best to stick to pure system-wide dummy observations. Such dummy observations amount to stating prior beliefs about how the whole system will behave.

In all our discussion below we will assume a model of the form

(1)  y(t) = c + ∑_{s=1}^{k} B_s y(t−s) + ε(t),

with data for t = 1, …, T and ε(t) | {y(t−s), s ≥ 1; B, c, Σ} ~ N(0, Σ).

2.1. The one-unit-root prior. The type of prior we describe here is called in the published literature a "dummy initial observation" prior. This is not very descriptive, however, so we'll call it the one-unit-root prior. We introduce data for an artificial date t₀ in which

y(t₀) = y(t₀ − 1) = … = y(t₀ − k) = ȳλ  (each n × 1),

and the element of the data matrix that corresponds to the constant term is set to λ in the t₀ observation. The vector ȳ is usually set to the sample mean of the initial conditions, i.e.

(2)  ȳ = (1/k) ∑_{s=1}^{k} y(1 − s).

If we write out the equation for this observation, we get

(3)  ȳλ = (∑_{s=1}^{k} B_s) ȳλ + cλ + ε(t₀).

If I − B(1) is full rank, where B(1) = ∑_{s=1}^{k} B_s, this equation can be rearranged to read

(4)  ȳ = (I − B(1))^{−1} c + (I − B(1))^{−1} ε(t₀) λ^{−1}.

It might appear, then, that the dummy observation is equivalent to asserting an unconditional pdf for ȳ, i.e.

(5)  ȳ | {B, c, Σ} ~ N( (I − B(1))^{−1} c,  λ^{−2} (I − B(1))^{−1} Σ (I − B(1))^{−1}′ ).

In the case where n = k = 1, so ȳ = y(0), this would simply be an unconditional pdf for y(0). Otherwise, it is an unconditional pdf for a particular set of linear combinations of elements of the initial conditions {y(0), …, y(−k + 1)}, and so amounts to an improper unconditional pdf for the initial conditions. However, the dummy observation does not actually add to the log likelihood the same terms that we would add by using (5) as an unconditional pdf for the initial conditions. The dummy observation adds the terms

(6)  −(1/2) log |Σ| − (λ²/2) ((I − B(1)) ȳ − c)′ Σ^{−1} ((I − B(1)) ȳ − c).
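To make the mechanics concrete, here is a minimal numpy sketch of how the one-unit-root dummy observation of section 2.1 can be constructed. It is an illustration, not part of the original notes: it assumes the stacked regression form y(t)′ = x(t)′Γ + ε(t)′ with x(t) = (y(t−1)′, …, y(t−k)′, 1)′ and Γ = (B₁′; …; B_k′; c′) stacked by rows, and the function name is ours.

```python
import numpy as np

def one_unit_root_dummy(y_init, lam):
    """Build the one-unit-root ("dummy initial observation") prior row.

    y_init : (k, n) array holding the k initial conditions y(0), ..., y(1-k)
    lam    : scalar weight lambda; larger values tighten the prior

    Returns (x_row, y_row), one artificial observation for the stacked
    regression y(t)' = x(t)' Gamma + eps(t)', where
    x(t) = (y(t-1)', ..., y(t-k)', 1)'.
    """
    k, n = y_init.shape
    ybar = y_init.mean(axis=0)                   # eq. (2): mean of initial conditions
    # every lag block equals lam * ybar, and the constant column equals lam
    x_row = np.concatenate([np.tile(lam * ybar, k), [lam]])
    y_row = lam * ybar                           # left-hand side of eq. (3)
    return x_row, y_row
```

Appending (x_row, y_row) to the actual (X, Y) data matrices before estimating the system implements the prior as extra "data". If Γ implies a unit root with ȳ as its eigenvector and c = 0 (for instance B₁ = I, all other B_s = 0), the dummy observation is fitted exactly, which is the sense in which the prior is centered on the unit-root region.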

Using (5) as an unconditional distribution for ȳ would add the same terms, but would in addition add n log λ + log |I − B(1)|. The extra term in λ has no effect on the posterior for c, B and Σ, but the term in B(1) has an important effect. In using the dummy observation, then, we are acting as if we had an unconditional distribution for ȳ given by (5), and at the same time an improper prior on B proportional to |I − B(1)|^{−1}.

The reason for calling this dummy observation a "one-unit-root" prior is that |I − B(1)| = 0 is exactly the condition that there be at least one unit root in the system. Thus the prior is centered on a part of the parameter space where either c = 0 and the system contains a unit root with ȳ as its eigenvector, or c ≠ 0, y is stationary, and ȳ is close to the model's implied population mean.

If one wants the prior to lean more sharply toward the presence of a unit root, one can introduce separate dummy observations in which all the y's are zero and the element of the data matrix to which c applies is non-zero. In conjunction with the one-unit-root prior as we have set it out, this forces c closer to zero and thus beliefs closer to a unit-root case. One could also add dummy observations that have the same form as the one-unit-root dummy observation, except that the column of the data matrix corresponding to c is set to zero. This type of dummy observation used alone would center on the region where there is at least one unit root and would give no a priori plausibility to any stationary model. If used in isolation, though, it puts no limits on c, and thus could allow estimates in which, despite the presence of a unit root, there are still strong initial transients.

3. NO-COINTEGRATION DUMMY OBSERVATIONS

Cointegration is the situation where a model has some stable roots and some unit roots. If the system has m unit eigenvalues (with y n × 1), none of which is repeated¹, then there are n − m linear combinations z_γ(t) = γ′y(t) that are stationary, even though it may well be that every element of the y vector is non-stationary. The one-unit-root prior is consistent with the presence of cointegration, since it favors only the existence of some unit root. A set of n dummy observations that centers on unit-root behavior separately for each element of y can be formed by taking as the j-th dummy observation one in which, at an artificial date t_j,

(7)  y(t_j) = (0, …, 0, ȳ_j λ, 0, …, 0)′ = y(t_j − 1) = … = y(t_j − k),

¹In the sense that none gives rise to off-diagonal entries in the Jordan decomposition. The literature on cointegration has tended to focus on this case, even though it is hard to see why we should be confident a priori that unit roots will not repeat.

with the ȳ_j λ in the bracketed vector occurring in the j-th position, and the column of the data matrix corresponding to c set to 0. This dummy observation then reads, for the j-th equation,

(8)  λȳ_j = ∑_s B_{jjs} λȳ_j + ε_j(t_j),

and for each other equation l ≠ j,

(9)  0 = ∑_s B_{ljs} λȳ_j + ε_l(t_j).

It is easy to see that this prior favors B's whose off-diagonal elements are small, and also B's that are close to putting a unit root in the B_{jj}(L) polynomial for each j. This prior works to suppress initial transients, but it does much more than that. It may be useful as an approximate representation of widely shared prior beliefs (that unit roots are present, that cross-variable relations are weak), but it should not be justified solely by appeal to the idea that initial transients are implausible.

This kind of dummy observation also puts no limits on c. Thus if one wants to downweight versions of the model in which there are unit roots and c_j² ≫ Σ_{jj}, i.e. models in which deterministic trend components are much more important than the error term, then one should combine these no-cointegration dummy observations with dummy observations that favor c = 0. Note that dummy observation priors can be combined; it has been common in applied work to apply both one-unit-root and no-cointegration dummy observations.

4. THE MINNESOTA PRIOR

We are working backward in historical time: the use of no-cointegration priors in applied work preceded the use of one-unit-root priors, but preceding them both was the Minnesota Prior. The Minnesota Prior postulates a separate set of dummy observations for each equation, implementing a prior that specifies independent distributions for all coefficients, with the coefficient B_{jks} having mean 0 unless j = k and s = 1, in which case the mean is 1, and variance π₁ π₂^{1−δ(j,k)} s^{−π₃}, where δ(j,k) is equal to 1 if j = k and 0 otherwise. With π₂ < 1, this prior specifies that coefficients on own lags (B_{jj}(s), s = 1, …, k) are likely to be larger in absolute value than coefficients on other variables, and with π₃ > 0, it specifies that coefficients on more distant lags are likely to be smaller.

This prior has usually been implemented, inconsistently, via separate OLS estimates using a distinct set of dummy observations for each equation. As we have already pointed out, the usual ML or posterior-mean interpretation of equation-by-equation OLS is not correct when the data matrix (dummy observations or otherwise) differs across equations. If we omit the π₂ term (that is, set it to 1), then own-lag coefficients and cross-variable coefficients are treated symmetrically, and the prior can be implemented with system-wide dummy observations. If t_{ks} is the artificial date for the dummy observation applying to the s-th lag of the k-th variable, one of these dummy observations has the

form (with every element for variables other than the k-th set to 0)

(10)  y_k(t_{k1}) = π₁^{−1} σ_k = y_k(t_{k1} − 1),   y_k(t_{ks} − s) = π₁^{−1} s^{π₃} σ_k,

(11)  y_k(t_{ks} − v) = 0  for v ≠ s > 1 and for v > s = 1.

The parameter σ_k is a measure of the degree of variability in the k-th variable. It has most commonly, again somewhat inconsistently, been set by estimating a low-order univariate AR regression for each variable and taking σ_k to be the standard deviation of that regression's residual. It could also be set as the sample standard deviation of the initial conditions for each variable.

In a recent paper, Sims and Zha (1998) propose a new approach to variable-by-variable setting of the prior that captures the spirit of the Minnesota Prior without the inconsistencies and with nearly the same degree of convenience. The approach of that paper also generalizes to the case of structural VARs (models with restrictions on Σ), which the Minnesota Prior does not. In this course we will not have time to go into the details of this new approach.

5. EXERCISE DUE FRIDAY, 12/17

Repeat the VAR estimation and testing that you carried out for the last exercise, this time with a one-unit-root prior with λ = 1 on each system you estimate. Compare the results to those you obtained originally. For each estimated system, compute and plot the actual time series y(t), t = 1, …, T, and the projections from initial conditions, E[y(t) | y(0), …, y(1 − k)], t = 1, …, T. Was there a problem with initial transients to start with? Did the one-unit-root prior "fix" it?

REFERENCES

SIMS, C. A. (2000): "Using a Likelihood Perspective to Sharpen Econometric Discourse: Three Examples," Journal of Econometrics, 95(2), 443–462, http://www.princeton.edu/~sims/.
SIMS, C. A. (revised 1996): "Inference for Multivariate Time Series with Trend," discussion paper, presented at the 1992 American Statistical Association meetings, http://sims.princeton.edu/yftp/trends/asapaper.pdf.
SIMS, C. A., AND T. ZHA (1998): "Bayesian Methods for Dynamic Multivariate Models," International Economic Review.
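The no-cointegration dummy observations of section 3 can likewise be sketched in code. This is a minimal numpy illustration of equations (7)–(9), ours rather than the notes' own implementation; it assumes the stacked regression form y(t)′ = x(t)′Γ + ε(t)′ with x(t) = (y(t−1)′, …, y(t−k)′, 1)′, and the function name is hypothetical.

```python
import numpy as np

def no_cointegration_dummies(y_init, lam):
    """Build the n no-cointegration dummy observations of eqs. (7)-(9).

    y_init : (k, n) array holding the k initial conditions y(0), ..., y(1-k)
    lam    : scalar prior weight lambda

    Returns (X_d, Y_d). Row j sets every lag of variable j to lam * ybar_j,
    with all other variables and the constant column equal to zero.
    """
    k, n = y_init.shape
    ybar = y_init.mean(axis=0)
    X_d = np.zeros((n, n * k + 1))      # regressors: k lag blocks + constant
    Y_d = np.zeros((n, n))
    for j in range(n):
        for s in range(k):
            X_d[j, s * n + j] = lam * ybar[j]   # variable j in lag block s
        Y_d[j, j] = lam * ybar[j]               # current value of variable j
        # X_d[j, -1] stays 0: the constant column gets no weight, per eq. (7)
    return X_d, Y_d
```

Stacking these n rows together with the one-unit-root dummy observation and the sample data before estimating the system implements the combination that section 3 notes has been common in applied work. Each row is fitted exactly when B_{jj}(L) has a unit root and the off-diagonal lag coefficients are zero (e.g. B₁ = I, remaining B_s = 0), which is why the prior favors small cross-variable coefficients and own-lag unit roots.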