Introduction to Bayesian Inference

University of Pennsylvania, EABCN Training School, May 10, 2016

Bayesian Inference

Ingredients of Bayesian analysis: the likelihood function $p(Y|\phi)$, the prior density $p(\phi)$, and the marginal data density $p(Y) = \int p(Y|\phi)p(\phi)\,d\phi$.

Bayes Theorem:
$$p(\phi|Y) = \frac{p(Y|\phi)\,p(\phi)}{p(Y)}$$
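For a scalar parameter, Bayes Theorem can be applied numerically on a grid. The sketch below assumes NumPy and a hypothetical helper `grid_posterior` (not from the slides). It replaces the integral in $p(Y)$ by a Riemann sum over an evenly spaced grid, which is only sensible when the grid covers essentially all of the posterior mass.

```python
import numpy as np

def grid_posterior(log_lik_fn, log_prior_fn, grid):
    """Evaluate p(phi|Y) on an evenly spaced 1-D grid and return it with ln p(Y).

    Hypothetical helper: the integral defining p(Y) is replaced by a Riemann sum.
    """
    dphi = grid[1] - grid[0]
    log_kernel = log_lik_fn(grid) + log_prior_fn(grid)  # ln p(Y|phi) + ln p(phi)
    shift = log_kernel.max()                            # subtract the max to avoid underflow in exp
    kernel = np.exp(log_kernel - shift)
    mass = kernel.sum() * dphi                          # approximates p(Y) * exp(-shift)
    posterior = kernel / mass                           # p(Y|phi) p(phi) / p(Y)
    log_mdd = shift + np.log(mass)                      # ln p(Y)
    return posterior, log_mdd

# Usage: one observation y = 1 with y | phi ~ N(phi, 1) and prior phi ~ N(0, 1).
grid = np.linspace(-10.0, 10.0, 20001)
post, log_mdd = grid_posterior(
    lambda p: -0.5 * np.log(2 * np.pi) - 0.5 * (1.0 - p) ** 2,
    lambda p: -0.5 * np.log(2 * np.pi) - 0.5 * p ** 2,
    grid,
)
post_mean = np.sum(grid * post) * (grid[1] - grid[0])  # close to the exact posterior mean 0.5
```

In this conjugate toy case the exact answers are $\phi|y \sim N(0.5, 0.5)$ and $y \sim N(0, 2)$, so the grid output can be checked against closed forms.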

Linear Regression / AR Models

Consider the AR(1) model $y_t = y_{t-1}\phi + u_t$, $u_t \sim iid\,N(0,1)$. Let $x_t = y_{t-1}$ and write the model as
$$y_t = x_t'\phi + u_t, \quad u_t \sim iid\,N(0,1), \qquad \text{or} \qquad Y = X\phi + U.$$
We can easily allow for multiple regressors; assume $\phi$ is $k \times 1$. Notice that we treat the variance of the errors as known. The generalization to an unknown variance is straightforward but tedious.

Likelihood function:
$$p(Y|\phi) = (2\pi)^{-T/2}\exp\left\{-\frac{1}{2}(Y - X\phi)'(Y - X\phi)\right\}.$$
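A minimal sketch of the regression form and the likelihood, assuming NumPy. The helper names `ar1_regression_form` and `log_likelihood` are hypothetical, and treating the first observation as a fixed initial condition is an assumption about how the sample is indexed.

```python
import numpy as np

def ar1_regression_form(y):
    """Stack the AR(1) model as Y = X phi + U with x_t = y_{t-1}.

    Assumption: the first observation serves as the initial condition,
    so Y has one fewer element than y.
    """
    y = np.asarray(y, dtype=float)
    Y = y[1:]                    # dependent variable y_t
    X = y[:-1].reshape(-1, 1)    # regressor x_t = y_{t-1}, a T x 1 matrix
    return Y, X

def log_likelihood(phi, Y, X):
    """ln p(Y|phi) = -(T/2) ln(2 pi) - (1/2)(Y - X phi)'(Y - X phi), error variance fixed at 1."""
    resid = Y - X @ np.atleast_1d(phi)
    T = Y.shape[0]
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * resid @ resid

# Usage: simulate an AR(1) with phi = 0.8 and evaluate the log likelihood at two values.
rng = np.random.default_rng(0)
y = np.zeros(201)
for t in range(1, y.size):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
Y, X = ar1_regression_form(y)
ll_near_truth, ll_far = log_likelihood(0.8, Y, X), log_likelihood(0.0, Y, X)
```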

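A Convenient Prior

Prior:
$$\phi \sim N\left(0_{k\times 1}, \tau^2 I_{k\times k}\right), \qquad p(\phi) = (2\pi\tau^2)^{-k/2}\exp\left\{-\frac{1}{2\tau^2}\phi'\phi\right\}.$$
A large $\tau$ means a diffuse prior; a small $\tau$ means a tight prior.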
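The log prior density translates directly into code. This is a small sketch assuming NumPy, with a hypothetical function name `log_prior`; the loop illustrates how $\tau$ controls how much density sits near zero.

```python
import numpy as np

def log_prior(phi, tau):
    """ln p(phi) for phi ~ N(0_{k x 1}, tau^2 I_k)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    k = phi.shape[0]
    return -0.5 * k * np.log(2 * np.pi * tau ** 2) - 0.5 * (phi @ phi) / tau ** 2

# Usage: a tight prior (small tau) puts far more density near 0 than a diffuse one.
for tau in (0.1, 1.0, 10.0):
    print(tau, log_prior(0.0, tau), log_prior(1.0, tau))
```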

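Deriving the Posterior

Bayes Theorem:
$$p(\phi|Y) \propto p(Y|\phi)p(\phi) \propto \exp\left\{-\frac{1}{2}\left[(Y - X\phi)'(Y - X\phi) + \tau^{-2}\phi'\phi\right]\right\}.$$
Guess: what if $\phi|Y \sim N(\bar\phi_T, \bar V_T)$? Then
$$p(\phi|Y) \propto \exp\left\{-\frac{1}{2}(\phi - \bar\phi_T)'\bar V_T^{-1}(\phi - \bar\phi_T)\right\}.$$
Rewrite the term in the exponent by completing the square:
$$\begin{aligned} & Y'Y - \phi'X'Y - Y'X\phi + \phi'X'X\phi + \tau^{-2}\phi'\phi \\ &= Y'Y - \phi'X'Y - Y'X\phi + \phi'(X'X + \tau^{-2}I)\phi \\ &= \left(\phi - (X'X + \tau^{-2}I)^{-1}X'Y\right)'(X'X + \tau^{-2}I)\left(\phi - (X'X + \tau^{-2}I)^{-1}X'Y\right) \\ &\quad + Y'Y - Y'X(X'X + \tau^{-2}I)^{-1}X'Y. \end{aligned}$$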
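The completing-the-square step can be checked numerically at an arbitrary $\phi$. A sketch assuming NumPy and random data; `completed_square_sides` is a hypothetical helper that returns the first and last lines of the derivation.

```python
import numpy as np

def completed_square_sides(Y, X, phi, tau):
    """Evaluate both sides of the completing-the-square identity at a given phi."""
    k = X.shape[1]
    A = X.T @ X + np.eye(k) / tau ** 2     # X'X + tau^{-2} I
    m = np.linalg.solve(A, X.T @ Y)        # (X'X + tau^{-2} I)^{-1} X'Y
    lhs = (Y - X @ phi) @ (Y - X @ phi) + (phi @ phi) / tau ** 2
    rhs = (phi - m) @ A @ (phi - m) + Y @ Y - Y @ X @ m
    return lhs, rhs

# Usage: random data and an arbitrary phi; the two sides agree up to rounding error.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Y = rng.standard_normal(50)
phi = rng.standard_normal(3)
lhs, rhs = completed_square_sides(Y, X, phi, tau=0.7)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical-stability choice; it does not change the identity being checked.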

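Deriving the Posterior

The exponent is a quadratic function of $\phi$. Deduce: the posterior distribution of $\phi$ must be a multivariate normal distribution, $\phi|Y \sim N(\bar\phi_T, \bar V_T)$, with
$$\bar\phi_T = (X'X + \tau^{-2}I)^{-1}X'Y, \qquad \bar V_T = (X'X + \tau^{-2}I)^{-1}.$$
As $\tau \to \infty$: $\phi|Y \overset{approx}{\sim} N\left(\hat\phi_{mle}, (X'X)^{-1}\right)$, where $\hat\phi_{mle} = (X'X)^{-1}X'Y$.
As $\tau \to 0$: $\phi|Y$ is approximately a point mass at 0.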
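A sketch of the posterior moments and of the two limiting cases, assuming NumPy, a hypothetical function `posterior_moments`, and simulated data. Very large and very small values of `tau` stand in for the limits $\tau \to \infty$ and $\tau \to 0$.

```python
import numpy as np

def posterior_moments(Y, X, tau):
    """Mean and covariance of phi|Y ~ N(phi_T, V_T) for prior N(0, tau^2 I) and unit error variance."""
    k = X.shape[1]
    V_T = np.linalg.inv(X.T @ X + np.eye(k) / tau ** 2)  # (X'X + tau^{-2} I)^{-1}
    phi_T = V_T @ (X.T @ Y)                              # (X'X + tau^{-2} I)^{-1} X'Y
    return phi_T, V_T

# Usage: compare a diffuse and a tight prior with the MLE.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 2))
Y = X @ np.array([0.5, -0.3]) + rng.standard_normal(100)
phi_mle = np.linalg.solve(X.T @ X, X.T @ Y)
phi_diffuse, V_diffuse = posterior_moments(Y, X, tau=1e6)   # close to phi_mle, (X'X)^{-1}
phi_tight, V_tight = posterior_moments(Y, X, tau=1e-6)      # close to a point mass at 0
```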

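Marginal Data Density

The marginal data density plays an important role in Bayesian model selection and averaging. Rearranging Bayes Theorem gives an identity that holds for every $\phi$; substituting the likelihood, prior, and posterior densities derived above, all terms involving $\phi$ cancel:
$$p(Y) = \frac{p(Y|\phi)p(\phi)}{p(\phi|Y)} = \exp\left\{-\frac{1}{2}\left[Y'Y - Y'X(X'X + \tau^{-2}I)^{-1}X'Y\right]\right\}(2\pi)^{-T/2}\left|I + \tau^2X'X\right|^{-1/2}.$$
The exponential term measures the goodness of fit; $|I + \tau^2X'X|$ is a penalty for model complexity.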
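The closed-form log marginal data density can be coded as below. This is a sketch assuming NumPy and a hypothetical function `log_mdd`; `np.linalg.slogdet` is used so the determinant does not overflow. As a cross-check (a standard fact, not stated on the slides), integrating out $\phi$ implies $Y \sim N(0, I_T + \tau^2 XX')$, and evaluating that $T$-dimensional normal density directly should give the same number.

```python
import numpy as np

def log_mdd(Y, X, tau):
    """ln p(Y) = -(1/2)[Y'Y - Y'X(X'X + tau^{-2}I)^{-1}X'Y] - (T/2) ln(2 pi) - (1/2) ln|I + tau^2 X'X|."""
    T, k = X.shape
    A = X.T @ X + np.eye(k) / tau ** 2
    fit = Y @ Y - Y @ X @ np.linalg.solve(A, X.T @ Y)            # goodness-of-fit term
    _, logdet = np.linalg.slogdet(np.eye(k) + tau ** 2 * X.T @ X)  # complexity penalty
    return -0.5 * fit - 0.5 * T * np.log(2 * np.pi) - 0.5 * logdet

# Usage: how ln p(Y) trades off fit against the penalty as tau varies (simulated data).
rng = np.random.default_rng(3)
X = rng.standard_normal((30, 2))
Y = X @ np.array([1.0, 0.0]) + rng.standard_normal(30)
for tau in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(tau, log_mdd(Y, X, tau))
```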

Posterior

We will often abbreviate the posterior distribution $p(\phi|Y)$ by $\pi(\phi)$ and posterior expectations of $h(\phi)$ by
$$E_\pi[h] = E_\pi[h(\phi)] = \int h(\phi)\pi(\phi)\,d\phi = \int h(\phi)p(\phi|Y)\,d\phi.$$
We will focus on algorithms that generate draws $\{\phi^i\}_{i=1}^N$ from posterior distributions of parameters in time series models. These draws can then be transformed into objects of interest, $h(\phi^i)$, and under suitable conditions a Monte Carlo average of the form
$$\bar h_N = \frac{1}{N}\sum_{i=1}^N h(\phi^i) \approx E_\pi[h].$$
The formal justification comes from a strong law of large numbers (SLLN) and a central limit theorem (CLT).
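The Monte Carlo average $\bar h_N$ is a one-liner once draws are available. In this sketch (assuming NumPy and a hypothetical helper `mc_average`) standard normal draws are only a placeholder for posterior draws, and $h(\phi) = \phi^2$ is an arbitrary example transformation.

```python
import numpy as np

def mc_average(h, draws):
    """Monte Carlo estimate h_bar_N = (1/N) sum_i h(phi^i) of E_pi[h]."""
    values = np.array([h(phi) for phi in draws])   # transformed draws h(phi^i)
    return values.mean(axis=0)

# Usage: placeholder draws from N(0, 1) stand in for posterior draws; E[phi^2] = 1.
rng = np.random.default_rng(4)
draws = rng.standard_normal(100_000)
estimate = mc_average(lambda phi: phi ** 2, draws)
```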

Direct Sampling

In the simple linear regression model with a Gaussian posterior it is possible to sample directly: for $i = 1$ to $N$, draw $\phi^i$ from $N(\bar\phi_T, \bar V_T)$. Provided that $V_\pi[h(\phi)] < \infty$, we can deduce from Kolmogorov's SLLN and the Lindeberg-Lévy CLT that
$$\bar h_N \xrightarrow{a.s.} E_\pi[h], \qquad \sqrt{N}\left(\bar h_N - E_\pi[h]\right) \Longrightarrow N\left(0, V_\pi[h(\phi)]\right).$$
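A sketch of direct sampling with a CLT-based numerical standard error, assuming NumPy's `Generator.multivariate_normal`. The function name `direct_sampling` and the posterior moments in the usage lines are hypothetical; in practice $\bar\phi_T$ and $\bar V_T$ would come from the formulas above. The standard error replaces $V_\pi[h(\phi)]$ by the sample variance of the $h(\phi^i)$.

```python
import numpy as np

def direct_sampling(phi_T, V_T, h, N, rng):
    """Draw phi^i ~ iid N(phi_T, V_T), i = 1..N; return h_bar_N and its CLT standard error."""
    draws = rng.multivariate_normal(phi_T, V_T, size=N)  # N x k array of posterior draws
    values = np.array([h(phi) for phi in draws])
    h_bar = values.mean()
    se = values.std(ddof=1) / np.sqrt(N)   # estimates sqrt(V_pi[h(phi)] / N)
    return h_bar, se

# Usage with hypothetical posterior moments; h picks out the first coefficient.
rng = np.random.default_rng(5)
phi_T = np.array([1.0, -0.5])
V_T = np.diag([0.04, 0.01])
h_bar, se = direct_sampling(phi_T, V_T, lambda phi: phi[0], 20_000, rng)
# An approximate 95% interval for E_pi[h] is h_bar +/- 1.96 * se.
```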

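Decision Making

The posterior expected loss associated with a decision $\delta(\cdot)$ is given by
$$\rho\left(\delta(\cdot)|Y\right) = \int_\Theta L\left(\theta, \delta(Y)\right)p(\theta|Y)\,d\theta.$$
A Bayes decision is a decision that minimizes the posterior expected loss:
$$\delta^*(Y) = \operatorname{argmin}_d \rho\left(\delta(\cdot)|Y\right).$$
Since in most applications it is not feasible to derive the posterior expected loss analytically, we replace $\rho(\delta(\cdot)|Y)$ by a Monte Carlo approximation of the form
$$\bar\rho_N\left(\delta(\cdot)|Y\right) = \frac{1}{N}\sum_{i=1}^N L\left(\theta^i, \delta(\cdot)\right),$$
where $\theta^i$, $i = 1, \dots, N$, are draws from $p(\theta|Y)$. A numerical approximation to the Bayes decision $\delta^*(\cdot)$ is then given by
$$\delta_N^*(Y) = \operatorname{argmin}_d \bar\rho_N\left(\delta(\cdot)|Y\right).$$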
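The approximate Bayes decision $\delta_N^*$ can be sketched by evaluating $\bar\rho_N$ on a grid of candidate decisions and taking the minimizer. The grid search, the function name `bayes_decision`, and the exponential "posterior" draws are all illustrative assumptions (NumPy assumed); a continuous optimizer could replace the grid.

```python
import numpy as np

def bayes_decision(loss, theta_draws, candidates):
    """delta*_N = argmin over candidate decisions d of (1/N) sum_i L(theta^i, d)."""
    risks = np.array([np.mean(loss(theta_draws, d)) for d in candidates])  # rho_bar_N for each d
    return candidates[np.argmin(risks)], risks

# Usage: hypothetical skewed "posterior" draws and an assumed grid of candidate decisions.
rng = np.random.default_rng(6)
theta = rng.exponential(size=20_000)
grid = np.linspace(0.0, 3.0, 3001)
d_quad, _ = bayes_decision(lambda th, d: (th - d) ** 2, theta, grid)   # near the posterior mean
d_abs, _ = bayes_decision(lambda th, d: np.abs(th - d), theta, grid)   # near the posterior median
```

With a skewed posterior the two loss functions lead to visibly different decisions, anticipating the point-estimation results on the next slide.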

Inference

Point estimation:
- Quadratic loss: posterior mean
- Absolute error loss: posterior median

Interval/set estimation, $P_\pi\{\theta \in C(Y)\} = 1 - \alpha$:
- highest posterior density sets
- equal-tail-probability intervals
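Given scalar posterior draws, the point and interval estimates listed above can be computed as follows. This sketch assumes NumPy and a hypothetical function `summarize_draws`. The HPD computation takes the shortest interval spanning $\lceil (1-\alpha)N \rceil$ consecutive sorted draws, a common approximation that is only appropriate for a unimodal posterior.

```python
import numpy as np

def summarize_draws(draws, alpha=0.1):
    """Posterior mean, median, equal-tail and HPD intervals with coverage 1 - alpha from scalar draws."""
    draws = np.sort(np.asarray(draws, dtype=float))
    N = draws.size
    equal_tail = tuple(np.quantile(draws, [alpha / 2, 1 - alpha / 2]))
    # Shortest window of m consecutive sorted draws; assumes a unimodal posterior.
    m = int(np.ceil((1 - alpha) * N))
    widths = draws[m - 1:] - draws[:N - m + 1]   # width of the window starting at each index j
    j = int(np.argmin(widths))
    hpd = (draws[j], draws[j + m - 1])
    return draws.mean(), np.median(draws), equal_tail, hpd

# Usage: for a right-skewed posterior the HPD interval is shorter and shifted left.
rng = np.random.default_rng(7)
mean, med, et, hpd = summarize_draws(rng.exponential(size=50_000), alpha=0.1)
```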

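Forecasting

Example: the AR(1) model $y_t = \theta y_{t-1} + u_t$, $u_t \sim iid\,N(0,1)$, iterated forward $h$ periods gives
$$y_{T+h} = \theta^h y_T + \sum_{s=0}^{h-1}\theta^s u_{T+h-s}.$$
$h$-step-ahead conditional distribution:
$$y_{T+h}|(Y_{1:T}, \theta) \sim N\left(\theta^h y_T, \frac{1 - \theta^{2h}}{1 - \theta^2}\right).$$
Posterior predictive distribution:
$$p(y_{T+h}|Y_{1:T}) = \int p(y_{T+h}|y_T, \theta)p(\theta|Y_{1:T})\,d\theta.$$
For each draw $\theta^i$ from the posterior distribution $p(\theta|Y_{1:T})$, sample a sequence of innovations $u^i_{T+1}, \dots, u^i_{T+h}$ and compute $y^i_{T+h}$ as a function of $\theta^i$, $u^i_{T+1}, \dots, u^i_{T+h}$, and $Y_{1:T}$.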
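The simulation recipe for the posterior predictive distribution can be sketched as below, assuming NumPy and a hypothetical function `predictive_draws`. The $N(0.8, 0.05^2)$ draws in the usage lines are a made-up stand-in for draws from $p(\theta|Y_{1:T})$; each posterior draw gets its own innovation sequence, vectorized across draws.

```python
import numpy as np

def predictive_draws(theta_draws, y_T, h, rng):
    """Draw y_{T+h}^i from p(y_{T+h}|Y_{1:T}) for y_t = theta y_{t-1} + u_t, u_t ~ iid N(0, 1)."""
    theta = np.asarray(theta_draws, dtype=float)
    y = np.full(theta.shape, float(y_T))          # every path starts from the last observation y_T
    for _ in range(h):
        # one innovation u_{T+s}^i per posterior draw theta^i, then iterate the AR(1) forward
        y = theta * y + rng.standard_normal(theta.shape)
    return y

# Usage: N(0.8, 0.05^2) draws are a hypothetical stand-in for p(theta|Y_{1:T}).
rng = np.random.default_rng(8)
theta_draws = rng.normal(0.8, 0.05, size=10_000)
y_pred = predictive_draws(theta_draws, y_T=2.0, h=4, rng=rng)
bands = np.quantile(y_pred, [0.05, 0.5, 0.95])   # predictive median and 90% band
```

Holding $\theta$ fixed at a single value recovers the conditional distribution $N\left(\theta^h y_T, (1 - \theta^{2h})/(1 - \theta^2)\right)$, which is a convenient check on the simulator.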

Model Uncertainty

Assign prior probabilities $\gamma_{j,0}$ to models $M_j$, $j = 1, \dots, J$. Posterior model probabilities are given by
$$\gamma_{j,T} = \frac{\gamma_{j,0}\,p(Y|M_j)}{\sum_{l=1}^J \gamma_{l,0}\,p(Y|M_l)}, \qquad \text{where} \quad p(Y|M_j) = \int p(Y|\theta_{(j)}, M_j)p(\theta_{(j)}|M_j)\,d\theta_{(j)}.$$
Log marginal data densities are sums of one-step-ahead predictive scores:
$$\ln p(Y|M_j) = \sum_{t=1}^T \ln\int p(y_t|\theta_{(j)}, Y_{1:t-1}, M_j)p(\theta_{(j)}|Y_{1:t-1}, M_j)\,d\theta_{(j)}.$$
Model averaging:
$$p(h|Y) = \sum_{j=1}^J \gamma_{j,T}\,p\left(h_j(\theta_{(j)})|Y, M_j\right).$$
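Log marginal data densities are typically large negative numbers, so posterior model probabilities are best computed on the log scale. This sketch assumes NumPy; `posterior_model_probs` and `model_average_draws` are hypothetical helpers, and the log marginal data densities in the usage lines are made up. The averaging step samples from the mixture $p(h|Y)$ by first drawing a model index with probabilities $\gamma_{j,T}$ and then one of that model's draws.

```python
import numpy as np

def posterior_model_probs(log_mdds, prior_probs):
    """gamma_{j,T} = gamma_{j,0} p(Y|M_j) / sum_l gamma_{l,0} p(Y|M_l), computed on the log scale."""
    log_w = np.log(np.asarray(prior_probs, dtype=float)) + np.asarray(log_mdds, dtype=float)
    log_w = log_w - log_w.max()   # log-sum-exp shift: avoids exp() underflowing to all zeros
    w = np.exp(log_w)
    return w / w.sum()

def model_average_draws(draws_by_model, probs, N, rng):
    """Sample from the mixture sum_j gamma_{j,T} p(h_j|Y, M_j): pick a model, then one of its draws."""
    models = rng.choice(len(probs), size=N, p=probs)
    return np.array([rng.choice(draws_by_model[j]) for j in models])

# Usage with hypothetical log marginal data densities for two models and equal prior odds.
gamma = posterior_model_probs([-1000.0, -1001.0], [0.5, 0.5])   # exp(-1000) alone would underflow
rng = np.random.default_rng(10)
mixed = model_average_draws([rng.normal(0.0, 1.0, 5000), rng.normal(2.0, 1.0, 5000)], gamma, 10_000, rng)
```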
