I. Bayesian econometrics
A. Introduction
B. Bayesian inference in the univariate regression model
C. Statistical decision theory
D. Large sample results
E. Diffuse priors
F. Numerical Bayesian methods
1. Importance sampling

Generic Bayesian problem:
$p(y|\theta)$ likelihood (known)
$p(\theta)$ prior (known)
Goal: calculate
$$p(\theta|Y) = \frac{p(y|\theta)\,p(\theta)}{G} \qquad \text{for } G = \int p(y|\theta)\,p(\theta)\,d\theta$$
Analytical approach: choose $p(\theta)$ from a family such that $G$ can be found with clever algebra.
Numerical approach: be satisfied to generate draws $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(D)}$ from the distribution $p(\theta|Y)$ without ever knowing the distribution analytically (i.e., without calculating $G$).
Importance sampling:
Step (1): Generate $\theta^{(j)}$ from an (essentially arbitrary) importance density $g(\theta)$.
Step (2): Calculate $w^{(j)} = p(y|\theta^{(j)})\,p(\theta^{(j)})/g(\theta^{(j)})$.
Step (3): Weight the draw $\theta^{(j)}$ by $w^{(j)}$ to simulate the distribution $p(\theta|Y)$.
Examples:
$$E(\theta|Y) = \int \theta\, p(\theta|Y)\,d\theta \simeq \frac{\sum_{j=1}^D w^{(j)}\theta^{(j)}}{\sum_{j=1}^D w^{(j)}} \equiv \bar\theta_D$$
$$\operatorname{Var}(\theta|Y) \simeq \frac{\sum_{j=1}^D w^{(j)}\,(\theta^{(j)} - \bar\theta_D)^2}{\sum_{j=1}^D w^{(j)}}$$
$$\Pr(\theta > 0 \mid Y) \simeq \frac{\sum_{j:\,\theta^{(j)} > 0} w^{(j)}}{\sum_{j=1}^D w^{(j)}}$$
How does this work?
$$\bar\theta_D = \frac{\sum_{j=1}^D w^{(j)}\theta^{(j)}}{\sum_{j=1}^D w^{(j)}} = \frac{D^{-1}\sum_{j=1}^D w^{(j)}\theta^{(j)}}{D^{-1}\sum_{j=1}^D w^{(j)}}$$
Numerator:
$$D^{-1}\sum_{j=1}^D w^{(j)}\theta^{(j)} \xrightarrow{p} E\!\left[w(\theta)\,\theta\right] = \int \frac{p(y|\theta)\,p(\theta)}{g(\theta)}\,\theta\, g(\theta)\,d\theta = \int \theta\, p(y|\theta)\,p(\theta)\,d\theta$$
Denominator:
$$D^{-1}\sum_{j=1}^D w^{(j)} \xrightarrow{p} E\!\left[w(\theta)\right] = \int \frac{p(y|\theta)\,p(\theta)}{g(\theta)}\, g(\theta)\,d\theta = \int p(y|\theta)\,p(\theta)\,d\theta$$
Conclusion:
$$\bar\theta_D \xrightarrow{p} \frac{\int \theta\, p(y|\theta)\,p(\theta)\,d\theta}{\int p(y|\theta)\,p(\theta)\,d\theta} = \int \theta\, p(\theta|Y)\,d\theta = E(\theta|Y)$$
Example: $\theta \in (0,1)$
Importance density $g$: $U(0,1)$
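The three steps above can be sketched in a few lines of Python. As a hypothetical check against a case where the answer is known analytically (this target is not from the slides), take an unnormalized posterior $p(y|\theta)\,p(\theta) \propto \theta^2(1-\theta)$ on $(0,1)$, i.e., a Beta(3,2) whose mean is exactly $3/5$, and use a $U(0,1)$ importance density:

```python
import random

def importance_mean(h, D=200_000, seed=0):
    """Estimate E(theta|Y) by importance sampling with a U(0,1)
    importance density g: weight each uniform draw theta by
    w = h(theta)/g(theta) = h(theta), then take the weighted mean."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(D):
        theta = rng.random()   # step (1): draw from g = U(0,1)
        w = h(theta)           # step (2): w = p(y|theta)p(theta)/g(theta)
        num += w * theta       # step (3): accumulate weighted draws
        den += w
    return num / den

# hypothetical target: p(y|theta)p(theta) proportional to theta^2 (1-theta),
# a Beta(3,2) posterior whose mean is exactly 3/5 = 0.6
est = importance_mean(lambda t: t**2 * (1 - t))
```

Note that the normalizing constant $G$ is never computed; it cancels in the ratio of weighted sums.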
[Figure: draws from the uniform importance density (top panel) and the reweighted draws (bottom panel).]
The algorithm will converge faster the more the importance density resembles the target.
[Figure: draws from the importance density $g(x) = 3x^2$ (top panel) and the reweighted draws (bottom panel).]
What's required of $g(\cdot)$?
The averages $D^{-1}\sum_j w^{(j)}\theta^{(j)}$ and $D^{-1}\sum_j w^{(j)}$, with $w^{(j)} = p(y|\theta^{(j)})\,p(\theta^{(j)})/g(\theta^{(j)})$, should satisfy a Law of Large Numbers.
Khintchine's Theorem: If $\{x_j\}_{j=1}^D$ is i.i.d. with finite mean $\mu$, then $D^{-1}\sum_{j=1}^D x_j \xrightarrow{p} \mu$.
Note: this does not require $x_j$ to have finite variance.
The $\theta^{(j)}$ are drawn i.i.d. from $g$ by construction.
So we only need:
(a) $E(\theta|Y) = \int \theta\, p(\theta|Y)\,d\theta$ exists
(b) $p(\theta|Y) = k\,p(y|\theta)\,p(\theta)$ for some constant $k$
(c) the support of $g$ includes the support of $p(\theta|Y)$
However, convergence may be very slow if the variance of $w^{(j)} = p(y|\theta^{(j)})\,p(\theta^{(j)})/g(\theta^{(j)})$ is infinite.
Practical observations:
works best if $g$ has fatter tails than $p(y|\theta)\,p(\theta)$
works best when $g$ is a good approximation to $p(\theta|Y)$
Importance sampling always produces an answer, so it is a good idea to check it:
(1) Try special cases where the result is known analytically.
(2) Try different $g(\cdot)$ to see if you get the same result.
(3) Use analytic results for components of $\theta$ in order to keep the dimension that must be importance-sampled small.

I. Bayesian econometrics
F. Numerical Bayesian methods
1. Importance sampling
2. The Gibbs sampler

Suppose the parameter vector can be partitioned as $\theta = (\theta_1', \theta_2', \theta_3')'$ with the property that $p(\theta|Y)$ is of unknown form but
$p(\theta_1|Y, \theta_2, \theta_3)$
$p(\theta_2|Y, \theta_1, \theta_3)$
$p(\theta_3|Y, \theta_1, \theta_2)$
are of known form (the same idea works for 2, 4, or $n$ blocks).
(1) Start with arbitrary initial guesses $\theta_1^{(j)}, \theta_2^{(j)}, \theta_3^{(j)}$ for $j = 1$.
(2) Generate:
$\theta_1^{(j+1)}$ from $p(\theta_1|Y, \theta_2^{(j)}, \theta_3^{(j)})$
$\theta_2^{(j+1)}$ from $p(\theta_2|Y, \theta_1^{(j+1)}, \theta_3^{(j)})$
$\theta_3^{(j+1)}$ from $p(\theta_3|Y, \theta_1^{(j+1)}, \theta_2^{(j+1)})$
(3) Repeat step (2) for $j = 1, 2, \ldots, D$.
Notice the sequence $\{\theta^{(j)}\}_{j=1}^D$ is a Markov chain with transition kernel
$$k(\theta^{(j)}, \theta^{(j+1)}) = p(\theta_3^{(j+1)}|Y, \theta_1^{(j+1)}, \theta_2^{(j+1)})\; p(\theta_2^{(j+1)}|Y, \theta_1^{(j+1)}, \theta_3^{(j)})\; p(\theta_1^{(j+1)}|Y, \theta_2^{(j)}, \theta_3^{(j)})$$
Under quite general conditions, the realizations from a Markov chain for $D$ large converge to draws from the ergodic distribution $\pi(\theta)$ of the chain, satisfying
$$\pi(\theta^{(j+1)}) = \int k(\theta^{(j)}, \theta^{(j+1)})\,\pi(\theta^{(j)})\,d\theta^{(j)}$$
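The steps above can be sketched for a two-block case. As a hypothetical example (not from the slides), take a bivariate standard normal posterior with correlation $\rho$, so each full conditional is $N(\rho \cdot \text{other}, 1 - \rho^2)$:

```python
import random

def gibbs_bivariate_normal(rho=0.8, D=50_000, D0=1_000, seed=0):
    """Two-block Gibbs sampler for a hypothetical bivariate standard
    normal posterior with correlation rho; each full conditional is
    N(rho * other block, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1 - rho**2) ** 0.5
    t1, t2 = 5.0, -5.0                # deliberately bad starting values
    draws = []
    for j in range(D):
        t1 = rng.gauss(rho * t2, sd)  # draw theta1 from p(theta1|Y,theta2)
        t2 = rng.gauss(rho * t1, sd)  # draw theta2 from p(theta2|Y,theta1)
        if j >= D0:                   # discard the first D0 draws
            draws.append((t1, t2))
    return draws

draws = gibbs_bivariate_normal()
m1 = sum(t1 for t1, _ in draws) / len(draws)
```

Despite the bad starting values, after the burn-in the retained draws have mean near 0 and correlation near $\rho$, the moments of the target posterior.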
Claim: the ergodic distribution of this chain corresponds to the posterior distribution: $\pi(\theta) = p(\theta|Y)$.
Proof:
$$\int k(\theta^{(j)}, \theta^{(j+1)})\, p(\theta^{(j)}|Y)\,d\theta^{(j)} = \int p(\theta_3^{(j+1)}|Y, \theta_1^{(j+1)}, \theta_2^{(j+1)})\; p(\theta_2^{(j+1)}|Y, \theta_1^{(j+1)}, \theta_3^{(j)})\; p(\theta_1^{(j+1)}|Y, \theta_2^{(j)}, \theta_3^{(j)})\; p(\theta^{(j)}|Y)\,d\theta^{(j)}$$
Integrating out $\theta_1^{(j)}$, then $\theta_2^{(j)}$, then $\theta_3^{(j)}$, each conditional density combines with the marginal posterior of the remaining blocks, so the right-hand side collapses to
$$p(\theta_3^{(j+1)}|Y, \theta_1^{(j+1)}, \theta_2^{(j+1)})\; p(\theta_1^{(j+1)}, \theta_2^{(j+1)}|Y) = p(\theta^{(j+1)}|Y)$$
Implication: if we throw out the first $D_0$ draws (for $D_0$ large), then $\theta^{(D_0+1)}, \theta^{(D_0+2)}, \ldots, \theta^{(D)}$ represent draws from the posterior distribution $p(\theta|Y)$.
Checks:
(1) Change $\theta^{(1)}$: same answer?
(2) Change $D_0$, $D$: same answer?
(3) Plot elements of $\theta^{(j)}$ as a function of $j$ to see if it looks the same across blocks.
[Figure: example of bad mixing; trace plots of two parameters over $10^6$ draws.]
Checks (continued):
(4) Calculate autocorrelations of elements of $\theta^{(j)}$.
Note: throwing out observations does not "cure" the problem.
(5) Do formal statistical tests for stability.
Geweke's diagnostic: test whether the mean of $\theta_i$ for the first 10% of draws is the same as for the last 50%. Repeat for each parameter $i$.
(1) Calculate the mean of parameter $i$ over the first subsample:
$$\bar\theta_{i1} = N_1^{-1}\sum_{j=1}^{N_1} \theta_i^{(j)} \qquad \text{for, say, } N_1 = 0.1D$$
(2) Estimate $\hat s_1$, $2\pi$ times the spectrum at frequency 0, over this subsample.
(3) Do the same for the second subsample, e.g.,
$$\bar\theta_{i2} = N_2^{-1}\sum_{j=J_2}^{D} \theta_i^{(j)} \qquad \text{for } N_2 = 0.5D,\; J_2 = D - N_2 + 1$$
(4) Calculate
$$\frac{\bar\theta_{i1} - \bar\theta_{i2}}{\sqrt{\hat s_1/N_1 + \hat s_2/N_2}} \xrightarrow{d} N(0,1)$$
e.g., reject stability if this exceeds 1.96 in absolute value.
Popular approach to estimate $s$ (e.g., Dynare):
(a) Divide the subsample into 100 blocks (i.e., block 1 = first 1% of draws).
(b) Calculate the mean over each block and the autocovariances of these means.
(c) Use Newey-West with 4, 8, or 15 lags (= 4%, 8%, or 15% of the sample) to get $\hat s_1$.

I. Bayesian econometrics
F. Numerical Bayesian methods
1. Importance sampling
2. The Gibbs sampler
3. Metropolis-Hastings algorithm
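Geweke's diagnostic can be sketched as follows. This is a simplified version, assuming the long-run variance is estimated by Newey-West on the raw draws of each subsample rather than on 100 block means as in the Dynare approach; the chains at the bottom are synthetic examples, not from the slides:

```python
import math
import random

def newey_west_lrv(x, L):
    """Newey-West estimate of the long-run variance s (2*pi times the
    spectrum at frequency 0) of a scalar series x, using L lags with
    Bartlett weights."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    s = sum(v * v for v in d) / n                 # gamma_0
    for k in range(1, L + 1):
        gk = sum(d[t] * d[t - k] for t in range(k, n)) / n
        s += 2 * (1 - k / (L + 1)) * gk           # weighted autocovariances
    return s

def geweke_z(chain, frac1=0.10, frac2=0.50):
    """Geweke's z-statistic: compare the mean of the first 10% of draws
    with the mean of the last 50%; approximately N(0,1) under stability."""
    D = len(chain)
    a = chain[: int(frac1 * D)]
    b = chain[-int(frac2 * D):]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sa = newey_west_lrv(a, max(1, len(a) // 25))  # ~4% of subsample as lags
    sb = newey_west_lrv(b, max(1, len(b) // 25))
    return (ma - mb) / math.sqrt(sa / len(a) + sb / len(b))

rng = random.Random(0)
stable = [rng.gauss(0, 1) for _ in range(10_000)]
drifting = stable[:5_000] + [v + 1.0 for v in stable[5_000:]]
```

On the stable chain the statistic is small, while the mean shift in the second chain produces a very large $|z|$, so stability is rejected.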
Suppose $\{s_t\}_{t=1}^T$ is an ergodic $K$-state Markov chain, $s_t \in \{1, 2, \ldots, K\}$, with transition probabilities
$$p_{ij} = \Pr(s_t = j \mid s_{t-1} = i)$$
$$\sum_{j=1}^K p_{ij} = 1 \text{ for } i = 1,\ldots,K \qquad p_{ij} \geq 0 \text{ for } i,j = 1,\ldots,K$$
The ergodic or unconditional probabilities $\pi_j$ satisfy
$$\pi_j = \Pr(s_t = j) = \sum_{i=1}^K \Pr(s_t = j, s_{t-1} = i) = \sum_{i=1}^K p_{ij}\,\pi_i$$
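The ergodic probabilities can be found numerically by iterating the relation $\pi_j = \sum_i p_{ij}\pi_i$. A hypothetical two-state example (not from the slides) with $p_{12} = 0.3$ and $p_{21} = 0.6$, whose ergodic distribution is $(2/3, 1/3)$:

```python
# hypothetical 2-state chain: Pr(2|1) = 0.3, Pr(1|2) = 0.6
P = [[0.7, 0.3],
     [0.6, 0.4]]

# iterate pi_j = sum_i pi_i p_ij from an arbitrary starting distribution;
# ergodicity guarantees convergence to the unconditional probabilities
pi = [0.5, 0.5]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
```

Starting from any initial distribution, the iteration converges to $\pi = (2/3, 1/3)$, which indeed solves $\pi_1 = 0.7\pi_1 + 0.6\pi_2$.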
Proposition: Suppose we can find a set of numbers $f_1, f_2, \ldots, f_K$ such that
$f_j \geq 0$ for $j = 1,\ldots,K$
$\sum_{j=1}^K f_j = 1$
$f_i p_{ij} = f_j p_{ji}$
Then $f_j = \pi_j$.
Proof: We're given that $f_i p_{ij} = f_j p_{ji}$; summing over $i$:
$$\sum_{i=1}^K f_i p_{ij} = f_j \sum_{i=1}^K p_{ji} = f_j$$
which satisfies the definition of $\pi_j$: $\sum_{i=1}^K \pi_i p_{ij} = \pi_j$.
This works also for continuous-valued Markov chains. If $x_t \in \mathbb{R}^k$ is Markov with transition kernel $p(x, y)$, meaning that
$$\Pr(x_t \in A \mid x_{t-1} = x) = \int_A p(x, y)\,dy,$$
then the ergodic density $\pi(y)$, which signifies that $\Pr(x_t \in A) = \int_A \pi(y)\,dy$, satisfies
$$\pi(y) = \int_{\mathbb{R}^k} p(x, y)\,\pi(x)\,dx$$
Proposition: if
$f(x) \geq 0$ for all $x$
$\int_{\mathbb{R}^k} f(x)\,dx = 1$
$f(x)\,p(x, y) = f(y)\,p(y, x)$ for all $x, y$
then $\pi(x) = f(x)$.
Goal in Metropolis-Hastings: we know how to calculate $h(x) = k\,f(x)$ (where $k$ may be an unknown constant) and want to sample from $f(x)$.
Solution: generate a sample $\{x_t\}$ from a Markov chain whose ergodic density is $f(x)$.
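The finite-state detailed-balance proposition is easy to verify numerically. A hypothetical example (not from the slides): fix target probabilities $f = (0.2, 0.3, 0.5)$, build transition probabilities Metropolis-style so that $f_i p_{ij} = f_j p_{ji}$ holds by construction, and check that $f$ is then the ergodic distribution:

```python
# hypothetical target ergodic probabilities f_1, f_2, f_3
f = [0.2, 0.3, 0.5]
K = len(f)

# Build p_ij Metropolis-style: propose j uniformly over the K states,
# accept with probability min(1, f_j / f_i).  Then for i != j,
# f_i p_ij = min(f_i, f_j) / K, which is symmetric in (i, j),
# so detailed balance holds by construction.
P = [[0.0] * K for _ in range(K)]
for i in range(K):
    for j in range(K):
        if i != j:
            P[i][j] = min(1.0, f[j] / f[i]) / K
    P[i][i] = 1.0 - sum(P[i])        # remaining mass: stay in state i

# check detailed balance: f_i p_ij = f_j p_ji for every pair (i, j)
balance = all(abs(f[i] * P[i][j] - f[j] * P[j][i]) < 1e-12
              for i in range(K) for j in range(K))

# check stationarity: sum_i f_i p_ij = f_j for every j
fP = [sum(f[i] * P[i][j] for i in range(K)) for j in range(K)]
```

Both checks pass: detailed balance holds pairwise, and $f P = f$, so $f$ is the ergodic distribution, as the proposition asserts.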
How MH works:
We previously generated $x_{t-1} = x$.
We now generate a candidate $y$ from some known density $q(x, y)$.
We'll then set $x_t = y$ if $f(y)/f(x)$ is big and otherwise keep $x_t = x$.
Let $\alpha(x, y)$ be the probability that we set $x_t = y$. If $f(x)\,q(x, y) > 0$, then
$$\alpha(x, y) = \min\left\{\frac{f(y)\,q(y, x)}{f(x)\,q(x, y)},\; 1\right\}$$
otherwise $\alpha(x, y) = 1$.
Note that $\alpha(x, y)$ depends on $f$ only through the ratio $f(y)/f(x) = h(y)/h(x)$, so the unknown constant $k$ drops out.
When $x \neq y$, the transition kernel of this chain is $q(x, y)\,\alpha(x, y)$.
To show that $f(y)$ is the ergodic density of this chain, we must show
$$f(x)\,\alpha(x, y)\,q(x, y) = f(y)\,\alpha(y, x)\,q(y, x)$$
But
$$f(x)\,\alpha(x, y)\,q(x, y) = f(x)\,q(x, y)\min\left\{\frac{f(y)\,q(y, x)}{f(x)\,q(x, y)},\; 1\right\} = \min\{f(y)\,q(y, x),\; f(x)\,q(x, y)\} = f(y)\,\alpha(y, x)\,q(y, x)$$
Options for candidate density:
(1) independent chain: $q(x, y) = q(y)$, e.g., $y \sim N(\mu, \Sigma)$ where $\mu$ is our guess of the mean of $y$.
Options for candidate density:
(2) random walk: $q(x, y) = \tilde q(y - x)$, e.g.,
$$q(x, y) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(y - x)'\Sigma^{-1}(y - x)\right\}$$
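The random-walk variant can be sketched as follows. Assumptions not in the slides: the target is $h(x) = \exp(-x^2/2)$, an unnormalized $N(0,1)$, and the increment is $N(0, 1)$. Because a random-walk $q$ is symmetric, $q(y, x)/q(x, y)$ cancels and the acceptance probability reduces to $\min\{h(y)/h(x),\, 1\}$:

```python
import math
import random

def rw_metropolis(log_h, D=100_000, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: propose y = x + N(0, step^2); since q is
    symmetric, accept with probability min(1, h(y)/h(x)).  Only the
    unnormalized log target log_h is needed."""
    rng = random.Random(seed)
    x, draws = x0, []
    for _ in range(D):
        y = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_h(y) - log_h(x):
            x = y                      # accept the candidate
        draws.append(x)                # on rejection, keep x_t = x
    return draws

# hypothetical target: h(x) = exp(-x^2 / 2), i.e., N(0,1) up to a constant
draws = rw_metropolis(lambda x: -0.5 * x * x)
mean = sum(draws) / len(draws)
var = sum((v - mean) ** 2 for v in draws) / len(draws)
```

The sample mean and variance of the draws are close to 0 and 1, the moments of the target, even though the normalizing constant of $h$ was never used.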