Econ Some Bayesian Econometrics

1 Econ Some Bayesian Econometrics
Patrick Bajari

2 Motivation
Next we shall begin to discuss Gibbs sampling and Markov Chain Monte Carlo (MCMC)
These are methods from Bayesian statistics and econometrics
We shall emphasize the practical value of these approaches over philosophical issues in Bayes vs. classical estimation
However, it is important to review a couple of basic results from Bayesian statistics first
Please read 3.8 carefully

3 Motivation
In Bayes, we normally start by specifying a likelihood function p(y|θ) and a prior p(θ)
The likelihood depends on parameters θ
Note that this is different from GMM, where we only specify moment equations
We use Bayes' Theorem to learn about the posterior distribution of the parameter conditional on the data:
p(θ|y) = p(θ)p(y|θ) / ∫ p(y|θ)p(θ) dθ

4 Motivation
In many models, the posterior is a very complicated function and cannot be expressed analytically
However, it turns out that in many models of interest it is possible to simulate the posterior distribution
This is often possible even when classical methods of inference cannot be done
That is, we will draw pseudo-random deviates θ^(1), ..., θ^(S) from p(θ|y)

5 Motivation
We will often be interested in functions g(θ)
For example, g(θ) might be profit, consumer surplus, or other functions of demand parameters
The expected value of g(θ) can be simulated as:
(1/S) Σ_s g(θ^(s)) →_p ∫ g(θ)p(θ|y) dθ
We may also be interested in maximizing expected utility/profit (or minimizing loss, for statisticians) of some function g(a, θ), where a is an action:
max_a (1/S) Σ_s g(a, θ^(s))
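
A minimal numpy sketch of this simulation-based expectation. The draws in theta_draws stand in for posterior draws from p(θ|y) and the function g is a placeholder; both names are illustrative, not from the slides.

```python
import numpy as np

def posterior_expectation(theta_draws, g):
    """Approximate E[g(theta) | y] by (1/S) * sum_s g(theta^(s))."""
    return np.mean([g(theta) for theta in theta_draws])

# Illustration with stand-in draws: a normal pretending to be p(theta | y)
rng = np.random.default_rng(0)
theta_draws = rng.normal(loc=1.0, scale=0.5, size=10_000)
print(posterior_expectation(theta_draws, lambda t: t ** 2))  # approx 1.25
```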

6 Motivation
Bayes requires the specification of a prior p(θ)
This choice is frequently criticized as ad hoc
However, the authors note that under weak regularity conditions, asymptotically:
p(θ|y) ≈ N(θ̂_MLE, H^(-1)(θ̂_MLE))
Asymptotically, the likelihood swamps the prior

7 Motivation
Some practical advantages of Bayes:
1 Simulation of complicated models (especially with latent variables)
2 Exact inference (e.g. asymptotic approximations are not used)
3 Link to decision theory

8 Bayesian Analysis of Regression
An important case to study will be the normal linear regression model
We shall show that a key step in estimating many rich models is to analyze the normal regression model
Consider the following multiple regression model:
y_i = x_i'β + ε_i, ε_i ~ N(0, σ²), y ~ N(Xβ, σ²I)
In the above, y is the vector formed from the y_i and X has ith row x_i'

9 Bayesian Analysis of Regression
Given the assumptions above, the likelihood for y is:
p(y|X, β, σ²) ∝ (σ²)^(-n/2) exp( -(1/(2σ²)) (y - Xβ)'(y - Xβ) )
In Bayesian econometrics, it is convenient to work with prior distributions that are conjugate. That is, the posterior is in the same "class" of distributions as the prior
The posterior is always proportional to the prior times the likelihood
If we are clever in choosing the right density for the prior, the posterior will be analytically convenient to work with
Our concerns are mainly computational: we do not necessarily take the parametric form of the prior seriously

10 Bayesian Analysis of Regression
It will be helpful to rewrite the likelihood function as follows:
First note that, by substituting y = Xβ̂ + (y - Xβ̂),
(y - Xβ)'(y - Xβ) = (y - Xβ̂)'(y - Xβ̂) + (β - β̂)'X'X(β - β̂) = vs² + (β - β̂)'X'X(β - β̂)
where β̂ is the OLS estimate, v = n - k, and s² is the usual residual variance estimate

11 Bayesian Analysis of Regression
We can then write the likelihood as:
p(y|X, β, σ²) ∝ (σ²)^(-n/2) exp( -(1/(2σ²)) [ vs² + (β - β̂)'X'X(β - β̂) ] )
= (σ²)^(-v/2) exp( -(1/(2σ²)) vs² ) × (σ²)^(-(n-v)/2) exp( -(1/(2σ²)) (β - β̂)'X'X(β - β̂) )
We will specify the prior in the form:
p(β, σ²) = p(σ²)p(β|σ²)
Looking at the above, we guess that a prior of the form p(θ) ∝ θ^(-λ) exp(-δ/θ) (inverse gamma) will be conjugate for σ², and a normal prior will be conjugate for β

12 Bayesian Analysis of Regression
The exact form we shall use is:
p(σ²) ∝ (σ²)^(-(v_0/2 + 1)) exp( -v_0 s_0² / (2σ²) )
We can think of this as a posterior arising from a sample of size v_0 with sufficient statistic s_0²
The natural conjugate prior on β is normal:
p(β|σ²) ∝ (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̄)'A(β - β̄) )
This is a normal with mean β̄ and precision (inverse of variance) A

13 Bayesian Analysis of Regression
A key insight is that the posterior has the form:
p(β, σ²|y) ∝ p(y|X, β, σ²)p(σ²)p(β|σ²)
After some tedious (but not deep) algebra, we find:
p(β, σ²|y) ∝ (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̃)'(X'X + A)(β - β̃) ) × (σ²)^(-((n+v_0)/2 + 1)) exp( -(ns² + v_0 s_0²) / (2σ²) )
β̃ = (X'X + A)^(-1) (X'X β̂ + A β̄)
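
A minimal numpy sketch of the posterior mean formula above. X, y, the prior precision A, and the prior mean beta_bar are assumed inputs, and the function name is illustrative.

```python
import numpy as np

def posterior_mean_beta(X, y, A, beta_bar):
    """beta_tilde = (X'X + A)^{-1} (X'X beta_hat + A beta_bar)."""
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)  # OLS estimate beta_hat
    return np.linalg.solve(XtX + A, XtX @ beta_hat + A @ beta_bar)
```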

14 Bayesian Analysis of Regression
Clearly the posterior for β conditional on y, X, σ² is normal
The posterior mean is a weighted average depending on the prior precision A and the precision from the data X'X
It also depends on the prior mean β̄ and the classical estimator β̂
As the number of observations becomes large, the posterior mean converges to the OLS estimate β̂
Convincing yourself of this derivation would be good preparation for the prelims

16 Bayesian Analysis of Regression
Derivation of the posterior for β is done by "completing the square"
Since this is a common operation in Bayesian econometrics, it is worth reviewing

17 Bayesian Analysis of Regression
A key insight is that the posterior has the form:
p(β, σ²|y) ∝ p(y|X, β, σ²)p(σ²)p(β|σ²)
∝ (σ²)^(-n/2) exp( -(1/(2σ²)) (y - Xβ)'(y - Xβ) )
× (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̄)'A(β - β̄) )
× (σ²)^(-(v_0/2 + 1)) exp( -v_0 s_0² / (2σ²) )

18 Bayesian Analysis of Regression
We guess that β is going to be normally distributed. We strive to rewrite the expression below as follows:
(y - Xβ)'(y - Xβ) + (β - β̄)'A(β - β̄) = (β - β̃)'V(β - β̃) + junk not involving β
If we can write it in this form, then β̃ is the posterior mean and V the precision matrix

19 Bayesian Analysis of Regression
(y - Xβ)'(y - Xβ) + (β - β̄)'A(β - β̄) = (v - Wβ)'(v - Wβ)
where v = [y ; Uβ̄], W = [X ; U], and A = U'U

20 Bayesian Analysis of Regression
Define β̃ as the OLS projection of v on W:
β̃ = (W'W)^(-1) W'v = (X'X + A)^(-1) (X'X β̂ + A β̄)
Then an application of the plus/minus trick shows:
(v - Wβ)'(v - Wβ) = (β - β̃)'W'W(β - β̃) + junk not involving β
= (β - β̃)'(X'X + A)(β - β̃) + junk not involving β

21 Markov Chain Monte Carlo Methods
We will often begin with an unnormalized posterior density π(θ) (e.g. known up to a constant)
The dimension of θ may be high
For example, we may be studying a discrete choice model with 2000 consumers and 5 controls
There are 10,000 random coefficients (which we treat as parameters)
There are also hyperparameters linking the random coefficients
Importance sampling and classical simulators may not be useful in this setting

22 Markov Chain Monte Carlo Methods
Our goal will be to construct a Markov chain F to "simulate the posterior"
Generate a sequence of R pseudo-random deviates θ^0, ..., θ^R
However, they will not be iid
Instead we will draw them from a chain θ^(r+1) ~ F(θ^r), from an arbitrary starting point θ^0

23 Markov Chain Monte Carlo Methods
We will want the invariant, or long run, distribution of the Markov chain to be π
If this is the case, then for a function h of interest:
(1/R) Σ_r h(θ^r) →_p ∫ h(θ)π(θ) dθ

24 Some Markov Chain Basics
Restrict attention to finite support
Let the state space be S = {θ^1, ..., θ^d}
Sequence of random variables θ^0, θ^1, θ^2, ...
Pr[θ^(r+1) = θ^j | θ^r = θ^i] = p_ij
Initial distribution π_0(θ)

25 Some Markov Chain Basics
The distribution of θ^1 is:
Pr[θ^1 = θ^j] = Σ_i π_0(θ^i) p_ij, i.e. π_1 = π_0 P
The distribution over states is a row vector of probabilities
After r iterations: π_r = π_0 P^r

26 Some Markov Chain Basics
In practice, it is useful to restrict attention to chains that don't get "stuck"
Assume that p_ij > 0 for all i, j
Then there exists a stationary distribution: π = lim_{r→∞} π_0 P^r
Note that π satisfies: π = πP
We also call π the invariant distribution. It characterizes the long run behavior of the chain after the impact of π_0 wears off.
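
A small numerical illustration (not from the text) that iterating π_0 P^r converges to an invariant π with π = πP, for a hypothetical transition matrix satisfying p_ij > 0.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])      # hypothetical transition matrix with p_ij > 0
pi = np.array([1.0, 0.0])       # arbitrary initial distribution pi_0
for _ in range(100):
    pi = pi @ P                 # pi_r = pi_0 P^r
print(pi)                       # approx. invariant distribution, about [4/7, 3/7]
print(pi @ P)                   # unchanged, since pi = pi P
```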

27 Some Markov Chain Basics
It is also useful to look at time reversibility to find the invariant distribution
Roughly speaking, time reversibility means that it is equally likely to see a transition from i to j as to see a transition from j to i
Pr[θ^r = θ^j | θ^(r+1) = θ^(i_1), θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s)]
= Pr[θ^r = θ^j, θ^(r+1) = θ^(i_1), ..., θ^(r+s) = θ^(i_s)] / Pr[θ^(r+1) = θ^(i_1), ..., θ^(r+s) = θ^(i_s)]
= ( Pr[θ^r = θ^j] Pr[θ^(r+1) = θ^(i_1) | θ^r = θ^j] Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^r = θ^j, θ^(r+1) = θ^(i_1)] ) / ( Pr[θ^(r+1) = θ^(i_1)] Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^(r+1) = θ^(i_1)] )
The second equality follows by the multiply/divide trick

28 Some Markov Chain Basics
However, since we have a Markov chain:
Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^r = θ^j, θ^(r+1) = θ^(i_1)] = Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^(r+1) = θ^(i_1)]
Hence:
Pr[θ^r = θ^j | θ^(r+1) = θ^(i_1), θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s)] = Pr[θ^r = θ^j] Pr[θ^(r+1) = θ^(i_1) | θ^r = θ^j] / Pr[θ^(r+1) = θ^(i_1)]
p*_ij = π_j p_ji / π_i
where p*_ij denotes the transition probability of the reverse chain

29 Some Markov Chain Basics
Time reversibility means p*_ij = p_ij
This in turn implies that:
p_ij = π_j p_ji / π_i, i.e. π_i p_ij = π_j p_ji

30 Some Markov Chain Basics
The above shows that, for a time-reversible chain, π being the stationary distribution implies π_i p_ij = π_j p_ji
We can also prove the result in the other direction

31 Some Markov Chain Basics
Suppose that the chain is reversible with respect to ω: ω_i p_ij = ω_j p_ji for all i, j
Hence (ωP)_j = Σ_i ω_i p_ij = Σ_i ω_j p_ji = ω_j Σ_i p_ji = ω_j
All of this can be generalized to the continuous case

32 Gibbs Sampling
A very important method to simulate the posterior distribution is called Gibbs sampling
Suppose that we have a distribution f(θ)
Let us "block" θ into p groups (θ_1, θ_2, ..., θ_p)
Let f_1(θ_1|θ_2, ..., θ_p) denote the distribution of θ_1 given θ_2, ..., θ_p
Define f_2, ..., f_p analogously

33 Gibbs Sampling
We will then simulate the following Markov chain {(θ_1^r, θ_2^r, ..., θ_p^r)}_{r=1}^R
Starting from an arbitrary value, simulate using the following steps:
step 1: θ_1^(r+1) ~ f_1(θ_1 | θ_2^r, ..., θ_p^r)
step 2: θ_2^(r+1) ~ f_2(θ_2 | θ_1^(r+1), θ_3^r, ..., θ_p^r)
step 3: θ_3^(r+1) ~ f_3(θ_3 | θ_1^(r+1), θ_2^(r+1), θ_4^r, ..., θ_p^r)
...
step p: θ_p^(r+1) ~ f_p(θ_p | θ_1^(r+1), θ_2^(r+1), ..., θ_(p-1)^(r+1))
Return to step 1
Note that you update the values of each block at each step.
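
A minimal sketch of this loop for a two-block case. The target here, a standard bivariate normal with correlation rho, is purely illustrative (not from the slides); it is chosen because its full conditionals are easy to draw from.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, R = 0.8, 5_000
theta1, theta2 = 0.0, 0.0                # arbitrary starting values
draws = np.empty((R, 2))
for r in range(R):
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho ** 2))  # f1(theta1 | theta2)
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho ** 2))  # f2(theta2 | theta1)
    draws[r] = (theta1, theta2)
print(draws[1000:].mean(axis=0))                       # both close to 0
print(np.corrcoef(draws[1000:], rowvar=False)[0, 1])   # close to rho
```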

34 Gibbs Sampling
Gibbs sampling is useful for a couple of reasons:
1 You can do it in many models of interest
2 It is simple to do
3 It is highly accurate in many applications of interest
4 When combined with data augmentation (which we will discuss later), it is very powerful for the analysis of latent variable problems

35 Gibbs Sampling
The posterior distribution can be simulated using Gibbs
This is because the posterior is the invariant or long run distribution
This can be shown using proof by induction
Consider a two-block Gibbs sampler
Let π(θ) denote the posterior

36 Gibbs Sampling
Suppose that we run the following Gibbs sampler:
θ_{2,r+1} ~ π_{2|1}(·|θ_{1,r})
θ_{1,r+1} ~ π_{1|2}(·|θ_{2,r+1})
Suppose by the induction hypothesis that θ_r ~ π(·). Then:
θ_{1,r} ~ π_1(·) = ∫ π(θ_1, θ_2) dθ_2
The distribution of θ_{2,r+1} is then:
∫ π_{2|1}(θ_2|θ_{1,r}) π_1(θ_{1,r}) dθ_{1,r} = π_2(θ_2)
Hence θ_{2,r+1} is a draw from the invariant distribution
A symmetric argument shows θ_{1,r+1} is a draw from the invariant distribution

37 Application to OLS
Returning to our OLS example, we can Gibbs this model as follows:
β|σ² ~ N(β̃, σ²(X'X + A)^(-1))
σ² ~ v_1 s_1² / χ²_{v_1}
v_1 = v_0 + n, s_1² = (v_0 s_0² + n s²) / (v_0 + n)
While it is a bit tedious to set up this simulator, it is not difficult technically
Also, it will typically be highly accurate and fast
The text also gives examples of SUR and other related models
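
A hedged numpy sketch of this simulator: draw σ² from the scaled inverse chi-square, then β|σ² from its conditional normal. The inputs (X, y, beta_bar, A, v0, s0sq), the function name, and the degrees-of-freedom convention for s_1² follow the slide's formulas and are assumptions, not a substitute for the text's derivation.

```python
import numpy as np

def conjugate_regression_draws(X, y, beta_bar, A, v0, s0sq, S, seed=0):
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
    ss = (y - X @ beta_hat) @ (y - X @ beta_hat)      # residual sum of squares
    v1 = v0 + n
    s1sq = (v0 * s0sq + ss) / v1                      # assumed combination, per the slide
    beta_tilde = np.linalg.solve(XtX + A, XtX @ beta_hat + A @ beta_bar)
    L = np.linalg.cholesky(np.linalg.inv(XtX + A))    # Cholesky of (X'X + A)^{-1}
    draws = []
    for _ in range(S):
        sigma2 = v1 * s1sq / rng.chisquare(v1)        # sigma^2 ~ v1 s1^2 / chi^2_{v1}
        beta = beta_tilde + np.sqrt(sigma2) * (L @ rng.standard_normal(k))
        draws.append((beta, sigma2))
    return draws
```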

38 Conjugacy for Covariance Matrices
Consider y_i an m by 1 random normal vector, y_i ~ N(0, Σ)
Let there be i = 1, ..., n observations:
p(y_1, ..., y_n|Σ) = ∏_i |Σ|^(-1/2) exp( -½ y_i'Σ^(-1)y_i ) ∝ |Σ|^(-n/2) exp( -½ Σ_i y_i'Σ^(-1)y_i )
Next note that:
-½ Σ_i y_i'Σ^(-1)y_i = -½ Σ_i tr(y_i y_i'Σ^(-1)) = -½ tr(SΣ^(-1)), where S = Σ_i y_i y_i'

39 Conjugacy for Covariance Matrices
Use the notation etr() for the exponential of the trace function
p(y_1, ..., y_n|Σ) ∝ |Σ|^(-n/2) etr( -½ SΣ^(-1) )
This suggests a natural conjugate form as follows:
p(Σ|v_0, V_0) ∝ |Σ|^(-(v_0+m+1)/2) etr( -½ V_0 Σ^(-1) )
This is an inverted Wishart distribution

40 Conjugacy for Covariance Matrices
You can interpret V_0 as the location of the prior (e.g. your prior beliefs about the value of Σ)
The term v_0 governs the spread of the distribution

41 SUR Model
Next consider the Seemingly Unrelated Regression (SUR) model
This is a set of m regressions that are related through correlated error terms
y_i = X_i β_i + ε_i, i = 1, ..., m, k = 1, ..., n
In an unfortunate notational choice, k indexes the observations

42 SUR Model
For all k, (ε_{k,1}, ε_{k,2}, ..., ε_{k,m}) ~ N(0, Σ)
The SUR model is a well known framework in simultaneous equations
The SUR model specifies that the m equations are related through a set of correlated error terms
However, the errors are not correlated across observations k

43 SUR Model
Stack equations as follows:
y = Xβ + ε, ε ~ N(0, Σ ⊗ I_n)
ε' = (ε_1', ..., ε_m'), y' = (y_1', ..., y_m')
X = diag(X_1, ..., X_m) (block diagonal), β' = (β_1', ..., β_m')

44 SUR Model
We will work with conditionally conjugate priors:
p(β, Σ) = p(β)p(Σ)
β ~ N(β̄, A^(-1))
Σ ~ IW(v_0, V_0)

45 SUR Model
We use a standard trick from the theory of GLS and multiply by a variance matrix which makes our system iid
Consider the Cholesky decomposition Σ = U'U
Multiply both sides of our linear equation by (U^(-1))' ⊗ I_n:
ỹ = X̃β + ε̃
ỹ = ((U^(-1))' ⊗ I_n) y, X̃ = ((U^(-1))' ⊗ I_n) X
As in the theory of GLS, we are now homoskedastic

46 SUR Model
Arguing as in our linear model, we can show that:
β|Σ, y, X ~ N(β̃, (X̃'X̃ + A)^(-1))
β̃ = (X̃'X̃ + A)^(-1) (X̃'X̃ β̂_GLS + A β̄)
where β̂_GLS is the GLS estimator
Note that once again we have a Bayesian "shrinkage" estimator

47 SUR Model
The posterior of Σ|β is inverted Wishart:
Σ|β ~ IW(v_0 + n, S + V_0)
S = E'E, E = (ε_1, ..., ε_m), ε_i = y_i - X_i β_i
That is, the posterior involves a sum of squared errors plus where you centered the prior, V_0
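
A hedged sketch of this Σ|β draw using scipy's inverse-Wishart. It assumes the n x m residual matrix E has already been formed from the current β; the function name is illustrative.

```python
import numpy as np
from scipy.stats import invwishart

def draw_sigma(E, v0, V0, rng=None):
    """One draw of Sigma | beta ~ IW(v0 + n, E'E + V0), with E the n x m residual matrix."""
    n = E.shape[0]
    S = E.T @ E                                   # sum of squared errors, m x m
    return invwishart.rvs(df=v0 + n, scale=S + V0, random_state=rng)
```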

48 SUR Model
To Gibbs sample, we draw a pseudo-random sequence (Σ^0, β^0), (Σ^1, β^1), ..., (Σ^R, β^R)
We use the following Markov chain:
1 Start with an arbitrary point of support, (Σ^0, β^0). Then for r < R:
2 Draw β^(r+1)|Σ^r ~ N(β̃, (X̃'X̃ + A)^(-1))
3 Draw Σ^(r+1)|β^(r+1) ~ IW(v_0 + n, S + V_0)
4 Return to step 2.

49 Hierarchical Models
In a Bayesian framework, it is common to work with hierarchical models
As we shall show, this allows for extremely rich modeling and a simple way to form Gibbs samplers in many applications
We next explain how these models can be analyzed by piecing together a graph
The authors show that you can conceptualize simulating from the model as a "Directed Acyclic Graph" (DAG):
p(θ) → p(y|θ)
θ → y

50 Hierarchical Models
In a hierarchical model, we use a sequence of conditional distributions for θ
E.g., for two conditional distributions:
p(θ_2) → p(θ_1|θ_2) → p(y|θ_1)
θ_2 → θ_1 → y
Or working with joint and marginal distributions:
p(θ_1) = ∫ p(θ_1|θ_2)p(θ_2) dθ_2

51 Hierarchical Models
θ_2 and y are independent after conditioning on θ_1
p(θ_1, θ_2, y) = p(θ_2)p(θ_1|θ_2)p(y|θ_1)
Notice that there is no term that involves both θ_2 and y
This suggests the following Gibbs sampler:
θ_2|θ_1
θ_1|θ_2, y

52 Hierarchical Models
The authors note that there are other types of structures in hierarchical models, for example two parent nodes pointing into a common child node (θ_1 → θ_3 ← θ_2); see the diagrams in the text

53 Hierarchical Models
There are three rules for "reading" the dependence. A node depends on:
1 any node it points to
2 any node that points to it
3 any node that points to a node directly "downstream" of it

54 Hierarchical Linear Model
We illustrate these principles using a regression model with random coefficients:
y_i = X_i β_i + ε_i, ε_i ~ iid N(0, σ_i² I_{n_i})
Note that both β_i and σ_i² are specific to the observation i
β_i = Δ'z_i + v_i, v_i ~ iid N(0, V_β)
σ_i² ~ v_i s_{0i}² / χ²_{v_i}
Here z_i is a set of covariates which could influence the β_i in the above equation
We model our prior equation by equation on σ_i²

55 Hierarchical Linear Model
Note that this model is hierarchical in the sense that β_i depends on V_β and Δ
Also, σ_i² depends on v_i and s_{0i}²
The directed graph for this model is on page 71

56 Hierarchical Linear Model
To finish specifying the model, we must give priors for the hyperparameters V_β and Δ:
V_β ~ IW(v, V)
vec(Δ) ~ N(vec(Δ̄), V_β ⊗ A^(-1))

57 Hierarchical Linear Model
To summarize, the model can be written as the following set of conditional distributions:
y_i | X_i, β_i, σ_i²
β_i | z_i, Δ, V_β
σ_i² | v_i, s_{0i}²
V_β | v, V
Δ | V_β, Δ̄, A
This can be expressed as a directed graph on page 71

58 [Figure: directed graph for the hierarchical linear model, as given in the text]

59 Hierarchical Linear Model
This generates the following Gibbs sampler:
β_i | y_i, X_i, Δ, z_i, V_β, σ_i²
σ_i² | y_i, X_i, β_i, v_i, s_{0i}²
V_β | {β_i}, v, V, Z, Δ, A
Δ | {β_i}, V_β, Z, A, Δ̄
Note that most of these will be normally distributed
The variance parameters should have inverse Wishart/gamma distributions

60 Data Augmentation
Next we explore Gibbs sampling and data augmentation
Essentially, this is an approach to handling latent variables
This is an extremely powerful technique in practice, as we shall show
The authors illustrate using a binary probit example

61 Data Augmentation
Consider the binary probit model:
z_i = x_i'β + ε_i, ε_i ~ N(0, 1)
y_i = 0 if z_i < 0, y_i = 1 otherwise
We observe (y_i, x_i) while z_i is latent

62 Data Augmentation
We use a normal prior for β, β ~ N(β̄, A^(-1))
We treat the parameter vector as θ = (β, z), where z includes all of the latent z_i
The directed graph for this model is: β → z → y

63 Data Augmentation
The posterior distribution can be simulated using the Gibbs sampler:
z|β, X, y
β|z, X
z_i is distributed as truncated normal with "mean" parameter x_i'β and standard deviation 1
z_i is truncated above at 0 if y_i = 0 and below at 0 if y_i = 1
The GHK algorithm of Chapter 2 can be used to simulate the truncated normal
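
A minimal sketch of the z|β, X, y step using scipy's truncnorm (which takes the truncation bounds in standardized units); the function name is illustrative and this is one convenient simulator, not necessarily the GHK routine mentioned above.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_z(X, y, beta, rng=None):
    """Draw each latent z_i ~ N(x_i'beta, 1), truncated at 0 according to y_i."""
    mu = X @ beta
    lower = np.where(y == 1, -mu, -np.inf)   # standardized lower bound: 0 if y_i = 1
    upper = np.where(y == 1, np.inf, -mu)    # standardized upper bound: 0 if y_i = 0
    return truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
```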

64 Data Augmentation
The posterior distribution for β satisfies:
β|z, X ~ N(β̃, (X'X + A)^(-1))
β̃ = (X'X + A)^(-1) (X'z + Aβ̄)
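
A minimal numpy sketch of the β|z, X draw, with the prior precision A and prior mean beta_bar as assumed inputs.

```python
import numpy as np

def draw_beta(X, z, A, beta_bar, rng):
    """Draw beta | z ~ N(beta_tilde, (X'X + A)^{-1})."""
    cov = np.linalg.inv(X.T @ X + A)
    beta_tilde = cov @ (X.T @ z + A @ beta_bar)
    return rng.multivariate_normal(beta_tilde, cov)
```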

65 Semiparametric Bayes
The models that we have talked about thus far rely on fairly strict parametric assumptions
We can weaken these assumptions in some cases through the use of mixture distributions
A mixture of normal distributions can approximate an arbitrary distribution quite well
In practice, a handful of mixture components is often adequate

66 Semiparametric Bayes
The basic mixture model can be written as follows:
y_i ~ N(µ_{ind_i}, Σ_{ind_i})
ind_i ~ Multinomial(pvec)
In the above, y_i is a p-dimensional vector and pvec is the vector of probabilities over which normal component is used for each observation

67 Semiparametric Bayes
A set of convenient conditionally conjugate priors is:
pvec ~ Dirichlet(α)
µ_k ~ N(µ̄, Σ_k a_µ^(-1)), k = 1, ..., K
Σ_k ~ IW(v, V)
The Dirichlet density can be written as:
f(x_1, ..., x_K; α_1, ..., α_K) = (1/B(α)) ∏_{i=1}^K x_i^(α_i - 1), where B(α) = ∏_{i=1}^K Γ(α_i) / Γ(Σ_{i=1}^K α_i)
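
A tiny sketch of simulating data forward from this mixture model under the priors above, with univariate components for simplicity. The parameter values are hypothetical, and this is the forward simulation only, not the Gibbs sampler for the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 2.0, 2.0])             # hypothetical Dirichlet parameters
pvec = rng.dirichlet(alpha)                    # pvec ~ Dirichlet(alpha)
mu = np.array([0.0, 3.0, -3.0])                # hypothetical component means
ind = rng.choice(len(pvec), size=500, p=pvec)  # ind_i ~ Multinomial(pvec)
y = rng.normal(mu[ind], 1.0)                   # y_i ~ N(mu_{ind_i}, 1)
print(pvec, np.bincount(ind, minlength=3) / len(ind))
```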

68 Semiparametric Bayes
One convenient property to note is that:
E[x_i] = α_i / Σ_{i=1}^K α_i
Clearly, this can be used as part of a hierarchical model
For example, you could put mixtures on the random coefficients
One paper we will probably discuss uses a mixture of normals on random coefficients in a multinomial probit
This results in a semiparametric distribution of random coefficients
Also, you could form a semiparametric error term in this fashion
The text provides the Gibbs samplers for this hierarchical part of the model
The text notes that there are identification issues, however.

69 Metropolis Algorithm
There are many problems in which the distributions may be more complicated than a simple normal, Wishart, or other form for which random number generators are readily available
We will produce a time reversible chain whose invariant distribution is the posterior
The chain will also be fairly simple to simulate

70 Metropolis Algorithm
Here is a discrete version of the algorithm. Let π denote the posterior
1 Start in state i, θ^0 = θ^i
2 Draw state j with probability q_ij (multinomial draw)
3 Compute α = min{1, π_j q_ji / (π_i q_ij)}
4 With probability α, set θ^1 = θ^j (move); otherwise θ^1 = θ^i (don't move)
5 Repeat
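
A small sketch of these steps for a hypothetical three-state target π and a symmetric proposal matrix q; the long-run state frequencies should be close to π.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.3, 0.6])          # target over d = 3 states (may be unnormalized)
q = np.full((3, 3), 1.0 / 3.0)          # hypothetical proposal probabilities q_ij
state = 0                                # step 1: start in state i
visits = np.zeros(3)
for _ in range(20_000):
    j = rng.choice(3, p=q[state])        # step 2: propose state j with prob q_ij
    alpha = min(1.0, (pi[j] * q[j, state]) / (pi[state] * q[state, j]))  # step 3
    if rng.random() < alpha:             # step 4: move with probability alpha
        state = j
    visits[state] += 1
print(visits / visits.sum())             # approx. [0.1, 0.3, 0.6]
```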

71 Metropolis Algorithm
We wish to show that the algorithm is time reversible with respect to π
This means π_i p_ij = π_j p_ji, where p_ij is the probability of transitioning from i to j
This can be proven as follows:
π_i p_ij = π_i q_ij min{1, π_j q_ji / (π_i q_ij)} = min{π_i q_ij, π_j q_ji}
π_j p_ji = π_j q_ji min{1, π_i q_ij / (π_j q_ji)} = min{π_i q_ij, π_j q_ji}

72 Metropolis Algorithm
You can extend these ideas to continuous distributions
Let π once again be the posterior up to a factor of proportionality
The normal random walk version works as follows:
1 Start with θ^0
2 Draw θ̃ = θ^0 + ε, ε ~ N(0, s²Σ)
3 Compute α = min{1, π(θ̃)/π(θ^0)}
4 With probability α, set θ^1 = θ̃ (move); otherwise θ^1 = θ^0 (stay)
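
A hedged sketch of the normal random-walk Metropolis step. Here log_pi is any function returning the log posterior up to a constant and step_chol is a Cholesky factor of s²Σ; both names are illustrative. As the next slide notes, s can be tuned so that the acceptance rate is roughly 0.3-0.5.

```python
import numpy as np

def rw_metropolis(log_pi, theta0, step_chol, R, seed=0):
    """Random-walk Metropolis: propose theta + eps, accept w.p. min(1, pi(prop)/pi(theta))."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    draws, accepts = np.empty((R, theta.size)), 0
    for r in range(R):
        proposal = theta + step_chol @ rng.standard_normal(theta.size)  # theta + eps
        if np.log(rng.random()) < log_pi(proposal) - log_pi(theta):     # accept with prob alpha
            theta, accepts = proposal, accepts + 1
        draws[r] = theta
    return draws, accepts / R
```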

73 Metropolis Algorithm
It is useful to start with a guess at the MLE and use the variance matrix estimate to form Σ
Many like to tune s so that the accept/reject ratio is about 0.3-0.5
The text describes more elaborate procedures
The text shows how to use this in a random coefficient logit model
The chain may exhibit high autocorrelation, inhibiting convergence
The authors use "Metropolis within Gibbs" to study the random coefficient logit

74 6 Target Marketing
In "The Value of Purchase History Data in Target Marketing", Rossi et al. attempt to estimate household level preference parameters.
This is of interest as a marketing problem. CMI's checkout coupon uses purchase information to customize coupons to a particular household.
In principle, the entire purchase history (from consumer loyalty cards) could be used to customize coupons (and hence prices)
If a household level preference parameter can be forecast with high precision, this is essentially first degree price discrimination!

75
Even with short purchase histories, they find that profits are increased 2.5-fold through the use of purchase data compared to blanket couponing strategies.
Even one observation can boost profits from couponing by 50%.
This application is of interest to economists as well.
The methods in this paper allow us to account for consumer heterogeneity in a very rich manner.
This might be useful to examine the distribution of welfare consequences of a policy intervention (e.g. a merger or market regulation).
Beyond that, these methods demonstrate the power of Bayesian methods in latent variable problems.

76 7 Random Coefficients Model
Multinomial probit with panel data on household level choices
y_h,t = X_h,t β_h + ε_h,t, ε_h,t ~ N(0, Λ)
β_h = Δ'z_h + v_h, v_h ~ N(0, V_β)
Households h = 1, ..., H and time t = 1, ..., T
X_h,t covariates and z_h demographics
Note that household-specific random coefficients β_h remain fixed over time
I_h,t is the observed choice

77
The posterior distributions are derived in Appendix A.
Formally, the derivations are very close to our multinomial probit model above.
Gibbs sampling is used to simulate the posterior distribution of Λ, Δ, V_β
8 Predictive Distributions
The authors wish to give different coupons to different households.
A rational (Bayesian) decision maker would form her beliefs about household h's preference parameters given her posterior about the model parameters.

78
This will involve, as we show below, forming a predictive distribution for β_h given the econometrician's information set.
As a first case, suppose that the econometrician only knew z_h, the demographics of household h
From our model, p(β_h | z_h, Δ, V_β) is N(Δ'z_h, V_β)
Given the posterior p(Δ, V_β | Data), the econometrician's predictive distribution for β_h is:
p(β_h | z_h, Data) = ∫ p(β_h | z_h, Δ, V_β) p(Δ, V_β | Data) dΔ dV_β
We can simulate p(β_h | z_h, Data) using Gibbs sampling, given our posterior simulations Δ^(s), V_β^(s), s = 1, ..., S:
(1/S) Σ_s p(β_h | z_h, Δ^(s), V_β^(s))

79
We could draw random β_h from p(β_h | z_h, Data).
For each Δ^(s), V_β^(s), draw β_h^(s) from p(β_h | z_h, Δ^(s), V_β^(s))
Given β_h^(s), s = 1, ..., S, we could then simulate purchase probabilities.
Draw ε_ht^(s) from ε_h,t ~ N(0, Λ^(s))
The posterior purchase probability for j, given X_ht and z_h, is:
(1/S) Σ_s 1{ X_jht β_h^(s) + ε_jht^(s) > X_j'ht β_h^(s) + ε_j'ht^(s) for all j' ≠ j }
This would allow us to simulate the purchase response to different couponing strategies for a specific household h.
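
A hedged sketch of this frequency simulator. The argument names, the orientation of Δ, and the assumption that posterior draws of Δ, V_β, Λ are supplied as lists are illustrative choices, not the paper's code.

```python
import numpy as np

def predictive_choice_probs(Delta_draws, Vbeta_draws, Lambda_draws, z_h, X_ht, rng):
    """Frequency simulator for household h's choice probabilities at a J x k design X_ht."""
    S, J = len(Delta_draws), X_ht.shape[0]
    counts = np.zeros(J)
    for Delta, Vbeta, Lam in zip(Delta_draws, Vbeta_draws, Lambda_draws):
        beta_h = rng.multivariate_normal(Delta.T @ z_h, Vbeta)  # draw from p(beta_h | z_h, Delta, V_beta)
        eps = rng.multivariate_normal(np.zeros(J), Lam)         # utility shocks eps_ht ~ N(0, Lambda)
        counts[np.argmax(X_ht @ beta_h + eps)] += 1             # alternative with the highest utility
    return counts / S
```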

80
The paper runs through different couponing strategies given different information sets (e.g. full or choice-only information sets).
The key ideas are similar: form a predictive distribution for h's preferences and simulate purchase behavior in an analogous fashion.
In the case of a full purchase information history, we could use the raw Gibbs output, since the Markov chain will simulate β_h^(s), s = 1, ..., S.
This could then be used to simulate choice behavior as in the example above (given draws of ε_ht^(s))

81 9 Data
AC Nielsen scanner panel data for tuna in Springfield, Missouri.
400 households, 1.5 years, 1-61 purchases.
Brands and covariates in Table 2. Demographics in Table 3.
Table 4, delta coefficients. Poorer people prefer private label.
Goodness of fit is moderate for demographic coefficients

82
Figures 1 and 2, household level coefficient estimates with different information sets.
Table 5, returns to different marketing strategies.
Bottom line: you gain 0.5 cents to 1.0 cents per customer through better estimates.
With a lot of customers, this could be quite profitable.
