Econ Some Bayesian Econometrics

1 Econ Some Bayesian Econometrics
Patrick Bajari

2 Motivation
Next we shall begin to discuss Gibbs sampling and Markov Chain Monte Carlo (MCMC)
These are methods from Bayesian statistics and econometrics
We shall emphasize the practical value of these approaches over philosophical issues in Bayes vs. classical estimation
However, it is important to review a couple of basic results from Bayesian statistics first
Please read 3.8 carefully

3 Motivation
In Bayes, we normally start by specifying a likelihood function p(y|θ) and a prior p(θ)
The likelihood depends on parameters θ
Note that this is different from GMM, where we only specify moment equations
We use Bayes' Theorem to learn about the posterior distribution of the parameter conditional on the data:
p(θ|y) = p(θ)p(y|θ) / ∫ p(y|θ)p(θ) dθ

4 Motivation
In many models, the posterior is a very complicated function and cannot be expressed analytically
However, it turns out that in many models of interest it is possible to simulate the posterior distribution
This is often possible even when classical methods of inference cannot be done
That is, we will draw pseudo-random deviates θ^(1), ..., θ^(S) from p(θ|y)

5 Motivation
We will often be interested in functions g(θ)
For example, g(θ) might be profit, consumer surplus, or other functions of demand parameters
The expected value of g(θ) can be simulated as:
(1/S) Σ_s g(θ^(s)) →_p ∫ g(θ)p(θ|y) dθ
We may also be interested in maximizing expected utility/profit (or minimizing loss, for statisticians) of some function g(a, θ), where a is an action:
max_a (1/S) Σ_s g(a, θ^(s))
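
A minimal numpy sketch of this simulation-based expectation. The draws in theta_draws stand in for posterior draws from p(θ|y) and the function g is a placeholder; both names are illustrative, not from the slides.

```python
import numpy as np

def posterior_expectation(theta_draws, g):
    """Approximate E[g(theta) | y] by (1/S) * sum_s g(theta^(s))."""
    return np.mean([g(theta) for theta in theta_draws])

# Illustration with stand-in draws: a normal pretending to be p(theta | y)
rng = np.random.default_rng(0)
theta_draws = rng.normal(loc=1.0, scale=0.5, size=10_000)
print(posterior_expectation(theta_draws, lambda t: t ** 2))  # approx 1.25
```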

6 Motivation
Bayes requires the specification of a prior p(θ)
This choice is frequently criticized as ad hoc
However, the authors note that under weak regularity conditions, asymptotically:
p(θ|y) ≈ N(θ̂_MLE, H^(-1)(θ̂_MLE))
Asymptotically, the likelihood swamps the prior

7 Motivation
Some practical advantages of Bayes:
1 Simulation of complicated models (especially with latent variables)
2 Exact inference (e.g. asymptotic approximations are not used)
3 Link to decision theory

8 Bayesian Analysis of Regression
An important case to study will be the normal linear regression model
We shall show that a key step in estimating many rich models is to analyze the normal regression model
Consider the following multiple regression model:
y_i = x_i'β + ε_i, ε_i ~ N(0, σ²), y ~ N(Xβ, σ²I)
In the above, y is the vector formed from the y_i and X has ith row x_i'

9 Bayesian Analysis of Regression
Given the assumptions above, the likelihood for y is:
p(y|X, β, σ²) ∝ (σ²)^(-n/2) exp( -(1/(2σ²)) (y - Xβ)'(y - Xβ) )
In Bayesian econometrics, it is convenient to work with prior distributions that are conjugate. That is, the posterior is in the same "class" of distributions as the prior
The posterior is always proportional to the prior times the likelihood
If we are clever in choosing the right density for the prior, the posterior will be analytically convenient to work with
Our concerns are mainly computational: we do not necessarily take the parametric form of the prior seriously

10 Bayesian Analysis of Regression
It will be helpful to rewrite the likelihood function as follows:
First note that, by substituting y = Xβ̂ + (y - Xβ̂),
(y - Xβ)'(y - Xβ) = (y - Xβ̂)'(y - Xβ̂) + (β - β̂)'X'X(β - β̂) = vs² + (β - β̂)'X'X(β - β̂)
where β̂ is the OLS estimate, v = n - k, and s² is the usual residual variance estimate

11 Bayesian Analysis of Regression
We can then write the likelihood as:
p(y|X, β, σ²) ∝ (σ²)^(-n/2) exp( -(1/(2σ²)) [ vs² + (β - β̂)'X'X(β - β̂) ] )
= (σ²)^(-v/2) exp( -(1/(2σ²)) vs² ) × (σ²)^(-(n-v)/2) exp( -(1/(2σ²)) (β - β̂)'X'X(β - β̂) )
We will specify the prior in the form:
p(β, σ²) = p(σ²)p(β|σ²)
Looking at the above, we guess that a prior of the form p(θ) ∝ θ^(-λ) exp(-δ/θ) (inverse gamma) will be conjugate for σ², and a normal prior will be conjugate for β

12 Bayesian Analysis of Regression
The exact form we shall use is:
p(σ²) ∝ (σ²)^(-(v_0/2 + 1)) exp( -v_0 s_0² / (2σ²) )
We can think of this as a posterior arising from a sample of size v_0 with sufficient statistic s_0²
The natural conjugate prior on β is normal:
p(β|σ²) ∝ (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̄)'A(β - β̄) )
This is a normal with mean β̄ and precision (inverse of variance) A

13 Bayesian Analysis of Regression
A key insight is that the posterior has the form:
p(β, σ²|y) ∝ p(y|X, β, σ²)p(σ²)p(β|σ²)
After some tedious (but not deep) algebra, we find:
p(β, σ²|y) ∝ (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̃)'(X'X + A)(β - β̃) ) × (σ²)^(-((n+v_0)/2 + 1)) exp( -(ns² + v_0 s_0²) / (2σ²) )
β̃ = (X'X + A)^(-1) (X'X β̂ + A β̄)
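
A minimal numpy sketch of the posterior mean formula above. X, y, the prior precision A, and the prior mean beta_bar are assumed inputs, and the function name is illustrative.

```python
import numpy as np

def posterior_mean_beta(X, y, A, beta_bar):
    """beta_tilde = (X'X + A)^{-1} (X'X beta_hat + A beta_bar)."""
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)  # OLS estimate beta_hat
    return np.linalg.solve(XtX + A, XtX @ beta_hat + A @ beta_bar)
```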

14 Bayesian Analysis of Regression
Clearly the posterior for β conditional on y, X, σ² is normal
The posterior mean is a weighted average depending on the prior precision A and the precision from the data X'X
It also depends on the prior mean β̄ and the classical estimator β̂
As the number of observations becomes large, the posterior mean converges to the OLS estimate β̂
Convincing yourself of this derivation would be good preparation for the prelims

16 Bayesian Analysis of Regression
Derivation of the posterior for β is done by "completing the square"
Since this is a common operation in Bayesian econometrics, it is worth reviewing

17 Bayesian Analysis of Regression
A key insight is that the posterior has the form:
p(β, σ²|y) ∝ p(y|X, β, σ²)p(σ²)p(β|σ²)
∝ (σ²)^(-n/2) exp( -(1/(2σ²)) (y - Xβ)'(y - Xβ) )
× (σ²)^(-k/2) exp( -(1/(2σ²)) (β - β̄)'A(β - β̄) )
× (σ²)^(-(v_0/2 + 1)) exp( -v_0 s_0² / (2σ²) )

18 Bayesian Analysis of Regression
We guess that β is going to be normally distributed. We strive to rewrite the expression below as follows:
(y - Xβ)'(y - Xβ) + (β - β̄)'A(β - β̄) = (β - β̃)'V(β - β̃) + junk not involving β
If we can write it in this form, then β̃ is the posterior mean and V the precision matrix

19 Bayesian Analysis of Regression
(y - Xβ)'(y - Xβ) + (β - β̄)'A(β - β̄) = (v - Wβ)'(v - Wβ)
where v = [y ; Uβ̄], W = [X ; U], and A = U'U

20 Bayesian Analysis of Regression
Define β̃ as the OLS projection of v on W:
β̃ = (W'W)^(-1) W'v = (X'X + A)^(-1) (X'X β̂ + A β̄)
Then an application of the plus/minus trick shows:
(v - Wβ)'(v - Wβ) = (β - β̃)'W'W(β - β̃) + junk not involving β
= (β - β̃)'(X'X + A)(β - β̃) + junk not involving β

21 Markov Chain Monte Carlo Methods
We will often begin with an unnormalized posterior density π(θ) (e.g. known up to a constant)
The dimension of θ may be high
For example, we may be studying a discrete choice model with 2000 consumers and 5 controls
There are 10,000 random coefficients (which we treat as parameters)
There are also hyperparameters linking the random coefficients
Importance sampling and classical simulators may not be useful in this setting

22 Markov Chain Monte Carlo Methods
Our goal will be to construct a Markov chain F to "simulate the posterior"
Generate a sequence of R pseudo-random deviates θ^0, ..., θ^R
However, they will not be iid
Instead we will draw them from a chain θ^(r+1) ~ F(θ^r), from an arbitrary starting point θ^0

23 Markov Chain Monte Carlo Methods
We will want the invariant, or long run, distribution of the Markov chain to be π
If this is the case, then for a function h of interest:
(1/R) Σ_r h(θ^r) →_p ∫ h(θ)π(θ) dθ

24 Some Markov Chain Basics
Restrict attention to finite support
Let the state space be S = {θ^1, ..., θ^d}
Sequence of random variables θ^0, θ^1, θ^2, ...
Pr[θ^(r+1) = θ^j | θ^r = θ^i] = p_ij
Initial distribution π_0(θ)

25 Some Markov Chain Basics
The distribution of θ^1 is:
Pr[θ^1 = θ^j] = Σ_i π_0(θ^i) p_ij, i.e. π_1 = π_0 P
The distribution over states is a row vector of probabilities
After r iterations: π_r = π_0 P^r

26 Some Markov Chain Basics
In practice, it is useful to restrict attention to chains that don't get "stuck"
Assume that p_ij > 0 for all i, j
Then there exists a stationary distribution: π = lim_{r→∞} π_0 P^r
Note that π satisfies: π = πP
We also call π the invariant distribution. It characterizes the long run behavior of the chain after the impact of π_0 wears off.
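
A small numerical illustration (not from the text) that iterating π_0 P^r converges to an invariant π with π = πP, for a hypothetical transition matrix satisfying p_ij > 0.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])      # hypothetical transition matrix with p_ij > 0
pi = np.array([1.0, 0.0])       # arbitrary initial distribution pi_0
for _ in range(100):
    pi = pi @ P                 # pi_r = pi_0 P^r
print(pi)                       # approx. invariant distribution, about [4/7, 3/7]
print(pi @ P)                   # unchanged, since pi = pi P
```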

27 Some Markov Chain Basics
It is also useful to look at time reversibility to find the invariant distribution
Roughly speaking, time reversibility means that it is equally likely to see a transition from i to j as to see a transition from j to i
Pr[θ^r = θ^j | θ^(r+1) = θ^(i_1), θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s)]
= Pr[θ^r = θ^j, θ^(r+1) = θ^(i_1), ..., θ^(r+s) = θ^(i_s)] / Pr[θ^(r+1) = θ^(i_1), ..., θ^(r+s) = θ^(i_s)]
= ( Pr[θ^r = θ^j] Pr[θ^(r+1) = θ^(i_1) | θ^r = θ^j] Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^r = θ^j, θ^(r+1) = θ^(i_1)] ) / ( Pr[θ^(r+1) = θ^(i_1)] Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^(r+1) = θ^(i_1)] )
The second equality follows by the multiply/divide trick

28 Some Markov Chain Basics
However, since we have a Markov chain:
Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^r = θ^j, θ^(r+1) = θ^(i_1)] = Pr[θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s) | θ^(r+1) = θ^(i_1)]
Hence:
Pr[θ^r = θ^j | θ^(r+1) = θ^(i_1), θ^(r+2) = θ^(i_2), ..., θ^(r+s) = θ^(i_s)] = Pr[θ^r = θ^j] Pr[θ^(r+1) = θ^(i_1) | θ^r = θ^j] / Pr[θ^(r+1) = θ^(i_1)]
p*_ij = π_j p_ji / π_i
where p*_ij denotes the transition probability of the reverse chain

29 Some Markov Chain Basics
Time reversibility means p*_ij = p_ij
This in turn implies that:
p_ij = π_j p_ji / π_i, i.e. π_i p_ij = π_j p_ji

30 Some Markov Chain Basics
The above shows that, for a time-reversible chain, π being the stationary distribution implies π_i p_ij = π_j p_ji
We can also prove the result in the other direction

31 Some Markov Chain Basics
Suppose that the chain is reversible with respect to ω: ω_i p_ij = ω_j p_ji for all i, j
Hence (ωP)_j = Σ_i ω_i p_ij = Σ_i ω_j p_ji = ω_j Σ_i p_ji = ω_j
All of this can be generalized to the continuous case

32 Gibbs Sampling
A very important method to simulate the posterior distribution is called Gibbs sampling
Suppose that we have a distribution f(θ)
Let us "block" θ into p groups (θ_1, θ_2, ..., θ_p)
Let f_1(θ_1|θ_2, ..., θ_p) denote the distribution of θ_1 given θ_2, ..., θ_p
Define f_2, ..., f_p analogously

33 Gibbs Sampling
We will then simulate the following Markov chain {(θ_1^r, θ_2^r, ..., θ_p^r)}_{r=1}^R
Starting from an arbitrary value, simulate using the following steps:
step 1: θ_1^(r+1) ~ f_1(θ_1 | θ_2^r, ..., θ_p^r)
step 2: θ_2^(r+1) ~ f_2(θ_2 | θ_1^(r+1), θ_3^r, ..., θ_p^r)
step 3: θ_3^(r+1) ~ f_3(θ_3 | θ_1^(r+1), θ_2^(r+1), θ_4^r, ..., θ_p^r)
...
step p: θ_p^(r+1) ~ f_p(θ_p | θ_1^(r+1), θ_2^(r+1), ..., θ_(p-1)^(r+1))
Return to step 1
Note that you update the values of each block at each step.
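
A minimal sketch of this loop for a two-block case. The target here, a standard bivariate normal with correlation rho, is purely illustrative (not from the slides); it is chosen because its full conditionals are easy to draw from.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, R = 0.8, 5_000
theta1, theta2 = 0.0, 0.0                # arbitrary starting values
draws = np.empty((R, 2))
for r in range(R):
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho ** 2))  # f1(theta1 | theta2)
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho ** 2))  # f2(theta2 | theta1)
    draws[r] = (theta1, theta2)
print(draws[1000:].mean(axis=0))                       # both close to 0
print(np.corrcoef(draws[1000:], rowvar=False)[0, 1])   # close to rho
```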

34 Gibbs Sampling
Gibbs sampling is useful for a couple of reasons:
1 You can do it in many models of interest
2 It is simple to do
3 It is highly accurate in many applications of interest
4 When combined with data augmentation (which we will discuss later), it is very powerful for the analysis of latent variable problems

35 Gibbs Sampling
The posterior distribution can be simulated using Gibbs
This is because the posterior is the invariant or long run distribution
This can be shown using proof by induction
Consider a two-block Gibbs sampler
Let π(θ) denote the posterior

36 Gibbs Sampling
Suppose that we run the following Gibbs sampler:
θ_{2,r+1} ~ π_{2|1}(·|θ_{1,r})
θ_{1,r+1} ~ π_{1|2}(·|θ_{2,r+1})
Suppose by the induction hypothesis that θ_r ~ π(·). Then:
θ_{1,r} ~ π_1(·) = ∫ π(θ_1, θ_2) dθ_2
The distribution of θ_{2,r+1} is then:
∫ π_{2|1}(θ_2|θ_{1,r}) π_1(θ_{1,r}) dθ_{1,r} = π_2(θ_2)
Hence θ_{2,r+1} is a draw from the invariant distribution
A symmetric argument shows θ_{1,r+1} is a draw from the invariant distribution

37 Application to OLS
Returning to our OLS example, we can Gibbs this model as follows:
β|σ² ~ N(β̃, σ²(X'X + A)^(-1))
σ² ~ v_1 s_1² / χ²_{v_1}
v_1 = v_0 + n, s_1² = (v_0 s_0² + n s²) / (v_0 + n)
While it is a bit tedious to set up this simulator, it is not difficult technically
Also, it will typically be highly accurate and fast
The text also gives examples of SUR and other related models
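
A hedged numpy sketch of this simulator: draw σ² from the scaled inverse chi-square, then β|σ² from its conditional normal. The inputs (X, y, beta_bar, A, v0, s0sq), the function name, and the degrees-of-freedom convention for s_1² follow the slide's formulas and are assumptions, not a substitute for the text's derivation.

```python
import numpy as np

def conjugate_regression_draws(X, y, beta_bar, A, v0, s0sq, S, seed=0):
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
    ss = (y - X @ beta_hat) @ (y - X @ beta_hat)      # residual sum of squares
    v1 = v0 + n
    s1sq = (v0 * s0sq + ss) / v1                      # assumed combination, per the slide
    beta_tilde = np.linalg.solve(XtX + A, XtX @ beta_hat + A @ beta_bar)
    L = np.linalg.cholesky(np.linalg.inv(XtX + A))    # Cholesky of (X'X + A)^{-1}
    draws = []
    for _ in range(S):
        sigma2 = v1 * s1sq / rng.chisquare(v1)        # sigma^2 ~ v1 s1^2 / chi^2_{v1}
        beta = beta_tilde + np.sqrt(sigma2) * (L @ rng.standard_normal(k))
        draws.append((beta, sigma2))
    return draws
```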

38 Conjugacy for Covariance Matrices
Consider y_i an m by 1 random normal vector, y_i ~ N(0, Σ)
Let there be i = 1, ..., n observations:
p(y_1, ..., y_n|Σ) = ∏_i |Σ|^(-1/2) exp( -½ y_i'Σ^(-1)y_i ) ∝ |Σ|^(-n/2) exp( -½ Σ_i y_i'Σ^(-1)y_i )
Next note that:
-½ Σ_i y_i'Σ^(-1)y_i = -½ Σ_i tr(y_i y_i'Σ^(-1)) = -½ tr(SΣ^(-1)), where S = Σ_i y_i y_i'

39 Conjugacy for Covariance Matrices
Use the notation etr() for the exponential of the trace function
p(y_1, ..., y_n|Σ) ∝ |Σ|^(-n/2) etr( -½ SΣ^(-1) )
This suggests a natural conjugate form as follows:
p(Σ|v_0, V_0) ∝ |Σ|^(-(v_0+m+1)/2) etr( -½ V_0 Σ^(-1) )
This is an inverted Wishart distribution

40 Conjugacy for Covariance Matrices
You can interpret V_0 as the location of the prior (e.g. your prior beliefs about the value of Σ)
The term v_0 governs the spread of the distribution

41 SUR Model
Next consider the Seemingly Unrelated Regression (SUR) model
This is a set of m regressions that are related through correlated error terms
y_i = X_i β_i + ε_i, i = 1, ..., m, k = 1, ..., n
In an unfortunate notational choice, k indexes the observations

42 SUR Model
For all k, (ε_{k,1}, ε_{k,2}, ..., ε_{k,m}) ~ N(0, Σ)
The SUR model is a well known framework in simultaneous equations
The SUR model specifies that the m equations are related through a set of correlated error terms
However, the errors are not correlated across observations k

43 SUR Model
Stack equations as follows:
y = Xβ + ε, ε ~ N(0, Σ ⊗ I_n)
ε' = (ε_1', ..., ε_m'), y' = (y_1', ..., y_m')
X = diag(X_1, ..., X_m) (block diagonal), β' = (β_1', ..., β_m')

44 SUR Model
We will work with conditionally conjugate priors:
p(β, Σ) = p(β)p(Σ)
β ~ N(β̄, A^(-1))
Σ ~ IW(v_0, V_0)

45 SUR Model
We use a standard trick from the theory of GLS and multiply by a variance matrix which makes our system iid
Consider the Cholesky decomposition Σ = U'U
Multiply both sides of our linear equation by (U^(-1))' ⊗ I_n:
ỹ = X̃β + ε̃
ỹ = ((U^(-1))' ⊗ I_n) y, X̃ = ((U^(-1))' ⊗ I_n) X
As in the theory of GLS, we are now homoskedastic

46 SUR Model
Arguing as in our linear model, we can show that:
β|Σ, y, X ~ N(β̃, (X̃'X̃ + A)^(-1))
β̃ = (X̃'X̃ + A)^(-1) (X̃'X̃ β̂_GLS + A β̄)
where β̂_GLS is the GLS estimator
Note that once again we have a Bayesian "shrinkage" estimator

47 SUR Model
The posterior of Σ|β is inverted Wishart:
Σ|β ~ IW(v_0 + n, S + V_0)
S = E'E, E = (ε_1, ..., ε_m), ε_i = y_i - X_i β_i
That is, the posterior involves a sum of squared errors plus where you centered the prior, V_0
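
A hedged sketch of this Σ|β draw using scipy's inverse-Wishart. It assumes the n x m residual matrix E has already been formed from the current β; the function name is illustrative.

```python
import numpy as np
from scipy.stats import invwishart

def draw_sigma(E, v0, V0, rng=None):
    """One draw of Sigma | beta ~ IW(v0 + n, E'E + V0), with E the n x m residual matrix."""
    n = E.shape[0]
    S = E.T @ E                                   # sum of squared errors, m x m
    return invwishart.rvs(df=v0 + n, scale=S + V0, random_state=rng)
```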

48 SUR Model
To Gibbs sample, we draw a pseudo-random sequence (Σ^0, β^0), (Σ^1, β^1), ..., (Σ^R, β^R)
We use the following Markov chain:
1 Start with an arbitrary point of support, (Σ^0, β^0). Then for r < R:
2 Draw β^(r+1)|Σ^r ~ N(β̃, (X̃'X̃ + A)^(-1))
3 Draw Σ^(r+1)|β^(r+1) ~ IW(v_0 + n, S + V_0)
4 Return to step 2.

49 Hierarchical Models
In a Bayesian framework, it is common to work with hierarchical models
As we shall show, this allows for extremely rich modeling and a simple way to form Gibbs samplers in many applications
We next explain how these models can be analyzed by piecing together a graph
The authors show that you can conceptualize simulating from the model as a "Directed Acyclic Graph" (DAG):
p(θ) → p(y|θ)
θ → y

50 Hierarchical Models
In a hierarchical model, we use a sequence of conditional distributions for θ
E.g., for two conditional distributions:
p(θ_2) → p(θ_1|θ_2) → p(y|θ_1)
θ_2 → θ_1 → y
Or working with joint and marginal distributions:
p(θ_1) = ∫ p(θ_1|θ_2)p(θ_2) dθ_2

51 Hierarchical Models
θ_2 and y are independent after conditioning on θ_1
p(θ_1, θ_2, y) = p(θ_2)p(θ_1|θ_2)p(y|θ_1)
Notice that there is no term that involves both θ_2 and y
This suggests the following Gibbs sampler:
θ_2|θ_1
θ_1|θ_2, y

52 Hierarchical Models
The authors note that there are other types of structures in hierarchical models, for example two parent nodes pointing into a common child node (θ_1 → θ_3 ← θ_2); see the diagrams in the text

53 Hierarchical Models
There are three rules for "reading" the dependence. A node depends on:
1 any node it points to
2 any node that points to it
3 any node that points to a node directly "downstream" of it

54 Hierarchical Linear Model
We illustrate these principles using a regression model with random coefficients:
y_i = X_i β_i + ε_i, ε_i ~ iid N(0, σ_i² I_{n_i})
Note that both β_i and σ_i² are specific to the observation i
β_i = Δ'z_i + v_i, v_i ~ iid N(0, V_β)
σ_i² ~ v_i s_{0i}² / χ²_{v_i}
Here z_i is a set of covariates which could influence the β_i in the above equation
We model our prior equation by equation on σ_i²

55 Hierarchical Linear Model
Note that this model is hierarchical in the sense that β_i depends on V_β and Δ
Also, σ_i² depends on v_i and s_{0i}²
The directed graph for this model is on page 71

56 Hierarchical Linear Model
To finish specifying the model, we must give priors for the hyperparameters V_β and Δ:
V_β ~ IW(v, V)
vec(Δ) ~ N(vec(Δ̄), V_β ⊗ A^(-1))

57 Hierarchical Linear Model
To summarize, the model can be written as the following set of conditional distributions:
y_i | X_i, β_i, σ_i²
β_i | z_i, Δ, V_β
σ_i² | v_i, s_{0i}²
V_β | v, V
Δ | V_β, Δ̄, A
This can be expressed as a directed graph on page 71

58 [Figure: directed graph for the hierarchical linear model, as given in the text]

59 Hierarchical Linear Model
This generates the following Gibbs sampler:
β_i | y_i, X_i, Δ, z_i, V_β, σ_i²
σ_i² | y_i, X_i, β_i, v_i, s_{0i}²
V_β | {β_i}, v, V, Z, Δ, A
Δ | {β_i}, V_β, Z, A, Δ̄
Note that most of these will be normally distributed
The variance parameters should have inverse Wishart/gamma distributions

60 Data Augmentation
Next we explore Gibbs sampling and data augmentation
Essentially, this is an approach to handling latent variables
This is an extremely powerful technique in practice, as we shall show
The authors illustrate using a binary probit example

61 Data Augmentation
Consider the binary probit model:
z_i = x_i'β + ε_i, ε_i ~ N(0, 1)
y_i = 0 if z_i < 0, y_i = 1 otherwise
We observe (y_i, x_i) while z_i is latent

62 Data Augmentation
We use a normal prior for β, β ~ N(β̄, A^(-1))
We treat the parameter vector as θ = (β, z), where z includes all of the latent z_i
The directed graph for this model is: β → z → y

63 Data Augmentation
The posterior distribution can be simulated using the Gibbs sampler:
z|β, X, y
β|z, X
z_i is distributed as truncated normal with "mean" parameter x_i'β and standard deviation 1
z_i is truncated above at 0 if y_i = 0 and below at 0 if y_i = 1
The GHK algorithm of Chapter 2 can be used to simulate the truncated normal
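
A minimal sketch of the z|β, X, y step using scipy's truncnorm (which takes the truncation bounds in standardized units); the function name is illustrative and this is one convenient simulator, not necessarily the GHK routine mentioned above.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_z(X, y, beta, rng=None):
    """Draw each latent z_i ~ N(x_i'beta, 1), truncated at 0 according to y_i."""
    mu = X @ beta
    lower = np.where(y == 1, -mu, -np.inf)   # standardized lower bound: 0 if y_i = 1
    upper = np.where(y == 1, np.inf, -mu)    # standardized upper bound: 0 if y_i = 0
    return truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
```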

64 Data Augmentation
The posterior distribution for β satisfies:
β|z, X ~ N(β̃, (X'X + A)^(-1))
β̃ = (X'X + A)^(-1) (X'z + Aβ̄)
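
A minimal numpy sketch of the β|z, X draw, with the prior precision A and prior mean beta_bar as assumed inputs.

```python
import numpy as np

def draw_beta(X, z, A, beta_bar, rng):
    """Draw beta | z ~ N(beta_tilde, (X'X + A)^{-1})."""
    cov = np.linalg.inv(X.T @ X + A)
    beta_tilde = cov @ (X.T @ z + A @ beta_bar)
    return rng.multivariate_normal(beta_tilde, cov)
```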

65 Semiparametric Bayes
The models that we have talked about thus far rely on fairly strict parametric assumptions
We can weaken these assumptions in some cases through the use of mixture distributions
A mixture of normal distributions can approximate an arbitrary distribution quite well
In practice, a handful of mixture components is often adequate

66 Semiparametric Bayes
The basic mixture model can be written as follows:
y_i ~ N(µ_{ind_i}, Σ_{ind_i})
ind_i ~ Multinomial(pvec)
In the above, y_i is a p-dimensional vector and pvec is the vector of probabilities over which normal component is used for each observation

67 Semiparametric Bayes
A set of convenient conditionally conjugate priors is:
pvec ~ Dirichlet(α)
µ_k ~ N(µ̄, Σ_k a_µ^(-1)), k = 1, ..., K
Σ_k ~ IW(v, V)
The Dirichlet density can be written as:
f(x_1, ..., x_K; α_1, ..., α_K) = (1/B(α)) ∏_{i=1}^K x_i^(α_i - 1), where B(α) = ∏_{i=1}^K Γ(α_i) / Γ(Σ_{i=1}^K α_i)
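
A tiny sketch of simulating data forward from this mixture model under the priors above, with univariate components for simplicity. The parameter values are hypothetical, and this is the forward simulation only, not the Gibbs sampler for the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 2.0, 2.0])             # hypothetical Dirichlet parameters
pvec = rng.dirichlet(alpha)                    # pvec ~ Dirichlet(alpha)
mu = np.array([0.0, 3.0, -3.0])                # hypothetical component means
ind = rng.choice(len(pvec), size=500, p=pvec)  # ind_i ~ Multinomial(pvec)
y = rng.normal(mu[ind], 1.0)                   # y_i ~ N(mu_{ind_i}, 1)
print(pvec, np.bincount(ind, minlength=3) / len(ind))
```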

68 Semiparametric Bayes
One convenient property to note is that:
E[x_i] = α_i / Σ_{i=1}^K α_i
Clearly, this can be used as part of a hierarchical model
For example, you could put mixtures on the random coefficients
One paper we will probably discuss uses a mixture of normals on random coefficients in a multinomial probit
This results in a semiparametric distribution of random coefficients
Also, you could form a semiparametric error term in this fashion
The text provides the Gibbs samplers for this hierarchical part of the model
The text notes that there are identification issues, however.

69 Metropolis Algorithm
There are many problems in which the distributions may be more complicated than a simple normal, Wishart, or other form for which random number generators are readily available
We will produce a time reversible chain whose invariant distribution is the posterior
The chain will also be fairly simple to simulate

70 Metropolis Algorithm
Here is a discrete version of the algorithm. Let π denote the posterior
1 Start in state i, θ^0 = θ^i
2 Draw state j with probability q_ij (multinomial draw)
3 Compute α = min{1, π_j q_ji / (π_i q_ij)}
4 With probability α, set θ^1 = θ^j (move); otherwise θ^1 = θ^i (don't move)
5 Repeat
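
A small sketch of these steps for a hypothetical three-state target π and a symmetric proposal matrix q; the long-run state frequencies should be close to π.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.3, 0.6])          # target over d = 3 states (may be unnormalized)
q = np.full((3, 3), 1.0 / 3.0)          # hypothetical proposal probabilities q_ij
state = 0                                # step 1: start in state i
visits = np.zeros(3)
for _ in range(20_000):
    j = rng.choice(3, p=q[state])        # step 2: propose state j with prob q_ij
    alpha = min(1.0, (pi[j] * q[j, state]) / (pi[state] * q[state, j]))  # step 3
    if rng.random() < alpha:             # step 4: move with probability alpha
        state = j
    visits[state] += 1
print(visits / visits.sum())             # approx. [0.1, 0.3, 0.6]
```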

71 Metropolis Algorithm
We wish to show that the algorithm is time reversible with respect to π
This means π_i p_ij = π_j p_ji, where p_ij is the probability of transitioning from i to j
This can be proven as follows:
π_i p_ij = π_i q_ij min{1, π_j q_ji / (π_i q_ij)} = min{π_i q_ij, π_j q_ji}
π_j p_ji = π_j q_ji min{1, π_i q_ij / (π_j q_ji)} = min{π_i q_ij, π_j q_ji}

72 Metropolis Algorithm
You can extend these ideas to continuous distributions
Let π once again be the posterior up to a factor of proportionality
The normal random walk version works as follows:
1 Start with θ^0
2 Draw θ̃ = θ^0 + ε, ε ~ N(0, s²Σ)
3 Compute α = min{1, π(θ̃)/π(θ^0)}
4 With probability α, set θ^1 = θ̃ (move); otherwise θ^1 = θ^0 (stay)
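
A hedged sketch of the normal random-walk Metropolis step. Here log_pi is any function returning the log posterior up to a constant and step_chol is a Cholesky factor of s²Σ; both names are illustrative. As the next slide notes, s can be tuned so that the acceptance rate is roughly 0.3-0.5.

```python
import numpy as np

def rw_metropolis(log_pi, theta0, step_chol, R, seed=0):
    """Random-walk Metropolis: propose theta + eps, accept w.p. min(1, pi(prop)/pi(theta))."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    draws, accepts = np.empty((R, theta.size)), 0
    for r in range(R):
        proposal = theta + step_chol @ rng.standard_normal(theta.size)  # theta + eps
        if np.log(rng.random()) < log_pi(proposal) - log_pi(theta):     # accept with prob alpha
            theta, accepts = proposal, accepts + 1
        draws[r] = theta
    return draws, accepts / R
```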

73 Metropolis Algorithm
It is useful to start with a guess at the MLE and use the variance matrix estimate to form Σ
Many like to tune s so that the accept/reject ratio is about 0.3-0.5
The text describes more elaborate procedures
The text shows how to use this in a random coefficient logit model
The chain may exhibit high autocorrelation, inhibiting convergence
The authors use "Metropolis within Gibbs" to study the random coefficient logit

74 6 Target Marketing
In "The Value of Purchase History Data in Target Marketing", Rossi et al. attempt to estimate household level preference parameters.
This is of interest as a marketing problem. CMI's checkout coupon uses purchase information to customize coupons to a particular household.
In principle, the entire purchase history (from consumer loyalty cards) could be used to customize coupons (and hence prices)
If a household level preference parameter can be forecast with high precision, this is essentially first degree price discrimination!

75
Even with short purchase histories, they find that profits are increased 2.5-fold through the use of purchase data compared to blanket couponing strategies.
Even one observation can boost profits from couponing by 50%.
This application is of interest to economists as well.
The methods in this paper allow us to account for consumer heterogeneity in a very rich manner.
This might be useful to examine the distribution of welfare consequences of a policy intervention (e.g. a merger or market regulation).
Beyond that, these methods demonstrate the power of Bayesian methods in latent variable problems.

76 7 Random Coefficients Model
Multinomial probit with panel data on household level choices
y_h,t = X_h,t β_h + ε_h,t, ε_h,t ~ N(0, Λ)
β_h = Δ'z_h + v_h, v_h ~ N(0, V_β)
Households h = 1, ..., H and time t = 1, ..., T
X_h,t covariates and z_h demographics
Note that household-specific random coefficients β_h remain fixed over time
I_h,t is the observed choice

77
The posterior distributions are derived in Appendix A.
Formally, the derivations are very close to our multinomial probit model above.
Gibbs sampling is used to simulate the posterior distribution of Λ, Δ, V_β
8 Predictive Distributions
The authors wish to give different coupons to different households.
A rational (Bayesian) decision maker would form her beliefs about household h's preference parameters given her posterior about the model parameters.

78
This will involve, as we show below, forming a predictive distribution for β_h given the econometrician's information set.
As a first case, suppose that the econometrician only knew z_h, the demographics of household h
From our model, p(β_h | z_h, Δ, V_β) is N(Δ'z_h, V_β)
Given the posterior p(Δ, V_β | Data), the econometrician's predictive distribution for β_h is:
p(β_h | z_h, Data) = ∫ p(β_h | z_h, Δ, V_β) p(Δ, V_β | Data) dΔ dV_β
We can simulate p(β_h | z_h, Data) using Gibbs sampling, given our posterior simulations Δ^(s), V_β^(s), s = 1, ..., S:
(1/S) Σ_s p(β_h | z_h, Δ^(s), V_β^(s))

79
We could draw random β_h from p(β_h | z_h, Data).
For each Δ^(s), V_β^(s), draw β_h^(s) from p(β_h | z_h, Δ^(s), V_β^(s))
Given β_h^(s), s = 1, ..., S, we could then simulate purchase probabilities.
Draw ε_ht^(s) from ε_h,t ~ N(0, Λ^(s))
The posterior purchase probability for j, given X_ht and z_h, is:
(1/S) Σ_s 1{ X_jht β_h^(s) + ε_jht^(s) > X_j'ht β_h^(s) + ε_j'ht^(s) for all j' ≠ j }
This would allow us to simulate the purchase response to different couponing strategies for a specific household h.
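
A hedged sketch of this frequency simulator. The argument names, the orientation of Δ, and the assumption that posterior draws of Δ, V_β, Λ are supplied as lists are illustrative choices, not the paper's code.

```python
import numpy as np

def predictive_choice_probs(Delta_draws, Vbeta_draws, Lambda_draws, z_h, X_ht, rng):
    """Frequency simulator for household h's choice probabilities at a J x k design X_ht."""
    S, J = len(Delta_draws), X_ht.shape[0]
    counts = np.zeros(J)
    for Delta, Vbeta, Lam in zip(Delta_draws, Vbeta_draws, Lambda_draws):
        beta_h = rng.multivariate_normal(Delta.T @ z_h, Vbeta)  # draw from p(beta_h | z_h, Delta, V_beta)
        eps = rng.multivariate_normal(np.zeros(J), Lam)         # utility shocks eps_ht ~ N(0, Lambda)
        counts[np.argmax(X_ht @ beta_h + eps)] += 1             # alternative with the highest utility
    return counts / S
```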

80
The paper runs through different couponing strategies given different information sets (e.g. full or choice-only information sets).
The key ideas are similar: form a predictive distribution for h's preferences and simulate purchase behavior in an analogous fashion.
In the case of a full purchase information history, we could use the raw Gibbs output, since the Markov chain will simulate β_h^(s), s = 1, ..., S.
This could then be used to simulate choice behavior as in the example above (given draws of ε_ht^(s))

81 9 Data
AC Nielsen scanner panel data for tuna in Springfield, Missouri.
400 households, 1.5 years, 1-61 purchases.
Brands and covariates in Table 2. Demographics in Table 3.
Table 4, delta coefficients. Poorer people prefer private label.
Goodness of fit is moderate for demographic coefficients

82
Figures 1 and 2, household level coefficient estimates with different information sets.
Table 5, returns to different marketing strategies.
Bottom line: you gain 0.5 cents to 1.0 cents per customer through better estimates.
With a lot of customers, this could be quite profitable.
