Bayesian Methods: Introduction to Multi-parameter Models

Parameter: θ = (θ₁, θ₂)

Given the likelihood p(y|θ) and the prior p(θ), the posterior p(θ|y) is proportional to p(y|θ) × p(θ).

Marginal posterior p(θ₁|y): interested only in θ₁?
o The loss function depends on θ₁ only. E.g., y ~ N(µ, σ²) with both parameters unknown. Then θ = (µ, σ²) and L(θ, δ) = (µ − δ(x))².
o One can show that the expected posterior loss involves the marginal posterior
  p(θ₁|y) = ∫ p(θ₁, θ₂|y) dθ₂ = ∫ p(θ₁|θ₂, y) p(θ₂|y) dθ₂.

Note that the marginal posterior of θ₁ is a mixture of the conditional posteriors given θ₂.
o When θ₂ takes discrete values (1, 2, …, M), possibly denoting different models, the posterior of θ₁ is a weighted average of the posteriors given each model. The weights depend on the combined evidence from the prior and the data, p(θ₂|y). This is the key idea underlying Bayesian Model Averaging (BMA). Given a strong belief in the prior specification, a Bayesian need not select a single model; BMA would minimize the expected risk.

We can always draw samples from the joint posterior p(θ₁, θ₂|y).
o If it is easier to sample from p(θ₂|y), draw samples from this distribution, and then, for each of these samples, draw from p(θ₁|θ₂, y).
o To obtain samples from p(θ₁|y), simply ignore the θ₂-coordinate of the joint samples (θ₁, θ₂).
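The composition-sampling idea above (draw θ₂ from its marginal posterior, then θ₁ from the conditional posterior given θ₂) can be sketched in a few lines. This is a minimal illustration: the model weights p(θ₂ = m|y) and the conditional posteriors p(θ₁|θ₂ = m, y) below are made-up numbers, not derived from any data set.

```python
import random

random.seed(0)

# Hypothetical two-model posterior, for illustration only:
weights = {1: 0.7, 2: 0.3}                # p(theta2 = m | y)
cond = {1: (0.0, 1.0), 2: (5.0, 2.0)}     # (mean, sd) of p(theta1 | theta2 = m, y)

def draw_joint():
    """Composition sampling: theta2 from its marginal posterior,
    then theta1 from the conditional posterior given theta2."""
    m = random.choices(list(weights), weights=list(weights.values()))[0]
    mean, sd = cond[m]
    return random.gauss(mean, sd), m

samples = [draw_joint() for _ in range(50_000)]
theta1 = [t1 for t1, _ in samples]        # ignore theta2: marginal draws of theta1

# BMA: the marginal posterior mean is the weight-averaged conditional mean
mix_mean = sum(w * cond[m][0] for m, w in weights.items())
print(sum(theta1) / len(theta1), "vs", mix_mean)
```

The empirical mean of the θ₁ draws matches the BMA-weighted average of the conditional means, illustrating that the marginal posterior is the weighted mixture described above.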
Analysis for Normal Data, N(µ, σ²), with a Noninformative Prior

Prior: p(µ|σ²) ∝ c; p(σ²) ∝ 1/σ²; i.e., (µ, log σ) is uniform.

Normal likelihood: Given (µ, σ²), n iid observations lead to the sufficient statistics (ȳ, s²), with s² = Σ(yᵢ − ȳ)²/(n−1), and the likelihood function
  (σ²)^(−n/2) exp{ −[(n−1)s² + n(ȳ − µ)²] / (2σ²) }.

The posterior of (µ, σ²), proportional to the likelihood times the prior, factorizes into two parts:
  exp{ −n(ȳ − µ)² / (2σ²) }  and  (σ²)^(−(n+2)/2) exp{ −(n−1)s² / (2σ²) }.

The first term represents the kernel of a Normal density with mean ȳ and variance σ²/n, except for the normalizing constant 1/√(2πσ²/n).
o When using the precision notation {τ = 1/σ²}, we say that given ȳ and τ, the conditional posterior of µ is Normal with mean ȳ and precision nτ. Note that, for a sample of size n, given (µ, σ²), ȳ itself has a Normal distribution with precision nτ.

For the marginal posterior of τ or σ²:
o We must integrate out µ from the first term (the conditional posterior of µ given τ) in the joint posterior expression above. This yields a term proportional to σ/√n; equivalently, the first term is proportional to (σ/√n) times a N(ȳ, σ²/n) density. The marginal posterior density of σ² is therefore proportional to
  (σ²)^(−(n+1)/2) exp{ −ν / (2σ²) }, where ν = (n−1)s².
Note that this corresponds to an inverse-gamma density, or a scaled inverse chi-squared. For its summary statistics (mean, median and mode), see Table A in the textbook (p. 574).
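The scaled inverse chi-squared marginal of σ² is easy to simulate: if X ~ χ²₍ₙ₋₁₎, then σ² = ν/X = (n−1)s²/X is a draw from the marginal posterior. A minimal sketch, with n and s² assumed for illustration:

```python
import random

random.seed(1)

# Assumed summary statistics (illustration only)
n, s2 = 20, 4.0
nu = (n - 1) * s2                       # nu = (n-1) s^2, as in the text

def draw_sigma2():
    """Scaled inverse chi-squared draw: sigma^2 = nu / X with X ~ chi^2_{n-1}.
    A chi^2_k variate is Gamma(shape=k/2, scale=2)."""
    x = random.gammavariate((n - 1) / 2, 2.0)
    return nu / x

draws = [draw_sigma2() for _ in range(100_000)]
emp_mean = sum(draws) / len(draws)

# Mean of a scaled inverse chi-squared with n-1 df and scale s^2 (needs n > 3)
theo_mean = nu / (n - 3)                # = (n-1) s^2 / (n-3)
print(emp_mean, "vs", theo_mean)
```

The empirical mean of the draws agrees with the analytic mean ν/(n−3) of the scaled inverse chi-squared.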
o However, in order to obtain the density of τ, we must account for a change of variables from σ² to τ. Since the absolute value of dσ²/dτ yields the term τ⁻², the marginal posterior density of τ is proportional to
  τ^((n−1)/2 − 1) exp{ −ντ/2 },
which corresponds to a scaled chi-squared with (n−1) degrees of freedom and scale parameter 1/ν, or a Gamma((n−1)/2, ν/2). Note that, a posteriori, ντ is a chi-squared random variable with (n−1) degrees of freedom, which is the same as the sampling distribution of (n−1)s²/σ². Given this noninformative prior on the parameters, the distribution of the pivotal quantity (n−1)s²/σ² remained unchanged.

The posterior distribution of (µ, σ²) belongs to the Normal-Inverted Gamma family, and that of (µ, τ) belongs to the Normal-Gamma family. We can easily draw samples from this joint posterior by first drawing samples of τ from a Gamma (scaled chi-squared), and then, given each τ, drawing a sample of µ from its conditional Normal distribution.

For the Marginal Posterior of µ:
o Since the conditional posterior distribution of µ given τ is Normal with mean ȳ and variance σ²/n, it follows that, given τ,
  Z = √(nτ) (µ − ȳ) ~ N(0, 1).
Since the distribution of Z does not depend on the conditioning variable τ, (Z, τ) are independent random variables. Thus ντ is a chi-squared random variable with (n−1) df, independent of Z. Hence,
  Z / √(ντ/(n−1)) = √n (µ − ȳ)/s ~ Student's t with (n−1) degrees of freedom.
Thus, the marginal posterior of µ is a t-distribution with (n−1) df, location ȳ and scale s/√n.
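The two-step sampling scheme just described (τ from its Gamma marginal, then µ|τ from a Normal) can be sketched as follows. The summary statistics n, ȳ and s² are assumed values for illustration; the spread of the µ-draws is checked against the t-distribution scale derived above.

```python
import math
import random
import statistics

random.seed(2)

# Assumed summary statistics (illustration only)
n, ybar, s2 = 30, 10.0, 4.0
nu = (n - 1) * s2

def draw_mu_tau():
    """tau from its marginal Gamma((n-1)/2, rate nu/2) posterior,
    then mu | tau from Normal(ybar, 1/(n*tau))."""
    tau = random.gammavariate((n - 1) / 2, 2.0 / nu)   # gammavariate takes a scale
    mu = random.gauss(ybar, 1.0 / math.sqrt(n * tau))
    return mu, tau

mus = [draw_mu_tau()[0] for _ in range(100_000)]

# Marginal posterior of mu: t_{n-1} with location ybar, scale s/sqrt(n);
# a t_k variable has sd sqrt(k/(k-2)), hence the extra factor below
t_sd = math.sqrt(s2 / n) * math.sqrt((n - 1) / (n - 3))
print(statistics.pstdev(mus), "vs", t_sd)
```

The empirical standard deviation of the µ-draws matches the standard deviation of the t₍ₙ₋₁₎(ȳ, s/√n) marginal, confirming the mixture representation.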
o Note that with this noninformative prior, the sampling distribution of the pivotal quantity t = √n (ȳ − µ)/s is also Student's t with (n−1) df.
o Note that the t-distribution represents a scale mixture of Normal random variables, where the squared scale has an inverted Gamma distribution.

Posterior-predictive density of future observation(s)
o In order to predict a future observable ỹ, whose density depends on (µ, σ²), we need to find the predictive density p(ỹ|y), where the uncertainty about the parameters (µ, σ²) is described by their posterior.
o Of course, given samples from the posterior of (µ, σ²) and the density of ỹ given (µ, σ²), one can draw samples from the joint density of (ỹ, µ, τ). Ignoring the second and third columns provides samples from the posterior-predictive density of ỹ.
o However, if the future observation is also from the Normal(µ, σ²) population, one can easily obtain an analytic expression for the posterior-predictive density. Given (µ, σ²), ỹ = µ + σZ, where Z is a standard Normal random variable. Furthermore, since µ|(y, σ²) is Normal with mean ȳ and variance σ²/n, it follows that ỹ|(y, σ²) is Normal with mean ȳ and variance (1 + 1/n)σ². Hence, given (y, τ),
  U = √τ (ỹ − ȳ) / √(1 + 1/n)
is a Normal(0, 1) random variable. Furthermore, since ντ is an independent chi-squared random variable with (n−1) degrees of freedom, it follows that
  (ỹ − ȳ) / (s √(1 + 1/n))
has a t-distribution with (n−1) df. In other words, the posterior-predictive density of ỹ is a t-distribution with location ȳ and scale s √(1 + 1/n).

Note that if we want to predict m future observations from this same population, knowing that (ȳₘ, s²ₘ) is the sufficient statistic, we can achieve this task by first predicting one observation ȳₘ from N(µ, σ²/m), as above, as well as one from
the predictive density of s²ₘ, which can be found similarly. Now, given (ȳₘ, s²ₘ), the conditional distribution of Y₁, …, Yₘ does not depend on the parameters. Thus we can now draw the Y's from this distribution.

The example on the speed of light is worth reading: in this case the outliers do not satisfy the normal model, and the posterior based on this data model does not look good. In fact, in this problem the signal-to-noise ratio is very small, so the model has to be really good.
o The values of the physical constants are reviewed every five years by the Committee on Data for Science and Technology (CODATA); see, e.g., http://physics.nist.gov/cuu/reference/contents.html, and an interesting article on the implications of a non-constant velocity of light at http://www.ldolphin.org/cdkconseq.html. CODATA evaluates the collection of observations made in the intervening five years for outliers etc., and then updates the values of the physical constants. Of course, the changes are in a few least significant digits.

Analysis of Normal data N(µ, σ²) with the conjugate Normal-Inverted Gamma prior

Given the likelihood of n iid observations from a Normal, the conjugate prior should also have two terms of the same form. This suggests the conjugate prior:
  µ|σ² ~ N(µ₀, σ²/κ₀).
[This prior is equivalent to the posterior from a state starting with a uniform prior and observing κ₀ draws with observed mean µ₀, when the variance is known.] In addition, τ has a scaled chi-squared distribution: ν₀σ₀²τ ~ χ² with ν₀ degrees of freedom, or τ ~ Gamma(ν₀/2, ν₀σ₀²/2).

Note that in the conjugate prior the two parameters are dependent; in effect, we are assigning independent distributions to µ/σ and 1/σ. The signal-to-noise ratio µ/σ is a very popular parameter in engineering applications.
o In effect, the prior is the same as a random-effects model for µ, which may not be suitable in some applications. [See the textbook on this issue.]
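The simulation route to the posterior-predictive density described earlier (draw (µ, σ) from the joint posterior, then future observations given that draw) can be sketched as follows. The summary statistics n, ȳ and s² are assumed values, and the same function handles m > 1 future observations; the spread of the draws is checked against the analytic t scale s√(1 + 1/n).

```python
import math
import random
import statistics

random.seed(3)

# Assumed summary statistics (illustration only)
n, ybar, s2 = 30, 10.0, 4.0
nu = (n - 1) * s2

def draw_future(m=1):
    """One posterior draw of (mu, sigma), then m future observations
    from N(mu, sigma^2) given that draw."""
    tau = random.gammavariate((n - 1) / 2, 2.0 / nu)
    sigma = 1.0 / math.sqrt(tau)
    mu = random.gauss(ybar, sigma / math.sqrt(n))
    return [random.gauss(mu, sigma) for _ in range(m)]

ytilde = [draw_future()[0] for _ in range(100_000)]

# Analytic check: ytilde | y is t_{n-1} with location ybar and scale
# s*sqrt(1 + 1/n); its sd carries the extra t factor sqrt((n-1)/(n-3))
pred_sd = math.sqrt(s2 * (1 + 1 / n)) * math.sqrt((n - 1) / (n - 3))
print(statistics.pstdev(ytilde), "vs", pred_sd)
```

Note how the predictive scale s√(1 + 1/n) exceeds the posterior scale of µ, s/√n: prediction carries both parameter uncertainty and sampling noise in the new observation.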
On multiplying the likelihood by the prior, it is easy to see that the posterior is also of Normal-Inverted Gamma form, with updated parameters
  µₙ = ωµ₀ + (1 − ω)ȳ, where ω = κ₀/(κ₀ + n),
  κₙ = κ₀ + n,  νₙ = ν₀ + n, and
  νₙσₙ² = ν₀σ₀² + (n − 1)s² + nω(ȳ − µ₀)².
Again, sampling from this distribution is self-explanatory.

Now, for the marginal posterior distribution of µ: following the discussion in the noninformative-prior case, it is easy to see that we get a t-distribution with location µₙ and scale √(σₙ²/κₙ). Similarly, the predictive density of a future observation can be obtained.

Analysis of Normal data with a semi-conjugate prior

In some applications, the prior on (µ, σ²) may be required to be independent. In this case, the joint posterior no longer factorizes, but one can still obtain the conditional and marginal posteriors.
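Returning to the conjugate Normal-Inverted Gamma case: the posterior update is plain hyperparameter arithmetic, sketched below. The function name and the example numbers are illustrative, not taken from the text.

```python
def posterior_update(mu0, kappa0, nu0, sigma0_sq, n, ybar, s2):
    """Normal-Inverted Gamma conjugate update for N(mu, sigma^2) data.
    Prior, in the text's notation:
      mu | sigma^2 ~ N(mu0, sigma^2 / kappa0),  nu0*sigma0^2*tau ~ chi^2_{nu0}.
    Returns the updated hyperparameters (mu_n, kappa_n, nu_n, sigma_n^2)."""
    omega = kappa0 / (kappa0 + n)                  # prior weight on mu0
    mu_n = omega * mu0 + (1 - omega) * ybar
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    ss = (nu0 * sigma0_sq + (n - 1) * s2
          + n * omega * (ybar - mu0) ** 2)         # nu_n * sigma_n^2
    return mu_n, kappa_n, nu_n, ss / nu_n

# Example: a weak prior (kappa0 = 1) is pulled almost entirely to the data mean
print(posterior_update(0.0, 1.0, 1.0, 1.0, 99, 10.0, 4.0))
```

With κ₀ = 1 and n = 99 the posterior mean lands at 0.99·ȳ, showing how the weight ω = κ₀/(κ₀ + n) vanishes as the data overwhelm the prior.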