1 Bayesian Computation.


ISyE8843A, Brani Vidakovic, Handout 9

If the selection of an adequate prior was the major conceptual and modeling challenge of Bayesian analysis, the major implementational challenge is computation. As soon as the model deviates from the conjugate structure, finding the posterior distribution (first the marginal) and the Bayes rule is anything but simple. A closed-form solution is more an exception than the rule, and even for such closed-form solutions, lucky mathematical coincidences, convenient mixtures, and other tricks are needed. Up to this point I believe you have gotten a sense of this calculational challenge.

If classical statistics relies on optimization, Bayesian statistics relies on integration. The marginal needed for the posterior is an integral,

\[ m(x) = \int_\Theta f(x \mid \theta)\, \pi(\theta)\, d\theta, \]

and the Bayes estimator of $h(\theta)$ with respect to the squared error loss is a ratio of integrals,

\[ \delta_\pi(x) = \int_\Theta h(\theta)\, \pi(\theta \mid x)\, d\theta = \frac{\int_\Theta h(\theta)\, f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_\Theta f(x \mid \theta)\, \pi(\theta)\, d\theta}. \]

The difficulties in calculating the above Bayes rule are that (i) the posterior may not be representable in a finite form, and (ii) the integral of $h(\theta)$ may not have a closed form even when the posterior distribution does.

Adopting a different loss function usually makes the calculation even more difficult. An exception is the 0-1 loss, for which the Bayes rule is the mode of the posterior (under absolute loss the Bayes rule is the posterior median), and the mode is not influenced by the troublesome normalizing constant $m(x)$.

The last two decades of research in Bayesian statistics contributed to a tremendous broadening of the scope of Bayesian models. Models that could not be handled before are now routinely solved. This is done by Markov Chain Monte Carlo (MCMC) methods, and their introduction to the field of statistics revolutionized Bayesian statistics.

This handout overviews pre-MCMC techniques: Monte Carlo Integration, Importance Sampling, and Analytic Approximations (Riemann, Laplace, and Saddlepoint).

1.1 Bayesian CLT

Suppose that $X_1, \dots, X_n \sim f(x \mid \theta)$, where $\theta$ is a $p$-dimensional parameter, and that the prior on $\theta$ is $\pi(\theta)$. The prior $\pi(\theta)$ could be improper, but we assume that the posterior is proper and that its mode exists. Then, when $n \to \infty$,

\[ [\theta \mid x] \approx \mathcal{MVN}_p\left(\theta^M, H^{-1}(\theta^M)\right), \]

where $\theta^M$ is the posterior mode, i.e., a solution of

\[ \frac{\partial \log \pi^*(\theta \mid x)}{\partial \theta_i} = 0, \quad i = 1, \dots, p, \]

and $\pi^*(\theta \mid x) = f(x \mid \theta)\, \pi(\theta)$ is the non-normalized posterior. Let $H$ be the Hessian defined as

\[ H(\theta) = \left( -\frac{\partial^2 \log \pi^*(\theta \mid x)}{\partial \theta_i\, \partial \theta_j} \right). \]

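The asymptotic covariance matrix is $H^{-1}(\theta^M) = (H(\theta))^{-1} \big|_{\theta = \theta^M}$. The proof can be found in standard texts on asymptotic theory.

Example: Bernoulli. Assume that $X_1, \dots, X_n \sim \mathrm{Ber}(\theta)$ and that the prior on $\theta$ is $\pi(\theta) = 1$. Show that $\theta^M = \sum_i X_i / n$ and

\[ H(\theta) = \frac{\sum_i X_i}{\theta^2} + \frac{n - \sum_i X_i}{(1 - \theta)^2}. \]

This gives $H^{-1}(\theta^M) = \theta^M (1 - \theta^M)/n$. The Bayesian CLT gives the expected result,

\[ [\theta \mid X_1, \dots, X_n] \approx \mathcal{N}\left(\theta^M, \frac{\theta^M (1 - \theta^M)}{n}\right). \]

Example: Poisson/Gamma. Let $X_1, \dots, X_n \sim \mathrm{Poi}(\theta)$ and $\pi(\theta) \propto \theta^{\alpha - 1} \exp\{-\beta \theta\}$. Then,

\[ [\theta \mid x] \approx \mathcal{N}\left( \frac{\alpha + \sum_i X_i - 1}{n + \beta},\ \frac{\alpha + \sum_i X_i - 1}{(n + \beta)^2} \right). \]

This follows from the fact that the mode is $\theta^M = (\alpha + \sum_i X_i - 1)/(n + \beta)$ and that $H(\theta) = (\alpha + \sum_i X_i - 1)/\theta^2$.

Posterior approximations based on the Bayesian Central Limit Theorem are called first order approximations or modal approximations. Since the posterior is approximated by a normal distribution, this approximation may be poor if the true posterior is skewed or if the sample size is small.

1.2 Laplace Approximation

Suppose we are interested in finding $\int_A f(x \mid \theta)\, dx$ for a particular value of $\theta$. Let $f(x \mid \theta)$ be represented as $\exp\{-n\, h(x \mid \theta)\}$, and let $\hat{x}_\theta$ be the value of $x$ that minimizes $h(x \mid \theta)$ (or, equivalently, maximizes $f(x \mid \theta)$). If

\[ h''(\hat{x}_\theta \mid \theta) = \frac{\partial^2 h(x \mid \theta)}{\partial x^2} \bigg|_{x = \hat{x}_\theta}, \]

then

\[ \int_a^b e^{-n h(x \mid \theta)}\, dx \approx e^{-n h(\hat{x}_\theta \mid \theta)} \sqrt{\frac{2\pi}{n\, h''(\hat{x}_\theta \mid \theta)}} \left[ \Phi\left( \sqrt{n\, h''(\hat{x}_\theta \mid \theta)}\, (b - \hat{x}_\theta) \right) - \Phi\left( \sqrt{n\, h''(\hat{x}_\theta \mid \theta)}\, (a - \hat{x}_\theta) \right) \right]. \]

Using the Laplace method, approximate the Gamma integral

\[ \int_a^b \frac{x^{\alpha - 1}}{\Gamma(\alpha)\, \beta^{\alpha}}\, e^{-x/\beta}\, dx, \]

for $\alpha = 3$, $\beta = 3$ and $(a, b) = (3, 5)$, $(7, 1[...])$, and $(5, [...])$. Compare with exact values and discuss.

Consider the posterior expectation of interest,

\[ E^{\theta \mid x}(g(\theta)) = \frac{\int_\Theta g(\theta)\, f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_\Theta f(x \mid \theta)\, \pi(\theta)\, d\theta} = \frac{\int_\Theta b_N(\theta) \exp\{-n\, h_N(\theta)\}\, d\theta}{\int_\Theta b_D(\theta) \exp\{-n\, h_D(\theta)\}\, d\theta}. \]

If $h_N(\theta) = h_D(\theta)$, the representation is said to be in standard form. If, on the other hand, $b_N(\theta) = b_D(\theta)$, the representation is said to be in fully exponential form. Below, $\hat\theta_D$ and $\hat\theta_N$ minimize $h_D$ and $h_N$, a hat denotes evaluation at the corresponding minimizer, and $\sigma_D^2 = 1/h_D''(\hat\theta_D)$, $\sigma_N^2 = 1/h_N''(\hat\theta_N)$.

If $E^{\theta \mid x}(g(\theta))$ can be written in the standard form,

\[ E^{\theta \mid x}(g(\theta)) = \hat{g} + \frac{\sigma_D^2\, \hat{b}_D'\, \hat{g}'}{n\, \hat{b}_D} + \frac{\sigma_D^2\, \hat{g}''}{2n} - \frac{\sigma_D^4\, \hat{h}_D'''\, \hat{g}'}{2n} + O(n^{-2}). \]

For the fully exponential form, if $g$ is positive and $g(\hat\theta_D)$ is uniformly bounded away from zero,

\[ E^{\theta \mid x}(g(\theta)) = \frac{\hat{b}_N\, \sigma_N}{\hat{b}_D\, \sigma_D} \exp\{-n(\hat{h}_N - \hat{h}_D)\} + O(n^{-2}). \]

To illustrate the above formulas, let's find an approximation to the expectation of the Beta $\mathcal{B}(\alpha, \beta)$ distribution, whose exact value is $\alpha/(\alpha + \beta)$.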
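A minimal sketch of the fully exponential form applied to the Beta mean, in Python. The setup is our assumption, not taken from the handout: $b_N = b_D = 1$, $n = 1$, $-h_D(\theta) = (\alpha - 1)\log\theta + (\beta - 1)\log(1 - \theta)$ and $-h_N(\theta) = \alpha\log\theta + (\beta - 1)\log(1 - \theta)$, which requires $\alpha > 1$ and $\beta > 1$. The function name `laplace_beta_mean` is hypothetical.

```python
import math


def laplace_beta_mean(alpha, beta):
    """Fully exponential Laplace approximation to the mean of Beta(alpha, beta).

    Illustrative sketch: assumes alpha > 1 and beta > 1 so that both
    exponents have an interior maximum on (0, 1).
    """
    if alpha <= 1 or beta <= 1:
        raise ValueError("this sketch needs alpha > 1 and beta > 1")

    def mode_value_and_sigma2(a, b):
        # For L(t) = a*log(t) + b*log(1 - t) the maximizer is t = a/(a + b),
        # and -L''(t) at the maximizer equals (a + b)^3 / (a*b),
        # so sigma^2 = a*b / (a + b)^3.
        t = a / (a + b)
        value = a * math.log(t) + b * math.log(1.0 - t)
        sigma2 = a * b / (a + b) ** 3
        return value, sigma2

    log_den, s2_den = mode_value_and_sigma2(alpha - 1.0, beta - 1.0)
    log_num, s2_num = mode_value_and_sigma2(alpha, beta - 1.0)
    # (sigma_N / sigma_D) * exp{-(h_N - h_D)} evaluated at the two modes.
    return math.sqrt(s2_num / s2_den) * math.exp(log_num - log_den)


if __name__ == "__main__":
    for a, b in [(3.0, 4.0), (10.0, 15.0), (50.0, 50.0)]:
        print(a, b, laplace_beta_mean(a, b), a / (a + b))
```

The printed pairs compare the approximation with the exact mean $\alpha/(\alpha + \beta)$; the agreement should improve as $\alpha + \beta$ grows, since $\alpha + \beta$ plays the role of the sample size here.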

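1.3 Classical Monte Carlo Integration

Suppose $F(x)$ is a probability distribution and $h(x)$ is a measurable function for which $E|h(X)| < \infty$ when $X \sim F$. Let $X_1, \dots, X_n$ be a sample from $F$. Then

\[ \int h(x)\, dF(x) = E\, h(X) \approx \frac{1}{n} \sum_{i=1}^n h(X_i). \]

This is the same as if we wrote

\[ \int h(x)\, dF(x) \approx \int h(x)\, dF_n(x \mid X_1, \dots, X_n), \quad \text{where} \quad F_n(x \mid X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \le x) \]

is the empirical distribution function.

For simplicity of notation, assume that $F$ is continuous with density $f$, although all results remain valid for general probability distributions. Since, by assumption, $E\, h(X) = \int h(x) f(x)\, dx$ is finite for $X \sim f$, from the strong law of large numbers (SLLN) it follows that

\[ \hat{I}_n = \frac{1}{n} \sum_{i=1}^n h(X_i) \xrightarrow{\ a.s.\ } \int h(x) f(x)\, dx = I. \]

The symbol a.s. stands for almost sure convergence, meaning that the convergence fails only on a set of probability 0. The speed of this almost sure convergence is measured by the speed of decay of the variance of the Monte Carlo approximations. Theoretically, $\mathrm{Var}(\hat{I}_n) = \mathrm{Var}(h(X))/n$, which equals $I(1 - I)/n$ when $h$ is an indicator function, so that $I$ is a probability.

Example: Normal Likelihood/Cauchy Prior. The following model is given:

\[ X \mid \theta \sim \mathcal{N}(\theta, 1), \qquad \theta \sim \mathcal{C}a(0, 1). \]

The Bayes rule is

\[ \delta_\pi(x) = \frac{\int_{\mathbb{R}} \frac{\theta}{1 + \theta^2} \exp\{-\frac{1}{2}(x - \theta)^2\}\, d\theta}{\int_{\mathbb{R}} \frac{1}{1 + \theta^2} \exp\{-\frac{1}{2}(x - \theta)^2\}\, d\theta}, \]

which is approximately equal to

\[ \delta_\pi(x) \approx \frac{\sum_{i=1}^n \frac{\eta_i}{1 + \eta_i^2}}{\sum_{i=1}^n \frac{1}{1 + \eta_i^2}}, \quad \text{where} \quad \eta_i \sim \mathcal{N}(x, 1). \]

Assume that $X = 2$. The high precision numerical algorithms (up to 20 decimal places) in MATHEMATICA give (a) $\delta(2) = [...]$, and (b) $P^{\theta \mid X}(\theta\ [...]\ 1) = [...]$. Consider those exact values and apply the simulation to check the performance of Monte Carlo.

Uniform on p-Sphere. This is an example of Christian Robert that is a multivariate generalization of the representation of symmetric distributions as a (scale) mixture of uniforms.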

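Assume

\[ X \mid \theta \sim \mathcal{MVN}_p(\theta, I), \qquad \theta \mid c \sim \mathcal{U}(\|\theta\|^2 = c), \qquad c \sim \mathcal{G}a(\alpha, \beta). \]

Assume that for $\alpha = 2$, $\beta = 3$ and $x = (0, 0, 0, 0, 0, 1)$ we want to find the Bayes rule, $\delta(x)$. Robert (1992, 2001) shows that the Bayes rule can be expressed in almost closed form (up to confluent hypergeometric functions),

\[ \delta(x) = \frac{2\alpha}{p} \cdot \frac{x}{2\beta + 1} \cdot \frac{{}_1F_1\left(\alpha + 1;\ (p + 2)/2;\ \|x\|^2/(2 + 4\beta)\right)}{{}_1F_1\left(\alpha;\ p/2;\ \|x\|^2/(2 + 4\beta)\right)}, \]

where ${}_1F_1(a; b; z) = \sum_{k=0}^{\infty} \frac{(a)_k}{(b)_k} \frac{z^k}{k!}$, with $(a)_0 = 1$ and $(a)_k = a(a + 1) \cdots (a + k - 1)$. Thus, $\delta(x) \approx (0, 0, 0, 0, 0, 0.0958)$.

To approximate this rule by the MC method, one may do prior simulation.

1. Generate $c$ from $\mathcal{G}a(2, 3)$ [BayesLab function rand_gamma].

2. Simulate $M$ uniform variates $\theta$ on the 6-dimensional sphere. The polar representation of an element $\theta$ of the sphere is:

\[ \begin{aligned} \theta_1 &= \sqrt{c}\, \cos\varphi_1 \\ \theta_2 &= \sqrt{c}\, \sin\varphi_1 \cos\varphi_2 \\ \theta_3 &= \sqrt{c}\, \sin\varphi_1 \sin\varphi_2 \cos\varphi_3 \\ &\ \ \vdots \\ \theta_{p-1} &= \sqrt{c}\, \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{p-2} \cos\varphi_{p-1} \\ \theta_p &= \sqrt{c}\, \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{p-2} \sin\varphi_{p-1} \end{aligned} \]

where $0 \le \varphi_1, \dots, \varphi_{p-2} \le \pi$ and $0 \le \varphi_{p-1} \le 2\pi$. If the $\varphi$-angles are selected uniformly from their respective domains, $\theta$ lies on the sphere of radius $\sqrt{c}$. The draw is exactly uniform on the sphere only for $p = 2$; for $p > 2$ uniformity requires $\varphi_k$ to have density proportional to $\sin^{p-1-k}\varphi_k$, $k = 1, \dots, p - 2$, so uniform angles give only a rough substitute.

    angles = pi*rand(1,p-1);           % phi_1,...,phi_{p-1} on [0, pi]
    angles(p-1) = 2*angles(p-1);       % last angle lives on [0, 2*pi]
    angles(p) = 0;                     % dummy angle so that cos(angles(p)) = 1
    theta(1) = sqrt(c) * cos(angles(1));
    for i = 2:p
        theta(i) = theta(i-1)*sin(angles(i-1))/cos(angles(i-1))*cos(angles(i));
    end

3. Approximate $\delta$ by

\[ \delta(x) \approx \frac{\sum_{i=1}^M \theta_i \exp\{-\|x - \theta_i\|^2/2\}}{\sum_{i=1}^M \exp\{-\|x - \theta_i\|^2/2\}}. \]

Ripley's Example. Consider

\[ p = \int_2^\infty \frac{1}{\pi(1 + x^2)}\, dx, \]

the tail of the standard Cauchy distribution. Of course, a Cauchy random variable has an explicit cumulative distribution function, $F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x)$, and the above tail probability is

\[ p = 1 - \frac{1}{2} - \frac{1}{\pi} \arctan(2) = \frac{1}{\pi} \arctan(1/2) \approx 0.1476. \]

Discuss!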
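One way to start the discussion of Ripley's example is to compare several Monte Carlo estimators of the same tail probability. The Python sketch below is an illustration under our own choices, not code from the handout: the function name `cauchy_tail_estimates`, the sample size, and the seed are hypothetical. It uses (1) the naive proportion of Cauchy draws above 2, (2) the symmetry $p = \frac{1}{2} P(|X| > 2)$, (3) the identity $p = \frac{1}{2} - \int_0^2 f(x)\, dx$ with uniform draws on $(0, 2)$, and (4) the substitution $y = 1/x$, which gives $p = \int_0^{1/2} \frac{dy}{\pi(1 + y^2)}$ with uniform draws on $(0, 1/2)$.

```python
import math
import random


def cauchy_tail_estimates(n, seed=0):
    """Four Monte Carlo estimates of p = P(X > 2) for a standard Cauchy X."""
    rng = random.Random(seed)
    exact = math.atan(0.5) / math.pi

    # Standard Cauchy draws by inverting F(x) = 1/2 + arctan(x)/pi.
    draws = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

    # (1) Naive estimator: proportion of draws above 2.
    naive = sum(x > 2 for x in draws) / n

    # (2) Symmetry: p = P(|X| > 2) / 2.
    symmetric = 0.5 * sum(abs(x) > 2 for x in draws) / n

    # (3) p = 1/2 - int_0^2 f(x) dx, with U ~ U(0, 2): the integral is E[2 f(U)].
    total = 0.0
    for _ in range(n):
        u = rng.uniform(0.0, 2.0)
        total += 2.0 / (math.pi * (1.0 + u * u))
    uniform_0_2 = 0.5 - total / n

    # (4) y = 1/x turns p into int_0^{1/2} dy / (pi (1 + y^2)) = E[f(U) / 2],
    #     with U ~ U(0, 1/2).
    total = 0.0
    for _ in range(n):
        u = rng.uniform(0.0, 0.5)
        total += 0.5 / (math.pi * (1.0 + u * u))
    uniform_0_half = total / n

    return {
        "exact": exact,
        "naive": naive,
        "symmetric": symmetric,
        "uniform_0_2": uniform_0_2,
        "uniform_0_half": uniform_0_half,
    }


if __name__ == "__main__":
    for name, value in cauchy_tail_estimates(20000).items():
        print(name, round(value, 5))
```

All four estimators target the same quantity; what differs is their variance, since the integrands in (3) and (4) vary much less over their ranges than the indicator used in (1).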

1.4 Importance Sampling

Importance sampling, or weighted sampling, is a Monte Carlo technique in which the integral of interest is transformed in a convenient way to enhance the simulation.

Suppose $f$ is a density and $\int h(x) f(x)\, dx$ is of interest. Assume that sampling from $f$ is either difficult or impossible, so that direct application of the Monte Carlo method is hard. The idea of importance sampling is to multiply and divide the expression $h(x) f(x)$ by a convenient density $g(x)$,

\[ h(x) f(x) = \frac{h(x) f(x)}{g(x)}\, g(x). \]

Then,

\[ \int h(x) f(x)\, dx = \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx. \]

Provided the density $g$ is easy to sample from, the Monte Carlo approximation to the integral is

\[ \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx \approx \frac{1}{n} \sum_{i=1}^n \frac{h(X_i) f(X_i)}{g(X_i)}, \qquad X_i \overset{iid}{\sim} g. \tag{1} \]

The density $g$ is called the importance density and its choice depends on $f$. There are several guidelines on how to select the importance density $g$. An obvious requirement is that the support of $f$ has to be a subset of the support of $g$, $\mathrm{supp}(f) \subset \mathrm{supp}(g)$, since otherwise we may have an undefined integrand. Several authors investigated the form of the importance density that minimizes the variance of the simulations. Theoretical results are available, but the optimal density requires knowledge of $\int |h(x)| f(x)\, dx$, the integral we are approximating, and has no practical value. However, from the form of the optimal density $g^*$, one concludes that the importance scheme works well for importance densities for which $|h| f / g$ is almost constant and has finite variance.

An attractive feature of importance sampling is that a single random sample from $g$ can be used for different $f$ and $h$.

If the ratio $f/g$ in (1) is known only up to a constant, which may often be the case in Bayesian calculations, the following approximation is used,

\[ \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx \approx \frac{\sum_{i=1}^n h(X_i) f(X_i)/g(X_i)}{\sum_{i=1}^n f(X_i)/g(X_i)}, \qquad X_i \overset{iid}{\sim} g. \]

References

[1] Robert, C. (2001). Bayesian Choice, Second Edition. Springer Verlag.

Appendix: Simulation of Random Numbers

The BayesLab MATLAB suite contains random number generators for major discrete and continuous probability laws. Here we give a couple of general methods that may be of help if the distribution is not on the list.

Theorem 1.1 (Inverse Transformation Method) Let $U$ be the uniform $(0, 1)$ random variable and $F$ a cdf for which $F^{-1}$ exists. Then $F^{-1}(U)$ is a draw from the distribution $F$.
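The theorem can be sketched in a few lines of Python. As an illustration we use the standard Cauchy law, whose cdf $F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x)$ inverts to $F^{-1}(u) = \tan(\pi(u - \frac{1}{2}))$. The function names `inverse_transform_sample` and `cauchy_inv_cdf` are hypothetical and not part of BayesLab.

```python
import math
import random


def inverse_transform_sample(inv_cdf, n, seed=0):
    """Draw n variates from F by applying F^{-1} to uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]


def cauchy_inv_cdf(u):
    """Inverse of the standard Cauchy cdf F(x) = 1/2 + arctan(x)/pi."""
    return math.tan(math.pi * (u - 0.5))


if __name__ == "__main__":
    sample = sorted(inverse_transform_sample(cauchy_inv_cdf, 20000))
    median = sample[len(sample) // 2]
    share_below_one = sum(x <= 1.0 for x in sample) / len(sample)
    # The median should be near 0 and the share below 1 near F(1) = 0.75.
    print(median, share_below_one)
```

Passing the inverse cdf as an argument keeps the sampler generic: any law with a computable $F^{-1}$ can reuse `inverse_transform_sample` unchanged.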

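Exponential Random Variates. Since for the exponential $\mathcal{E}(\lambda)$ distribution $y = F(x) = 1 - e^{-\lambda x}$, $\lambda > 0$, $x \ge 0$, the inverse function is $x = -\frac{1}{\lambda} \log(1 - y)$. Thus

\[ X = -\frac{1}{\lambda} \log(1 - U) \]

has the $\mathcal{E}(\lambda)$ distribution. In fact, since $U \overset{d}{=} 1 - U$, one can use $X = -\frac{1}{\lambda} \log U$.

Accept-Reject Method (ARM). This method was originally proposed by von Neumann. Given the density of interest (target density) $f$, find a proposal density (envelope density, instrumental density) $g$ such that

\[ (\forall x \in \mathrm{supp}(f)) \quad f(x) \le M g(x). \]

The algorithm is:

Step 1. Generate a candidate $X \sim g$. Generate $U \sim \mathcal{U}(0, 1)$.
Step 2. Accept $Y = X$ if $U \le \frac{f(X)}{M g(X)}$.
Step 3. Otherwise, return to Step 1.

Indeed, this generates $Y \sim f$. The distribution of $Y$ is given by

\[ P(Y \le y) = P\left( X \le y \,\Big|\, U \le \frac{f(X)}{M g(X)} \right) = \frac{P\left( X \le y,\ U \le \frac{f(X)}{M g(X)} \right)}{P\left( U \le \frac{f(X)}{M g(X)} \right)}. \]

This ratio is

\[ P(Y \le y) = \frac{\int_{-\infty}^{y} \int_0^{f(x)/(M g(x))} du\, g(x)\, dx}{\int_{-\infty}^{\infty} \int_0^{f(x)/(M g(x))} du\, g(x)\, dx} = \frac{\frac{1}{M} \int_{-\infty}^{y} f(x)\, dx}{\frac{1}{M} \int_{-\infty}^{\infty} f(x)\, dx} = \int_{-\infty}^{y} f(x)\, dx. \]

Since each proposal is accepted ("success") with probability $P\left( U \le \frac{f(X)}{M g(X)} \right) = 1/M$, the number of trials necessary to produce a draw from $f$ is geometric, $\mathcal{G}e(1/M)$. A tight bound $M$ increases efficiency; the best proposal densities are those for which $M$ is close to 1.

Exercises:

1. Using ARM, generate from $\mathcal{B}e(2, 4)$ using uniform $\mathcal{U}(0, 1)$ proposals. Show first that $20x(1 - x)^3$ is maximized at $x = 1/4$ and that $M = 135/64$.

2. Generating random normals is of great interest and several well established methods exist. Here is an ARM version. Take proposals from $\mathcal{DE}(1)$, $g(x) = \frac{1}{2} e^{-|x|}$. They are simply exponentials $\mathcal{E}(1)$ multiplied by a random sign, i.e., by $S = 2B - 1$, where $B \sim \mathrm{Ber}(1/2)$. Estimate $M$ ($M = \sqrt{2e/\pi} \approx 1.3155$).

3. The density $f(x) = \cos^2\left( \frac{\pi x}{2} \right)$, $-1 \le x \le 1$, called the Bickel-Levit prior, is of interest in some areas of decision theory (approximation to the least favorable prior in estimating a bounded normal mean). Propose an ARM method for simulating from the Bickel-Levit prior.

4. Let $f(x \mid \theta)$ be $\mathcal{DE}(\theta, 1)$, and let the prior distribution for $\theta$ be a symmetric two-point distribution (concentrated at $-\mu$ and $\mu$, $\mu > 0$).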

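(a) Find the marginal distribution, $m(x)$.
(b) Propose a sampling scheme to draw from $m(x)$.

5. Suppose that $Y_i \sim \mathrm{Ber}(\theta_i)$, $i = 1, \dots, n$. Suppose that a vector of covariates $X_i = (X_{i1}, \dots, X_{ip})$ corresponds to each $Y_i$, and that

\[ \theta_i = \frac{\exp\{X_i' \beta\}}{1 + \exp\{X_i' \beta\}}, \]

where $\beta$ is the vector of regression coefficients.

(a) Show that the likelihood is

\[ f(y \mid \beta) = \exp\left\{ \sum_{i=1}^n \left[ Y_i X_i' \beta - \log\left(1 + e^{X_i' \beta}\right) \right] \right\}. \]

(b) Assume $\pi(\beta) = 1$. Show that the posterior mode is the MLE for $\beta$.

(c) Show that

\[ -\frac{\partial^2 \log \pi^*(\beta \mid y)}{\partial \beta_i\, \partial \beta_j} = \sum_{k=1}^n X_{ki}\, \frac{\exp\{X_k' \beta\}}{(1 + \exp\{X_k' \beta\})^2}\, X_{kj}, \quad 1 \le i, j \le p, \]

and hence

\[ [\beta \mid Y] \approx \mathcal{MVN}_p\left( \hat\beta_{mle},\ (X' V X)^{-1} \right), \]

where

\[ V = \mathrm{diag}\left( \frac{\exp\{X_1' \hat\beta_{mle}\}}{(1 + \exp\{X_1' \hat\beta_{mle}\})^2}, \dots, \frac{\exp\{X_n' \hat\beta_{mle}\}}{(1 + \exp\{X_n' \hat\beta_{mle}\})^2} \right). \]

6. Show that the correlated random variables $F^{-1}(U)$ and $G^{-1}(U)$ have maximum positive correlation and have the distributions $F$ and $G$; for maximal negative correlation it is enough to take $G^{-1}(1 - U)$.

Uniform on Sphere. This MATHEMATICA code gives the exact value of the Bayes rule for $x = (0, 0, 0, 0, 0, 1)$. The code

    alpha = 2; beta = 3; p = 6;
    x = Table[0, {p}]; x[[6]] = 1;
    delta = 2 alpha/p 1/(2 beta + 1) *
        Hypergeometric1F1[alpha + 1, (p + 2)/2, Norm[x]^2/(2 + 4 beta)] /
        Hypergeometric1F1[alpha, p/2, Norm[x]^2/(2 + 4 beta)] x // N

results in $\delta \approx \{0, 0, 0, 0, 0, 0.0958\}$.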
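For readers without MATHEMATICA, the same expression can be cross-checked by summing the series that defines ${}_1F_1$. The Python sketch below assumes the closed form $\delta(x) = \frac{2\alpha}{p} \frac{x}{2\beta + 1} \frac{{}_1F_1(\alpha + 1; (p + 2)/2; \|x\|^2/(2 + 4\beta))}{{}_1F_1(\alpha; p/2; \|x\|^2/(2 + 4\beta))}$ with $\alpha = 2$, $\beta = 3$; the function names `hyp1f1` and `bayes_rule` are hypothetical, and the plain series summation is only meant for small arguments such as the one used here.

```python
def hyp1f1(a, b, z, tol=1e-15, max_terms=500):
    """Confluent hypergeometric 1F1(a; b; z) by direct summation of its series.

    Sketch only: adequate for moderate |z|, with no safeguards for large |z|.
    """
    term, total = 1.0, 1.0  # k = 0 term of sum_k (a)_k / (b)_k * z^k / k!
    for k in range(max_terms):
        # Ratio of consecutive terms: (a + k) / (b + k) * z / (k + 1).
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def bayes_rule(x, alpha, beta):
    """Closed-form Bayes rule for the uniform-on-sphere / gamma mixing prior."""
    p = len(x)
    norm2 = sum(xi * xi for xi in x)
    z = norm2 / (2.0 + 4.0 * beta)
    factor = (
        (2.0 * alpha / p)
        / (2.0 * beta + 1.0)
        * hyp1f1(alpha + 1.0, (p + 2) / 2.0, z)
        / hyp1f1(alpha, p / 2.0, z)
    )
    return [factor * xi for xi in x]


delta = bayes_rule([0, 0, 0, 0, 0, 1], alpha=2.0, beta=3.0)
print(delta)
```

Only the last coordinate is nonzero because the rule is a scalar multiple of $x$; the scalar is the shrinkage factor applied to the observation.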
