Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors


Lecturer: Michael I. Jordan
Scribes: Steven Troxler and Wayne Lee

In this lecture, we assume that $\theta \in \mathbb{R}$; in higher dimensions, reference priors are defined in a sequential manner based on the parameter of primary interest, and we will address this construction in later lectures.

1 Motivating Reference Priors

Consider an inference scenario in which we have data $X$ coming from a distribution $p(x \mid \theta)$ depending on a parameter $\theta$, and suppose that $T(X)$ is a sufficient statistic for $\theta$. This implies that $p(x \mid \theta)$ is in one-to-one correspondence with $p(t \mid \theta) := p(t(x) \mid \theta)$. Our goal is to develop a non-informative prior for $\theta$.

One possible way to choose a non-informative prior is via information: we select the prior $\pi(\theta)$ to maximize the mutual information between $T$ and $\theta$. That is, take

$$\pi(\theta) = \arg\max_{p(\theta)} I_{p(\theta)}(\theta, T),$$

where

$$I_{p(\theta)}(\theta, T) = \int p(t) \underbrace{\left[ \int p(\theta \mid t) \log \frac{p(\theta \mid t)}{p(\theta)} \, d\theta \right]}_{KL(p(\theta \mid t),\, p(\theta))} dt.$$

The inner term in the double integral,

$$KL(p(\theta \mid t), p(\theta)) := \int p(\theta \mid t) \log \frac{p(\theta \mid t)}{p(\theta)} \, d\theta,$$

is the Kullback-Leibler divergence between the posterior and the prior when we observe a particular value $T = t$. The mutual information, then, is an average of the Kullback-Leibler divergence with respect to the marginal distribution $p(t)$ of $T$.

This idea is clever, but does not quite work as posed. Unfortunately, the problem of maximizing the mutual information between $T$ and $\theta$ is often not analytically tractable. We might hope to solve the problem numerically, but this can be difficult. An alternative is to use asymptotics, which often results in more analytically tractable expressions.

To this end, we consider the following hypothetical situation: instead of observing $T(X)$ for just a single experiment, we repeat the experiment $k$ times (independently conditional on $\theta$, which remains the same throughout), obtaining a vector $T^k$ consisting of $k$ independent copies of $T$. Instead of maximizing the mutual information just between $T$ and $\theta$, we maximize the information between the vector $T^k$ and $\theta$, obtaining

$$\pi_k(\theta) = \arg\max_{p(\theta)} I_{p(\theta)}(\theta, T^k),$$

where

$$I_{p(\theta)}(\theta, T^k) = \int p(t^k) \left[ \int p(\theta \mid t^k) \log \frac{p(\theta \mid t^k)}{p(\theta)} \, d\theta \right] dt^k. \qquad (1)$$

We can obtain an analytically tractable uninformative prior by then taking $\pi(\theta) = \lim_{k \to \infty} \pi_k(\theta)$, where the limit is in a loose sense that allows for improper priors. Bernardo (Bernardo 2005) argues that taking $k$ to infinity not only gives us a convenient way to compute uninformative priors, but is also, in a philosophical sense, the right thing to do. His argument is loosely that, when choosing a prior, we want to consider not only the information we obtain from a particular experiment, but also the information we might obtain from many future experiments.
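The mutual information in this section can be evaluated exactly when both $\theta$ and $T^k$ are discrete. The sketch below is illustrative only and makes assumptions that are not in the lecture: a Bernoulli model (so that $T^k$ is Binomial($k$, $\theta$)), a 50-point grid for $\theta$, and a uniform prior on that grid. The function names are hypothetical. It computes $I(\theta, T^k)$ as the $p(t)$-weighted average of the Kullback-Leibler divergence between posterior and prior.

```python
import math

def binom_pmf(t, k, theta):
    """Binomial(k, theta) probability of t successes, computed via log-gamma."""
    log_c = math.lgamma(k + 1) - math.lgamma(t + 1) - math.lgamma(k - t + 1)
    return math.exp(log_c + t * math.log(theta) + (k - t) * math.log(1 - theta))

def mutual_information(prior, thetas, k):
    """I(theta, T^k) = sum_t p(t) * KL(p(theta | t), p(theta)) on a discrete grid."""
    mi = 0.0
    for t in range(k + 1):
        lik = [binom_pmf(t, k, th) for th in thetas]
        p_t = sum(l * p for l, p in zip(lik, prior))  # marginal p(t)
        if p_t == 0.0:
            continue
        kl = 0.0
        for l, p in zip(lik, prior):
            post = l * p / p_t  # posterior p(theta | t) by Bayes' rule
            if post > 0.0:
                kl += post * math.log(post / p)
        mi += p_t * kl
    return mi

# Assumed setup: 50 grid points strictly inside (0, 1), uniform prior weights.
thetas = [(i + 0.5) / 50 for i in range(50)]
uniform = [1.0 / 50] * 50

for k in (1, 10, 100):
    print(k, mutual_information(uniform, thetas, k))
```

Because $T^k$ summarizes $k$ conditionally independent observations, the printed values should not decrease as $k$ grows, and on a finite grid they are bounded above by the entropy of the prior. Maximizing this quantity over the prior weights is the numerical problem that the asymptotic argument is meant to avoid.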

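2 Computing Reference Priors and the Bernstein-von Mises Theorem

2.1 Solving the Mutual Information Problem

To find a more convenient form of $\pi_k$ so that we may apply asymptotic theory, we rewrite $I_{p(\theta)}(\theta, T^k)$ by swapping the order of integration and using $p(t^k)\, p(\theta \mid t^k) = p(\theta)\, p(t^k \mid \theta)$:

$$I_{p(\theta)}(\theta, T^k) = \int p(t^k) \left[ \int p(\theta \mid t^k) \log \frac{p(\theta \mid t^k)}{p(\theta)} \, d\theta \right] dt^k = \int p(\theta) \log \frac{f_k(\theta)}{p(\theta)} \, d\theta,$$

where

$$f_k(\theta) = \exp\left\{ \int p(t^k \mid \theta) \log p(\theta \mid t^k) \, dt^k \right\}.$$

Using a functional form of a Lagrangian to include the constraint that $\int p(\theta) \, d\theta = 1$, the problem becomes

$$\pi_k(\theta) = \arg\sup_{p(\theta)} \left\{ \int p(\theta) \log \frac{f_k(\theta)}{p(\theta)} \, d\theta + \lambda \left( \int p(\theta) \, d\theta - 1 \right) \right\}.$$

This can be solved via methods of calculus of variations, and the solution is $\pi_k(\theta) \propto f_k(\theta)$. Here $f_k$ is treated as fixed, even though the posterior $p(\theta \mid t^k)$ inside it depends on the prior; this is justified asymptotically, because the posterior becomes insensitive to the prior as $k$ grows (Section 2.2).

Although we will not go through the calculus-of-variations argument here, we will motivate this solution using the discrete case: if $T$ and $\theta$ are both discrete, then the problem is of the form

$$\pi = \arg\max_{p} \left\{ \sum_i p_i \log \frac{q_i}{p_i} + \lambda \left( \sum_i p_i - 1 \right) \right\}.$$

Taking partial derivatives with respect to $p_j$, we obtain

$$\frac{\partial}{\partial p_j} \left[ \sum_i p_i \log \frac{q_i}{p_i} + \lambda \left( \sum_i p_i - 1 \right) \right] = \log \frac{q_j}{p_j} + p_j \cdot \frac{1}{q_j / p_j} \cdot \left( -\frac{q_j}{p_j^2} \right) + \lambda = -\log p_j + \log q_j + \lambda - 1,$$

and setting this partial derivative to zero we obtain

$$\log p_i = \log q_i + \lambda - 1 \;\Rightarrow\; p_i = q_i e^{\lambda - 1} \;\Rightarrow\; \pi_i \propto q_i.$$

2.2 The Bernstein-von Mises Theorem and an Asymptotic Solution

We have now reduced the problem to computing

$$f_k(\theta) = \exp\left\{ \int p(t^k \mid \theta) \log p(\theta \mid t^k) \, dt^k \right\}. \qquad (2)$$

It is possible to obtain an analytical solution for the limit as $k \to \infty$ using the fact that $p(\theta \mid t^k)$ is asymptotically Gaussian, concentrated at the true value $\theta_0$ (i.e. the value of $\theta$ such that $T^k_j \overset{iid}{\sim} p(t \mid \theta_0)$). This fact, which ensures similar behavior of Bayesian posteriors and frequentist sampling distributions as the sample size tends to infinity, is a consequence of the Bernstein-von Mises Theorem, sometimes called the Bayesian Central Limit Theorem: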

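Theorem 1. Assume regularity conditions on the model which ensure asymptotic normality of an asymptotically efficient (in the frequentist sense) estimator $\hat\theta_k$, and also assume that the prior satisfies regularity assumptions, in particular that it is positive near $\theta_0$. If $T^k$ denotes a vector of iid components $T^k_j$ drawn from the distribution of $T \mid \theta_0$, then

$$\left\| p(\theta \mid t^k) - N\!\left(\hat\theta_k, I_k^{-1}(\theta_0)\right) \right\| \to 0, \qquad (3)$$

where $I_k(\theta_0)$ denotes the Fisher information at $\theta_0$ and the convergence denotes convergence in probability. Here, $\| \cdot \|$ denotes the total variation distance.

In general, any asymptotically efficient estimator $\hat\theta_k$ is also asymptotically sufficient, so we may replace $t^k$ in (3) with $\hat\theta_k$, obtaining

$$\left\| p(\theta \mid \hat\theta_k) - N\!\left(\hat\theta_k, I_k^{-1}(\theta_0)\right) \right\| \to 0. \qquad (4)$$

Also, using the density of a normal distribution, we know that if $y \sim N(\hat\theta_k, I_k^{-1}(\theta_0))$, then

$$p(y) = \sqrt{\frac{I_k(\theta_0)}{2\pi}} \exp\left\{ -\frac{I_k(\theta_0)}{2} (y - \hat\theta_k)^2 \right\}.$$

By independence, we know that $I_k(\theta_0) = k\, I(\theta_0)$ with $I(\theta_0) := I_1(\theta_0)$, so the preceding expression, combined with the limit in (4) and the fact that $\hat\theta_k$ is consistent, leads to the following approximate representation for large $k$:

$$p(\theta \mid \hat\theta_k) \approx \sqrt{\frac{k\, I(\hat\theta_k)}{2\pi}} \exp\left\{ -\frac{k\, I(\hat\theta_k)}{2} (\theta - \hat\theta_k)^2 \right\}.$$

The remainder of the argument will be somewhat loose; for rigorous arguments of our final result that a one-dimensional reference prior is a Jeffreys prior, see Bernardo's review paper (Bernardo 2005).

Suppose now that $\hat\theta_k$ results from $k$ independent draws of $X$, where $X$ has the distribution of $X \mid \theta_0$ for a particular $\theta_0$. Then, because $\hat\theta_k$ is consistent, $\hat\theta_k \overset{p}{\to} \theta_0$ and, under regularity conditions, also $I(\hat\theta_k) \to I(\theta_0)$. Hence,

$$p(\theta_0 \mid \hat\theta_k) \approx \sqrt{\frac{k\, I(\hat\theta_k)}{2\pi}} \exp\left\{ -\frac{k\, I(\hat\theta_k)}{2} (\theta_0 - \hat\theta_k)^2 \right\} \approx \sqrt{\frac{k\, I(\theta_0)}{2\pi}} \exp\left\{ -\frac{k\, I(\theta_0)}{2} (\theta_0 - \theta_0)^2 \right\} = \sqrt{\frac{k\, I(\theta_0)}{2\pi}}.$$

Returning now to equation (2), the expression for $f_k(\theta)$: the inner integral is an expectation with respect to $p(t^k \mid \theta)$, so the preceding theory applies so long as there exists some asymptotically normal efficient estimator $\hat\theta_k = \hat\theta(t^k)$, and we obtain

$$f_k(\theta) \approx \exp\left\{ \int p(t^k \mid \theta) \log \sqrt{\frac{k\, I(\theta)}{2\pi}} \, dt^k \right\}.$$

Since the term inside the integral does not depend on $t^k$, and it is being integrated against a density, as $k \to \infty$ we have

$$f_k(\theta) \propto \sqrt{I(\theta)}.$$

In other words, when there is an asymptotically normal, asymptotically efficient estimator $\hat\theta_k$, the Jeffreys prior is a reference prior in one dimension!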
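The proportionality $f_k(\theta) \propto \sqrt{I(\theta)}$ can be checked numerically in a simple case. The following is a minimal sketch under assumptions that are not part of the lecture: a Bernoulli model with $T^k \sim$ Binomial($k$, $\theta$), for which $I(\theta) = 1/(\theta(1-\theta))$, and a flat prior, so that the posterior is Beta($t+1$, $k-t+1$). The function names are hypothetical. It evaluates $\log f_k(\theta)$ exactly as a finite sum and compares a ratio of $f_k$ values with the corresponding ratio of Jeffreys densities.

```python
import math

def log_binom_pmf(t, k, theta):
    """log of the Binomial(k, theta) probability of t successes."""
    log_c = math.lgamma(k + 1) - math.lgamma(t + 1) - math.lgamma(k - t + 1)
    return log_c + t * math.log(theta) + (k - t) * math.log(1 - theta)

def log_flat_posterior(theta, t, k):
    """log density of Beta(t + 1, k - t + 1) at theta (posterior under a flat prior)."""
    log_norm = math.lgamma(k + 2) - math.lgamma(t + 1) - math.lgamma(k - t + 1)
    return log_norm + t * math.log(theta) + (k - t) * math.log(1 - theta)

def log_f_k(theta, k):
    """log f_k(theta) = sum_t p(t | theta) * log p(theta | t), as in the definition of f_k."""
    total = 0.0
    for t in range(k + 1):
        total += math.exp(log_binom_pmf(t, k, theta)) * log_flat_posterior(theta, t, k)
    return total

def jeffreys_unnormalized(theta):
    """sqrt of the Bernoulli Fisher information, I(theta) = 1 / (theta * (1 - theta))."""
    return 1.0 / math.sqrt(theta * (1.0 - theta))

k = 2000
ratio_f = math.exp(log_f_k(0.2, k) - log_f_k(0.5, k))
ratio_jeffreys = jeffreys_unnormalized(0.2) / jeffreys_unnormalized(0.5)
print(ratio_f, ratio_jeffreys)
```

Working with ratios sidesteps the normalizing constant, which does not depend on $\theta$ and therefore does not affect the shape of the limiting prior. For moderately large $k$ the two printed ratios should be close, in line with the loose argument given here.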

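3 Example: Reference Prior for Exponential Distribution

Let $X_i \sim \mathrm{Exp}(\theta)$ be iid. A sufficient statistic for $\theta$ is $\bar{x} := \frac{1}{n} \sum_i X_i$, and the maximum likelihood estimator is $\hat\theta_{ML} = 1/\bar{x}$. We could use the Jeffreys prior directly, but let us instead work through some of the work above for this specific case.

Set $X = (X_1, \ldots, X_n)$. Then $p(x \mid \theta) = \theta^n \exp(-n \bar{x} \theta)$. Since the Bernstein-von Mises theorem ensures that the posterior is the same as $n \to \infty$ regardless of the prior we use, we may just take the prior to be flat for convenience. Hence, asymptotically, we have

$$p(\theta \mid \hat\theta_{ML}) \propto \theta^n \exp(-n\theta / \hat\theta_{ML}),$$

which is a Gamma density in $\theta$ with shape $n + 1$ and rate $n / \hat\theta_{ML}$. Since $\hat\theta_{ML}$ is consistent, i.e. $\hat\theta_{ML} \to \theta_0$ when $X \sim \mathrm{Exp}(\theta_0)$, we obtain

$$\pi_n(\theta) \propto \left. \frac{(n / \hat\theta_{ML})^{n+1}}{\Gamma(n+1)} \, \theta^n \exp(-n\theta / \hat\theta_{ML}) \right|_{\hat\theta_{ML} = \theta} \propto \frac{1}{\theta},$$

where we evaluated at $\hat\theta_{ML} = \theta$ because, in the definition of $f_n$, we are integrating with respect to $p(x \mid \theta)$, so that the consistency applies.

4 Invariance to Transformations

We have already shown that reference priors are the same as Jeffreys priors under regularity conditions, so they are invariant to transformations in that setting. It also follows directly from the definition that they are invariant to transformation even in the absence of such regularity conditions. Specifically, the reference prior was defined in terms of mutual information, and mutual information is transformation-invariant. That is, for a smooth one-to-one reparametrization $\phi$ of $\theta$,

$$I(\theta, T^k) = \int p(t^k) \int p(\theta \mid t^k) \log \frac{p(\theta \mid t^k)}{p(\theta)} \, d\theta \, dt^k = \int p(t^k) \int p(\phi \mid t^k) \log \frac{p(\phi \mid t^k)}{p(\phi)} \, d\phi \, dt^k.$$

The equality holds because, when we do the changes of variables $p(\phi) = p(\theta(\phi)) \left| \frac{d\theta}{d\phi} \right|$ and $p(\phi \mid t^k) = p(\theta(\phi) \mid t^k) \left| \frac{d\theta}{d\phi} \right|$, the Jacobian terms inside the logarithm cancel, so that the logarithms in the integrals are equal. The term $\left| \frac{d\theta}{d\phi} \right|$ in $p(\phi \mid t^k) = p(\theta(\phi) \mid t^k) \left| \frac{d\theta}{d\phi} \right|$, on the other hand, is exactly what we obtain if we do a change of variables from $\theta$ to $\phi$ in the inner integral on the left hand side, obtaining

$$\int p(\theta \mid t^k) \log \frac{p(\theta \mid t^k)}{p(\theta)} \, d\theta = \int p(\theta(\phi) \mid t^k) \log \frac{p(\theta(\phi) \mid t^k)}{p(\theta(\phi))} \left| \frac{d\theta}{d\phi} \right| d\phi.$$

Hence, mutual information is transformation-invariant, and so are reference priors.

5 Example: Location and Scale Families

5.1 Location Families

For a given density $f$ (which we identify with the induced distribution), define a class of measures $m = \{ f(x - \mu) : x \in \mathbb{R}, \mu \in \mathbb{R} \}$.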

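For a particular $\mu$ and random variable $X \sim f(x - \mu)$, let $Y = X + \alpha$ and $\theta = \mu + \alpha$. If $\pi$ denotes the reference prior under the parametrization in our definition of $m$ above, then, writing $f'(y) = f(y - \alpha - \mu)$ for the density of $Y$, the similarly constructed family $m' = \{ f(y - \theta) : y \in \mathbb{R}, \theta \in \mathbb{R} \}$ is actually equal to $m$, reparametrized by the transformation $\theta = \mu + \alpha$. Since we know reference priors are transformation-invariant and the Jacobian of the transformation is equal to 1, $\pi'(\theta) = \pi(\mu)$. But since the two families are identical, we also have $\pi'(\theta) = \pi(\mu + \alpha)$, and hence $\pi(\mu + \alpha) = \pi(\mu)$ for every $\alpha$. In other words, reference priors for one-dimensional location families are flat.

5.2 Scale Families

Define the family $m = \left\{ \frac{1}{\sigma} f\!\left( \frac{x}{\sigma} \right) : x > 0, \sigma > 0 \right\}$. Taking $y = \log x$ and $\phi = \log \sigma$, define an equivalent reparametrized family $m' = \{ f(\exp(y - \phi)) \exp(y - \phi) : y \in \mathbb{R}, \phi \in \mathbb{R} \}$, where the factor $\exp(y - \phi)$ is the Jacobian of the change of variables from $x$ to $y$. In $m'$, $\phi$ is a location parameter, so $\pi'(\phi)$ is flat by the work in Section 5.1. But since $\pi'(\phi) = \sigma \pi(\sigma)$ by a change of variables and transformation-invariance, we therefore obtain $\pi(\sigma) \propto \frac{1}{\sigma}$.

References

Bernardo, J. (2005). Reference analysis. Handbook of Statistics, 25:17-60.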
