Outline: Prior Information and Subjective Probability

- Subjective Probability
- Subjective Determination of the Prior Density
- Noninformative Priors
- Maximum Entropy Priors
- Using the Marginal Distribution to Determine the Prior
- Hierarchical Priors
- Criticisms

Subjective Probability

Prior information:
- Classical concept of probability: the frequency viewpoint.
- Subjective probability: deals with randomness to which the frequency viewpoint does not apply.
- Examples: coin tossing vs. the unemployment rate for next year.

Subjective Determination of the Prior Density

Four common techniques:
- The Histogram Approach
- The Relative Likelihood Approach
- Matching a Given Functional Form
- CDF Determination

The Histogram Approach

When Θ is an interval of the real line, the most common approach is the histogram: divide Θ into intervals, determine the subjective probability of each interval, and plot a probability histogram. Drawbacks: how many intervals should be used, and of what size?

The Relative Likelihood Approach

When Θ is a subset of the real line, compare the intuitive likelihoods of various points in Θ and sketch a prior density. Example: Θ = [0, 1]. Determine the most likely parameter point, say θ = 3/4, judged three times as likely as the least likely point θ = 0. Then determine several other points relative to θ = 0 and sketch the result.
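The histogram construction above can be sketched in code. This is a minimal illustration, not from the notes: the interval endpoints, the subjective probabilities, and the helper name `histogram_prior` are all invented for the example.

```python
# Histogram approach: turn subjectively elicited interval probabilities
# into a piecewise-constant prior density (illustrative numbers only).

def histogram_prior(edges, probs):
    """edges: interval endpoints [e0, ..., en];
    probs: subjective P(theta in (e_{i-1}, e_i])."""
    total = sum(probs)
    # Normalize each bar so the resulting density integrates to 1.
    heights = [p / total / (hi - lo)
               for p, lo, hi in zip(probs, edges[:-1], edges[1:])]

    def density(theta):
        for lo, hi, h in zip(edges[:-1], edges[1:], heights):
            if lo <= theta <= hi:
                return h
        return 0.0

    return density

# Elicited probabilities for four equal-width intervals of Theta = [0, 1]:
pi = histogram_prior([0.0, 0.25, 0.5, 0.75, 1.0], [0.1, 0.2, 0.4, 0.3])
```

The sketched density is constant on each elicited interval, which is exactly the "probability histogram" the slide describes.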
Matching a Given Functional Form

Assume that π(θ) is of a given functional form, and choose the density of that form which most closely matches prior beliefs. After the functional form is determined, choose the parameters of the function either
- from estimated prior moments, or
- by subjectively estimating several fractiles of the prior distribution and matching these fractiles.
Drawback: this is only useful when certain specific functional forms of the prior can be assumed.

Example: Θ = (-∞, ∞), and the prior is thought to be from the normal family. The median is determined to be 0, and the quartiles to be -1 and 1. Since the mean equals the median, μ = 0. Since P(Z < -1/(2.19)^{1/2}) = 1/4 when Z is N(0, 1), the prior density is N(0, 2.19).

CDF Determination

This approach proceeds by subjectively determining several α-fractiles z(α), plotting the points (z(α), α), and sketching a smooth curve joining them.

Discussion

Determining a multivariate prior density can be considerably harder. The easiest route is again the use of a given functional form, so that only a few parameters need be determined subjectively. Easier still is the case in which the coordinates θ_i of θ are thought to be independent; the prior is then the product of the univariate prior densities of the θ_i. Example: π(θ) = π₁(θ₁)π₂(θ₂). If independence does not hold, the best way is to determine conditional and marginal prior densities. Example: π(θ₁, θ₂) = π(θ₂ | θ₁) π₁(θ₁).

Noninformative Priors

Because of the compelling reasons to perform a conditional analysis, and the attractiveness of using Bayesian machinery to do so, there have been attempts to use the Bayesian approach even when no prior information is available. Example: suppose the parameter of interest is a normal mean θ, so Θ = (-∞, ∞). The noninformative prior is chosen to be π(θ) = 1 (rather than π(θ) = c > 0); this is called the uniform density on R¹, and was introduced by Laplace (1812).

Sometimes, however, a noninformative prior fails to maintain consistency under reparameterization. This lack of invariance of the constant prior has led to a search for noninformative priors which are appropriately invariant under transformations.
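Before moving on, the functional-form example above (a normal prior with median 0 and quartiles -1 and 1) can be checked with Python's standard library; 2.19 is simply the variance that places the lower quartile at -1.

```python
from statistics import NormalDist

# Matching a given functional form: find the N(0, sigma^2) prior whose
# quartiles are -1 and 1.  The lower quartile of N(0, sigma^2) is
# sigma * z_{0.25}, so sigma = -1 / z_{0.25}.
z25 = NormalDist().inv_cdf(0.25)   # lower quartile of N(0,1), about -0.6745
sigma = -1.0 / z25                 # about 1.48
variance = sigma ** 2              # about 2.19, as in the example
prior = NormalDist(mu=0.0, sigma=sigma)
# Sanity check: a quarter of the prior mass lies below -1,
# i.e. prior.cdf(-1.0) is 0.25.
```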
Noninformative Priors for Location and Scale Problems

Efforts to derive noninformative priors through consideration of transformations of a problem had their beginnings with Jeffreys (cf. Jeffreys (1961)). The idea has been extensively used in Hartigan (1964), Jaynes (1968, 1983), Villegas (1977, 1981, 1984), and elsewhere.

Example: Location Parameters

Suppose the sample space and Θ are subsets of R^p, and the density of X is of the form f(x - θ); this is called a location density, and θ is called a location parameter. The N(θ, σ²) (σ² fixed), T(α, θ, σ²) (α and σ² fixed), C(θ, β) (β fixed), and N_p(θ, Σ) (Σ fixed) densities are all examples of location densities. Also, a sample of i.i.d. random variables is said to be from a location density if their common density is a location density.

To derive a noninformative prior for this situation, observe the random variable Y = X + c (c ∈ R^p). Defining η = θ + c, it is clear that Y has density f(y - η). If now the sample space and Θ are both R^p, then the sample space and parameter space for the (Y, η) problem are also R^p. The (X, θ) and (Y, η) problems are thus identical in structure. Letting π and π* denote the noninformative priors for the (X, θ) and (Y, η) problems respectively, this implies

  P^π(θ ∈ A) = P^{π*}(η ∈ A) for any set A in R^p.

Since η = θ + c, it should also be true that

  P^{π*}(η ∈ A) = P^π(θ + c ∈ A) = P^π(θ ∈ A - c).

Hence P^π(θ ∈ A) = P^π(θ ∈ A - c).

Assuming that the prior has a density, we can write

  ∫_A π(θ) dθ = ∫_{A-c} π(θ) dθ = ∫_A π(θ - c) dθ.

If this is to hold for all sets A, it must be true that π(θ) = π(θ - c) for all θ. Setting θ = c thus gives π(c) = π(0), and this must hold for all c ∈ R^p. The conclusion is that π must be a constant function. It is convenient to choose the constant to be 1, so the noninformative prior density for a location parameter is π(θ) = 1.

Noninformative Priors in General Settings

For a general problem, various suggestions have been advanced for determining a noninformative prior. The most widely used method is that of Jeffreys (1961), which is to choose

  π(θ) = [I(θ)]^{1/2},

where I(θ) is the expected Fisher information,

  I(θ) = -E_θ[∂² log f(X|θ) / ∂θ²].

If θ = (θ₁, …, θ_p)^t is a vector, Jeffreys (1961) suggests the use of

  π(θ) = [det I(θ)]^{1/2}, where I_{ij}(θ) = -E_θ[∂² log f(X|θ) / ∂θ_i ∂θ_j].

Discussion

A number of criticisms have been raised concerning the use of noninformative priors:
- Violating the Likelihood Principle
  (see Geisser (1984a))
- The marginalization paradox of Dawid, Stone, and Zidek (1973)
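As an aside, Jeffreys' rule π(θ) = [I(θ)]^{1/2} introduced above can be checked numerically. This is my own illustrative sketch (not from the notes): for X ~ Binomial(n, θ), the closed form is I(θ) = n/(θ(1-θ)), so the Jeffreys prior is proportional to θ^{-1/2}(1-θ)^{-1/2}; a brute-force finite-difference computation of -E_θ[∂² log f(X|θ)/∂θ²] agrees.

```python
import math

def fisher_info_binomial(theta, n, h=1e-4):
    """Expected Fisher information I(theta) = -E[d^2/dtheta^2 log f(X|theta)]
    for X ~ Binomial(n, theta), via a second central difference."""
    def loglik(t, x):
        return (x * math.log(t) + (n - x) * math.log(1 - t)
                + math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1))

    total = 0.0
    for x in range(n + 1):
        p_x = math.exp(loglik(theta, x))       # Pr[X = x | theta]
        d2 = (loglik(theta + h, x) - 2 * loglik(theta, x)
              + loglik(theta - h, x)) / h ** 2
        total -= p_x * d2                      # -E[second derivative]
    return total

n, theta = 10, 0.3
numeric = fisher_info_binomial(theta, n)
closed_form = n / (theta * (1 - theta))        # about 47.6
# Jeffreys prior (up to the constant sqrt(n)):
# pi(theta) proportional to theta**-0.5 * (1 - theta)**-0.5
```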
Discussion

There are two common responses to these criticisms of noninformative prior Bayesian analysis. The first, attempted by some noninformative prior Bayesians, is to argue for the correctness of their favorite noninformative prior approach, together with attempts to rebut the paradoxes and counterexamples. The second is to argue that, operationally, it is rare for the choice of a noninformative prior to markedly affect the answer, so that any reasonable noninformative prior can be used.

Maximum Entropy Priors

Frequently partial prior information is available, outside of which it is desired to use a prior that is as noninformative as possible.

Definition 1: Assume Θ is discrete, and let π be a probability density on Θ. The entropy of π, denoted ℰn(π), is

  ℰn(π) = -Σ_i π(θ_i) log π(θ_i).

Entropy has a direct relationship to information theory, and in a sense measures the amount of uncertainty inherent in the probability distribution.

Assume that partial prior information concerning θ is available in the form

  E^π[g_k(θ)] = Σ_i π(θ_i) g_k(θ_i) = μ_k,  k = 1, …, m.  (*)

It seems reasonable to seek the prior distribution which maximizes entropy among all those distributions which satisfy the given set of restrictions. The solution (## proof below) is

  π̄(θ_i) = exp(Σ_{k=1}^m λ_k g_k(θ_i)) / Σ_j exp(Σ_{k=1}^m λ_k g_k(θ_j)),

where the λ_k are constants determined by the constraints in (*).

If Θ is continuous, the use of maximum entropy becomes more complicated. Jaynes (1968) makes a strong case for defining entropy as

  ℰn(π) = -E^π[log(π(θ)/π₀(θ))] = -∫ π(θ) log(π(θ)/π₀(θ)) dθ,

where π₀ is the natural invariant noninformative prior for the problem. In the presence of partial prior information of the form

  E^π[g_k(θ)] = ∫ g_k(θ) π(θ) dθ = μ_k,  k = 1, …, m,  (**)

the prior density which maximizes ℰn(π) is

  π̄(θ) = π₀(θ) exp(Σ_{k=1}^m λ_k g_k(θ)) / ∫ π₀(θ) exp(Σ_{k=1}^m λ_k g_k(θ)) dθ,

where the λ_k are constants determined by the constraints in (**).

Example: Assume Θ = R¹ and θ is a location parameter, so the natural noninformative prior is π₀(θ) = 1. It is believed that the true prior mean is μ and the true prior variance is σ². These restrictions are of the form (**) with g₁(θ) = θ, μ₁ = μ, and g₂(θ) = (θ - μ)², μ₂ = σ². The maximum entropy prior subject to these restrictions is

  π̄(θ) = exp(λ₁θ + λ₂(θ - μ)²) / ∫ exp(λ₁θ + λ₂(θ - μ)²) dθ,

where λ₁ and λ₂ are to be chosen to satisfy (**).
Example (continued)

Completing the square,

  λ₁θ + λ₂(θ - μ)² = λ₂[θ - (μ - λ₁/(2λ₂))]² + (μλ₁ - λ₁²/(4λ₂)).

Hence

  π̄(θ) ∝ exp(λ₂[θ - (μ - λ₁/(2λ₂))]²).

The denominator is a constant, so π̄(θ) is a normal density with mean μ - λ₁/(2λ₂) and variance -1/(2λ₂). Choosing λ₁ = 0 and λ₂ = -1/(2σ²) satisfies (**). Thus π̄ is a N(μ, σ²) density.

Difficulties with this approach: although the need to use a noninformative prior π₀ in the derivation of π̄ is not too serious, a more serious problem is that π̄ often will not exist.
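To make the maximum entropy machinery concrete, here is a small discrete sketch of the formula π̄(θ_i) ∝ exp(Σ λ_k g_k(θ_i)). The support points and the single mean constraint are invented for illustration, and the constant λ is found by simple bisection rather than a general solver.

```python
import math

def maxent_prior(support, target_mean, lo=-50.0, hi=50.0):
    """Discrete maximum entropy prior pi(theta_i) proportional to
    exp(lam * theta_i), with lam chosen by bisection so that
    E[theta] = target_mean (the single constraint g(theta) = theta)."""
    def mean_for(lam):
        w = [math.exp(lam * t) for t in support]
        z = sum(w)
        return sum(t * wi for t, wi in zip(support, w)) / z

    # The mean is increasing in lam, so bisection converges.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    w = [math.exp(lam * t) for t in support]
    z = sum(w)
    return [wi / z for wi in w]

# Theta = {0, 1, 2, 3, 4} with the constraint E[theta] = 1.5:
pi = maxent_prior([0, 1, 2, 3, 4], 1.5)
```

With no binding constraint (target mean 2.0 here) the maximum entropy prior would be uniform on the support, which matches the intuition that entropy measures uncertainty.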
Using the Marginal Distribution to Determine the Prior

If X has probability density f(x|θ), and θ has probability density π(θ), then the joint density of X and θ is h(x, θ) = f(x|θ)π(θ).

Definition 2: The marginal density of X is

  m(x|π) = ∫ f(x|θ) dF^π(θ)
         = ∫ f(x|θ) π(θ) dθ   (continuous case)
         = Σ_i f(x|θ_i) π(θ_i)   (discrete case).

Bayesians have long used m to check assumptions. If m(x|π) (for the actual observed data x) turns out to be small, then the assumptions (the model f and the prior π) have not predicted what actually occurred, and are suspect. Information about m can come from subjective knowledge or from the data itself.

The ML-II Approach to Prior Selection

In Definition 2 it was pointed out that m(x|π) reflects the plausibility of f and π in the light of the data. If we treat f as definitely known, it follows that m(x|π) reflects the plausibility of π alone, so it is reasonable to consider m(x|π) as a likelihood function for π. Faced with a likelihood function for π, a natural method of choosing π is maximum likelihood.

Definition 3: Suppose Γ is a class of priors under consideration, and that π̂ ∈ Γ satisfies (for the observed data x)

  m(x|π̂) = sup_{π ∈ Γ} m(x|π).

Then π̂ is called a type II maximum likelihood prior, or ML-II prior for short. When Γ is the class Γ = {π: π(θ) = g(θ|λ), λ ∈ Λ}, then sup_{π ∈ Γ} m(x|π) = sup_{λ ∈ Λ} m(x|g(·|λ)), so that one simply maximizes over the hyperparameter λ.

Hierarchical Priors

A hierarchical prior is also called a multistage prior. The idea is that one may have structural and subjective prior information at the same time, and it is often convenient to model this in stages. For instance, in the Bayes scenario, structural knowledge that the θ_i were i.i.d. led to the first stage prior description

  π₁(θ) = ∏_i π₀(θ_i).

The hierarchical approach would then place a second stage subjective prior on π₀. The hierarchical approach is most commonly used when the first stage consists of priors of a given functional form.

Criticisms

Objectivity: classical statistics is claimed to be objective, and hence suitable for the needs of science, while Bayesian analysis is subjective and only useful for making personal decisions.
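As an illustrative ML-II computation (a standard normal-normal sketch of my own, not taken from the notes): let X ~ N(θ, 1) and Γ = {N(0, τ²): τ² ≥ 0}. Then m(x|τ²) is the N(0, 1 + τ²) density evaluated at x, and maximizing over τ² ≥ 0 gives the closed form τ̂² = max(0, x² - 1).

```python
import math

def marginal_density(x, tau2):
    """m(x | tau2): the N(0, 1 + tau2) marginal density of X when
    X | theta ~ N(theta, 1) and theta ~ N(0, tau2)."""
    v = 1.0 + tau2
    return math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def ml2_tau2(x):
    """ML-II choice of the hyperparameter: the tau2 >= 0
    maximizing m(x | tau2)."""
    return max(0.0, x * x - 1.0)

x = 2.5
tau2_hat = ml2_tau2(x)   # x**2 - 1 = 5.25 for this x
```

This is "maximizing over the hyperparameter" in Definition 3 with λ = τ²; the data pick out the prior spread most compatible with the observed x.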
Misuse of prior distributions.

Robustness (see Section 4.7).

Data- or model-dependent priors: the idealized Bayesian view is that θ is a quantity about which separate information exists, and that this information is to be combined with that in the data. This approach presumes that the prior does not depend in any way on the data.
## proof

Entropy: ℰn(π) = -Σ_i π(θ_i) log π(θ_i).
Constraints: Σ_i π(θ_i) g_k(θ_i) = μ_k for k = 1, …, m, and Σ_i π(θ_i) = 1.

By Lagrange's multiplier method, maximize

  G(π(θ₁), …, π(θ_n)) = -Σ_i π(θ_i) log π(θ_i) + Σ_k λ_k (Σ_i π(θ_i) g_k(θ_i) - μ_k) + λ₀ (Σ_i π(θ_i) - 1).

Setting ∂G/∂π(θ_i) = 0:

  -log π(θ_i) - 1 + Σ_k λ_k g_k(θ_i) + λ₀ = 0,

so π(θ_i) = exp[-1 + λ₀ + Σ_k λ_k g_k(θ_i)]. Since Σ_i π(θ_i) = 1,

  exp[-1 + λ₀] = 1 / Σ_j exp[Σ_k λ_k g_k(θ_j)].

Therefore

  π(θ_i) = exp[Σ_k λ_k g_k(θ_i)] / Σ_j exp[Σ_k λ_k g_k(θ_j)].

From Geisser (1984a)

It was pointed out by Barnard, Jenkins, and Winsten (1962) that if a coin whose probability of heads is θ came up heads t times and tails n - t times in a series of independent tosses, then irrespective of the stopping rule the likelihood would be L(θ) ∝ θ^t (1-θ)^{n-t}, and the likelihood principle would then dictate that any inference about θ should not depend on which stopping rule was actually used. Two common stopping rules are:
(a) fix the total number of tosses and observe the number of heads;
(b) observe the total number of tosses required to attain a fixed number of heads.

Two cases

In case (a), the sampling distribution of T, the number of heads, is

  Pr[T = t | n, θ] = (n choose t) θ^t (1-θ)^{n-t},  t = 0, 1, …, n.

In case (b), the sampling distribution of N, the number of tosses required to obtain t heads, is

  Pr[N = n | t, θ] = (n-1 choose t-1) θ^t (1-θ)^{n-t},  n = t, t+1, ….

Now there are Bayesians who have developed rules for obtaining reference prior distributions that purport to express little or no information regarding the parameter. All of these methods, except Geisser's and Zellner's, yield the same reference priors:

  p_B(θ) ∝ θ^{-1/2} (1-θ)^{-1/2} for the binomial case, and
  p_N(θ) ∝ θ^{-1} (1-θ)^{-1/2} for the negative binomial case.

Hence the posterior densities for these two cases are

  p_B(θ | t, n) ∝ θ^{t-1/2} (1-θ)^{n-t-1/2}  and
  p_N(θ | t, n) ∝ θ^{t-1} (1-θ)^{n-t-1/2},

respectively.

Conclusion

In fact, for all of these methods the prior distribution will depend on the sampling rule, and consequently so will the posterior distribution. The likelihood principle says that inference about the same parameter should not depend on which sampling rule was used, so one may violate the likelihood principle in using noninformative priors.
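The likelihood principle violation can be seen numerically with a small sketch of my own (the data t = 7, n = 12 are invented): both reference posteriors above are Beta densities by conjugacy, namely Beta(t + 1/2, n - t + 1/2) under the binomial rule and Beta(t, n - t + 1/2) under the negative binomial rule, and their means differ even though the likelihoods are proportional.

```python
# Reference posteriors for t = 7 heads in n = 12 tosses under the two
# stopping rules discussed above.  Both are Beta densities, but with
# different parameters, so the same data yield different inferences.

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)

t, n = 7, 12
mean_binomial = beta_mean(t + 0.5, n - t + 0.5)   # posterior mean, rule (a)
mean_neg_binomial = beta_mean(t, n - t + 0.5)     # posterior mean, rule (b)
# The two means differ (about 0.577 vs 0.560): inference depends on the
# stopping rule, which is exactly the likelihood principle violation.
```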
Some Bayesian proposals

Jeffreys (1961) invoked invariance. Box and Tiao (1973) recommended priors such that likelihoods are "data translated" in some sense. Akaike (1978) and Geisser (1979) formulated procedures involving the predictive distribution and Kullback-Leibler divergence measures. Bernardo (1979) used the notion of maximizing entropy in the limit. Zellner (1977) maximized the Shannon information provided by the data relative to that in the prior.