Exchangeable sequences and probabilities for probabilities

1996; modified 98-5-21 to add material on mutual information; modified 98-7-21 to add Heath-Sudderth proof of de Finetti representation; modified 99-11-24 to make the presentation clearer and more complete and 00-10-18 to include comments on the integration measure

Suppose one assigns a probability, $P(p_1,\ldots,p_L) = P(p)$, to the single-trial probabilities for $L$ alternatives. Then, in $N$ trials, the occurrence probability (i.e., the total probability that alternative $i$ occurs $n_i$ times, $i = 1,\ldots,L$) is given by

$$p(n) = p(n_1,\ldots,n_L) = \int dp\, p(n_1,\ldots,n_L \mid p)\, P(p) = \int dp\, \frac{N!}{n_1!\cdots n_L!}\, p_1^{n_1}\cdots p_L^{n_L}\, P(p) = \frac{N!}{n_1!\cdots n_L!}\, \langle p_1^{n_1}\cdots p_L^{n_L}\rangle .$$

Here $N = \sum_i n_i$, $dp = dp_1\cdots dp_L$, and the integral runs over positive values of the single-trial probabilities. The probability on probabilities, $P(p)$, is restricted to the simplex; i.e., as a function on positive values of the probabilities, it is proportional to a delta function $\delta\bigl(\sum_i p_i - 1\bigr)$. Notice that, in contrast to other notes, we do not include the inverse direction cosine in the integration measure on the simplex, and we put the $\delta$ function that restricts to the simplex in the distribution rather than in the integration measure.

The moment of single-trial probabilities,

$$\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \int dp\, p_1^{n_1}\cdots p_L^{n_L}\, P(p) ,$$

is the probability for any sequence in which occurrence numbers are given by the vector $n = (n_1,\ldots,n_L)$. The last form of $p(n)$ thus writes the occurrence probability in the form of a moment of the single-trial probabilities. Notice that the occurrence probabilities for $N$ trials are determined by the $N$th-order moments of $P(p)$. In particular, the marginal probabilities for a single trial,

$$\langle p_i\rangle = \int dp\, p_i\, P(p) ,$$
are the first moments of $P(p)$.

An exchangeable probability assignment (or an exchangeable sequence) is one such that the probability for a sequence does not change under reordering; in other words, all sequences with the same occurrence vector have the same probability. Any probability on probabilities leads to an exchangeable probability assignment on the multi-trial hypothesis space. This means that there is a map from probabilities on probabilities to exchangeable probability assignments. The de Finetti representation theorem asserts that any exchangeable probability assignment corresponds to a unique probability on probabilities. Another way of putting this is that the map from probabilities on probabilities to exchangeable probability assignments is one-to-one and onto.

We can get at the uniqueness (i.e., the map is one-to-one) easily. One way to proceed is to define a characteristic function

$$\Phi(k) \equiv \langle e^{ik\cdot p}\rangle = \int dp\, e^{ik\cdot p}\, P(p) = \sum_{n_1,\ldots,n_L} \frac{i^N}{n_1!\cdots n_L!}\, k_1^{n_1}\cdots k_L^{n_L}\, \langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \sum_{n_1,\ldots,n_L} \frac{i^N}{N!}\, k_1^{n_1}\cdots k_L^{n_L}\, p(n) .$$

That $P(p)$ is restricted to the simplex means that for vectors of the form $k = k(1,\ldots,1)$, the characteristic function becomes $\Phi(k) = e^{ik}$. Now it is clear why two different probabilities on probabilities cannot lead to the same exchangeable probability assignment: if they did, they would have the same characteristic function and thus, under the inverse Fourier transform, they would be the same. Another way of putting this is that the polynomials $p_1^{n_1}\cdots p_L^{n_L}$ are linearly independent and complete (but not orthogonal). Thus two different probabilities on probabilities cannot lead to the same exchangeable sequence, for if they did, they would have the same overlap with this complete set of polynomials and thus would be the same.

Showing that every exchangeable assignment corresponds to a probability on probabilities (the map is onto) requires more work. Suppose, for example, that one uses the occurrence probabilities $p(n)$ to define a characteristic function and then inverts the Fourier transform to get a function $P(p)$. The normalization of the occurrence probabilities implies that $\Phi(k) = e^{ik}$ for $k = k(1,\ldots,1)$, which in turn implies that $P(p)$ is restricted to
the surface $\sum_i p_i = 1$. The difficulty is that one can't tell from this procedure that $P(p)$ is restricted to positive values of the probabilities (i.e., restricted to the simplex) or, even worse, that it is positive. This difficulty has to be remedied by using some other method.

The simplest proof seems to be one due to David Heath and William Sudderth [The American Statistician 30(4), 188-189 (November 1976)], which I sketch here for the case of binary alternatives, the case considered in their paper. Let $X_1, X_2, \ldots, X_N$ denote the results of $N$ trials of a binary quantity taking on values 0 and 1, and let $p(n,K)$, $K \le N$, be the probability for $n$ 1s in $K$ trials. Exchangeability guarantees that

$$p(n,K) = \binom{K}{n}\, p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0) .$$
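The identity just stated can be checked by brute force for a small case. The sketch below is only an illustration under stated assumptions: the two-point probability on probabilities in `PRIOR` and all function names are hypothetical and do not come from Heath and Sudderth. It builds the probability of an ordered sequence trial by trial, using predictive probabilities and Bayesian updating, so that the independence of ordering is a result of the calculation rather than something built in.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical probability on probabilities (an assumption for illustration):
# weight 1/2 on single-trial probability p = 1/4 and weight 1/2 on p = 3/4.
PRIOR = [(Fraction(1, 2), Fraction(1, 4)), (Fraction(1, 2), Fraction(3, 4))]

def sequence_probability(seq):
    """Probability of one ordered 0/1 sequence, accumulated trial by trial."""
    weights = [w for w, _ in PRIOR]
    prob = Fraction(1)
    for x in seq:
        # Likelihood of outcome x under each candidate single-trial probability.
        likes = [p if x == 1 else 1 - p for _, p in PRIOR]
        # Predictive probability of x given the earlier trials.
        predictive = sum(w * lk for w, lk in zip(weights, likes))
        prob *= predictive
        # Bayes update of the weights on the single-trial probabilities.
        weights = [w * lk / predictive for w, lk in zip(weights, likes)]
    return prob

def occurrence_probability(n, K):
    """p(n, K): total probability of n 1s in K trials, summed over all sequences."""
    return sum(sequence_probability(s)
               for s in product((0, 1), repeat=K) if sum(s) == n)

def check_exchangeability(K):
    """True if every ordering with n 1s has the same probability and
    p(n, K) equals binom(K, n) times the probability of 1...1 0...0."""
    for n in range(K + 1):
        ordered = (1,) * n + (0,) * (K - n)
        reference = sequence_probability(ordered)
        for s in product((0, 1), repeat=K):
            if sum(s) == n and sequence_probability(s) != reference:
                return False
        if occurrence_probability(n, K) != comb(K, n) * reference:
            return False
    return True

print(check_exchangeability(5))
```

Exact rational arithmetic is used so that the comparison is an equality test and not a floating-point tolerance.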
We can condition the probability on the right on the occurrence of $m$ 1s in all $N$ trials:

$$p(n,K) = \binom{K}{n} \sum_{m=0}^{N} p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0 \mid m,N)\, p(m,N) .$$

Given $m$ 1s in $N$ trials, the $\binom{N}{m}$ sequences are equally likely. Thus the situation is identical to drawing without replacement from an urn that has $m$ 1s on $N$ balls, and we have that

$$p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0 \mid m,N) = \frac{m}{N}\,\frac{m-1}{N-1}\cdots\frac{m-n+1}{N-n+1}\;\frac{N-m}{N-n}\cdots\frac{N-m-(K-n)+1}{N-K+1} = \frac{(m)_n\,(N-m)_{K-n}}{(N)_K} ,$$

where

$$(r)_q \equiv \prod_{j=0}^{q-1} (r-j) = r(r-1)\cdots(r-q+1) = \frac{r!}{(r-q)!} .$$

Therefore, we have the main result that

$$p(n,K) = \binom{K}{n} \sum_{m=0}^{N} \frac{(m)_n\,(N-m)_{K-n}}{(N)_K}\, p(m,N) .$$

The de Finetti representation theorem fails for sequences that are exchangeable for a finite number of trials $N$: for finite exchangeable sequences that can be derived from a probability on probabilities, the probability on probabilities is not unique, and there are finite exchangeable sequences (in particular, anticorrelated sequences such as drawing from an urn without replacement) that cannot be derived from a probability on probabilities. Yet the Heath-Sudderth proof establishes that all finite exchangeable sequences can be derived from mixtures of urn probabilities.

What remains is to take the limit $N \to \infty$. We can write $p(n,K)$ as an integral

$$p(n,K) = \binom{K}{n} \int_0^1 dz\, \frac{(zN)_n\,\bigl((1-z)N\bigr)_{K-n}}{(N)_K}\, P_N(z) ,$$

where

$$P_N(z) = \sum_{m=0}^{N} p(zN,N)\, \delta(z - m/N)$$

is a distribution concentrated at the $N$-trial frequencies $m/N$. In the limit $N \to \infty$, $P_N(z)$ converges to a continuous distribution $P_\infty(z)$ on the simplex, and the other term in the integrand goes to $z^n (1-z)^{K-n}$, giving

$$p(n,K) = \binom{K}{n} \int_0^1 dz\, z^n (1-z)^{K-n}\, P_\infty(z) .$$
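The main result and the large-$N$ limit of the urn term can both be checked numerically. The sketch below is a minimal illustration, assuming a Beta(2, 3) probability on probabilities chosen only because its occurrence probabilities have a closed form for integer parameters. The function names (`falling`, `p_occ`, `urn_term`, `heath_sudderth`) are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def falling(r, q):
    """Falling factorial (r)_q = r (r-1) ... (r-q+1), with (r)_0 = 1."""
    out = 1
    for j in range(q):
        out *= r - j
    return out

def beta_int(x, y):
    """Beta function B(x, y) for positive integers x and y."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))

def p_occ(n, K, a=2, b=3):
    """p(n, K) for an assumed Beta(a, b) probability on probabilities."""
    return comb(K, n) * beta_int(a + n, b + K - n) / beta_int(a, b)

def urn_term(n, K, m, N):
    """Probability of n 1s followed by K-n 0s when drawing without
    replacement from an urn of N balls, m of which are 1s."""
    return Fraction(falling(m, n) * falling(N - m, K - n), falling(N, K))

def heath_sudderth(n, K, N):
    """Right-hand side of the main result: a mixture of urn probabilities."""
    return comb(K, n) * sum(urn_term(n, K, m, N) * p_occ(m, N)
                            for m in range(N + 1))

# The mixture of urn probabilities reproduces p(n, K) exactly for K <= N.
exact = all(heath_sudderth(n, 4, 10) == p_occ(n, 4) for n in range(5))

# For large N the urn term approaches z^n (1-z)^(K-n) with z = m/N.
m, N = 300_000, 1_000_000
z = m / N
limit_gap = abs(float(urn_term(2, 5, m, N)) - z**2 * (1 - z)**3)
print(exact, limit_gap)
```

The first check uses exact rational arithmetic. The second shows only that the gap is small at one value of $N$; it is not a proof of convergence.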
What we have shown is that if $p(n,K)$ is derived from an infinite exchangeable sequence, then it has a de Finetti representation in terms of a probability distribution on the simplex. The result can readily be extended to nonbinary variables. The conclusion is that a probability on probabilities is just a convenient shorthand for specifying occurrence probabilities on a multi-trial hypothesis space.

The Heath-Sudderth proof is based on the fact that if the multi-trial probabilities are derived from a probability on probabilities $P(p)$, i.e.,

$$p(n,K) = \binom{K}{n} \int_0^1 dp\, p^n (1-p)^{K-n}\, P(p) ,$$

then in the limit of large $K$,

$$p(n,K) = \frac{1}{K}\, P(p = n/K) ;$$

i.e., the probability $p(m,N)$ that in the Heath-Sudderth proof becomes the probability on probabilities is just what it ought to be.

It is interesting to investigate how much information one gains from $N$ trials about the single-trial probabilities $p = (p_1,\ldots,p_L)$. This information is quantified by the mutual information

$$H(D_N;p) = H(D_N) - H(D_N \mid p) ,$$

where

$$H(D_N) = -\sum_{\text{sequences}} \langle p_1^{n_1}\cdots p_L^{n_L}\rangle \log \langle p_1^{n_1}\cdots p_L^{n_L}\rangle = -\sum_{n_1,\ldots,n_L} p(n) \log \langle p_1^{n_1}\cdots p_L^{n_L}\rangle$$

is the Shannon information of the data gathered in $N$ trials and

$$H(D_N \mid p) = -N \int dp\, P(p) \sum_i p_i \log p_i = -N \Bigl\langle \sum_i p_i \log p_i \Bigr\rangle$$

is the conditional information in the $N$-trial data, given the single-trial probabilities $p$. Notice that

$$-N \sum_i \langle p_i\rangle \log \langle p_i\rangle \ge H(D_N) \ge H(D_N \mid p) ,$$

where the first term is the Shannon information for $N$ trials drawn from an i.i.d. distribution governed by the single-trial marginal probabilities $\langle p_i\rangle$. The first inequality is a consequence of the subadditivity of Shannon information.

When the number of trials is small, it is hard to make general statements about the mutual information. If $P(p)$ is concentrated at several widely separated single-trial probabilities $p$, then it takes only a few trials to begin getting information about which of the widely separated probabilities is generating the data. In contrast, suppose $P(p)$ is concentrated at a particular $p$ within a small range $\Delta$ for each alternative. In this case it
takes many trials to begin getting much information about which single-trial probabilities within the range $\Delta$ are generating the data. We can estimate the number of trials required in the following way, where we consider only two alternatives ($L = 2$) for simplicity. After $N$ trials, the data is able to determine $p_1$ to within an uncertainty given roughly by $\sqrt{p_1 p_2/N}$. Thus one would expect to begin getting information about the value of $p_1$ when $\Delta \gtrsim \sqrt{p_1 p_2/N}$, i.e., when $N \gtrsim p_1 p_2/\Delta^2$. As $N$ becomes even bigger, i.e., $N \gg p_1 p_2/\Delta^2$, the data is able to distinguish roughly $\Delta/\sqrt{p_1 p_2/N} = \sqrt{N\Delta^2/p_1 p_2}$ values of $p_1$, and the mutual information should be roughly the logarithm of this number of values, i.e.,

$$H(D_N;p) \sim \log \sqrt{\frac{N\Delta^2}{p_1 p_2}} .$$

We can put these considerations on a firm footing by considering the Gaussian approximation to the binomial distribution $p(n \mid p)$. The Gaussian approximation requires that for each alternative $i$, the number of trials is large enough that $\sqrt{p_i/N} \ll p_i$, i.e., $N p_i \gg 1$, for all probabilities $p$ that have substantial support in $P(p)$. If we further assume that the number of trials is large enough that the data can distinguish all the features of $P(p)$ (i.e., for each alternative, $P(p)$ does not vary significantly on the scale $\sqrt{p_i/N}$), then it is a tedious, but straightforward computation to show that

$$\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \Bigl(\frac{n_1}{N}\Bigr)^{n_1}\cdots\Bigl(\frac{n_L}{N}\Bigr)^{n_L} P(p = n/N)\, \Bigl(\frac{2\pi}{N}\Bigr)^{(L-1)/2} \sqrt{\frac{n_1\cdots n_L}{N^L}}$$

and

$$p(n) = \frac{N!}{n_1!\cdots n_L!}\, \langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \frac{1}{N^{L-1}}\, P(p = n/N) ,$$

which leads to a mutual information

$$H(D_N;p) = -\int dp\, P(p) \log\bigl[P(p)\,V(p)\bigr] , \qquad (1)$$

where

$$V(p) = \Bigl(\frac{2\pi}{N}\Bigr)^{(L-1)/2} \sqrt{p_1\cdots p_L}$$

is a probability-dependent volume element on the probability simplex, which can be thought of as the distinguishability volume determined by $N$ trials. The mutual information (1) has the following interpretation: bin the probabilities $p$ according to the volume element $V(p)$; the mutual information is the Shannon information for the discrete distribution obtained by replacing the continuous distribution $P(p)$ by the distribution of probabilities for the bins. Another way of saying this is that the mutual information (1) is the entropy of $P(p)$ relative to a position-dependent measure $m(p) = 1/V(p)$, which describes the position-dependent distinguishability of distributions $p$.

In the aforementioned example, where $P(p)$ is concentrated at a particular $p$, each probability having a small range $\Delta$ of possible values, the mutual information (1) becomes

$$H(D_N;p) = \log \frac{\Delta^{L-1}}{V(p)} = \log \frac{(N\Delta^2/2\pi)^{(L-1)/2}}{\sqrt{p_1\cdots p_L}} ,$$
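which simplifies to the estimate above for $L = 2$. Actually, this example is flawed because it requires one probability, say $p_L$, to vary over a range $(L-1)\Delta$. We can do a better job of taking into account the volume on the simplex by using a Gaussian

$$P(p) = \frac{\sqrt{L}}{(2\pi\sigma^2)^{(L-1)/2}} \exp\Bigl(-\sum_i \frac{(p_i - q_i)^2}{2\sigma^2}\Bigr) ,$$

in which case the mutual information (1) becomes

$$H(D_N;p) = \log \frac{(2\pi e\sigma^2)^{(L-1)/2}/\sqrt{L}}{V(q)} = \log \frac{(N e \sigma^2)^{(L-1)/2}}{\sqrt{L\, q_1\cdots q_L}} .$$

In the first form here, the numerator within the logarithm can be thought of as the volume occupied by the Gaussian. The $\sqrt{L}$ is the correction to the volume that comes from projecting onto the simplex.

Notice that another neat way to write the mutual information (1) comes from introducing a Wootters distinguishability metric

$$ds^2 = 4 \sum_i \bigl(d\sqrt{p_i}\bigr)^2 = \sum_i \frac{dp_i^2}{p_i} .$$

The volume element for the Wootters metric is

$$d_W p = \delta\Bigl(\sum_i \bigl(\sqrt{p_i}\bigr)^2 - 1\Bigr)\, 2^L\, d\sqrt{p_1}\cdots d\sqrt{p_L} = \delta\Bigl(\sum_i p_i - 1\Bigr) \frac{dp_1\cdots dp_L}{\sqrt{p_1\cdots p_L}} = \frac{dp}{\sqrt{p_1\cdots p_L}} .$$

Redefining the probability $P(p)$ in terms of the Wootters metric gives

$$P_W(p)\, d_W p = P(p)\, dp , \qquad P_W(p) = \sqrt{p_1\cdots p_L}\, P(p) ,$$

so that

$$P(p)\,V(p) = \Bigl(\frac{2\pi}{N}\Bigr)^{(L-1)/2} P_W(p) ,$$

and the mutual information (1) becomes

$$H(D_N;p) = -\int d_W p\, P_W(p) \log\Bigl[\Bigl(\frac{2\pi}{N}\Bigr)^{(L-1)/2} P_W(p)\Bigr] .$$

Since the Wootters metric is based on distinguishability from data in many trials, the mutual information becomes the information of $P_W(p)$ relative to an $N$-dependent, but position-independent measure $m_W(p) = (N/2\pi)^{(L-1)/2}$.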
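One consequence of this form is easy to test numerically: because the measure scales as $N^{(L-1)/2}$, the mutual information should grow by $\frac{L-1}{2}\log(N'/N)$ when the number of trials increases from $N$ to $N'$. The sketch below is a minimal check for $L = 2$, assuming a uniform $P(p)$ on $[0,1]$, an example chosen only because every term is then available in closed form. The function names are hypothetical. It tests only the growth with $N$, not the $N$-independent constant in the formula.

```python
from math import lgamma, log

def log_binom(N, n):
    """Natural log of the binomial coefficient C(N, n)."""
    return lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)

def mutual_information_uniform(N):
    """Exact H(D_N; p) in nats for L = 2 and a uniform P(p) on [0, 1]."""
    # Uniform P(p): p(n) = 1/(N+1), and the sequence probability is
    # <p^n (1-p)^(N-n)> = 1 / ((N+1) C(N, n)), so
    # H(D_N) = log(N+1) + average over n of log C(N, n).
    h_data = log(N + 1) + sum(log_binom(N, n) for n in range(N + 1)) / (N + 1)
    # H(D_N | p) = -N * integral of [p log p + (1-p) log(1-p)] dp = N / 2.
    h_cond = N / 2
    return h_data - h_cond

# Quadrupling N should add about (1/2) log 4 nats for L = 2.
growth = mutual_information_uniform(4000) - mutual_information_uniform(1000)
print(growth, 0.5 * log(4))
```

The uniform prior keeps the sketch exact apart from floating-point rounding; a nonuniform $P(p)$ would need numerical integration for both terms.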