A Limit Result for the Prior Predictive

by

Michael Evans
Department of Statistics
University of Toronto

and

Gun Ho Jang
Department of Statistics
University of Toronto

Technical Report No
April 15, 2010

TECHNICAL REPORT SERIES
University of Toronto
Department of Statistics
A Limit Result for the Prior Predictive

Michael Evans and Gun Ho Jang
Department of Statistics, University of Toronto

Abstract. We establish results concerning the convergence of the prior predictive distribution. An application is made to the problem of checking for prior-data conflict.

Keywords: minimal sufficiency, prior predictive, convergence, prior-data conflict

1 Introduction

Suppose we have a model given by a collection of probability measures $\{P_\theta : \theta \in \Theta\}$ where $P_\theta(A) = \int_A f_\theta(x)\,\mu(dx)$; i.e., each $P_\theta$ is absolutely continuous with respect to a support measure $\mu$ on the sample space $\mathcal{X}$, with the density denoted by $f_\theta$. With this formulation a prior $\Pi$ leads to a prior predictive probability measure on $\mathcal{X}$ given by $M(A) = \int_\Theta P_\theta(A)\,\Pi(d\theta) = \int_A m(x)\,\mu(dx)$, where $m(x) = \int_\Theta f_\theta(x)\,\Pi(d\theta)$.

If $T : \mathcal{X} \rightarrow \mathcal{T}$ is a minimal sufficient statistic for $\{P_\theta : \theta \in \Theta\}$, then it is well known that the posterior distribution of $\theta$ is the same whether we observe $x$ or $T(x)$, and so we denote the posterior by $\Pi(\cdot \,|\, T)$. Furthermore, the conditional distribution of $x$ given $T$ is independent of $\theta$ and we denote this conditional measure by $P(\cdot \,|\, T)$. The joint distribution $\Pi \times P_\theta$ can then be factored as $\Pi \times P_\theta = \Pi(\cdot \,|\, x) \times M = \Pi(\cdot \,|\, T) \times P(\cdot \,|\, T) \times M_T$, where $M_T$ is the marginal prior predictive distribution of $T$. If $f_{\theta,T}$ denotes the marginal density of $T$, with respect to a support measure $\mu_T$ on $\mathcal{T}$, then $m_T(t) = \int_\Theta f_{\theta,T}(t)\,\Pi(d\theta)$ denotes the density of $M_T$ with respect to $\mu_T$. If $\pi$ denotes the density of $\Pi$, with respect to a support measure $\nu$ on $\Theta$, then we can write $m_T(t) = \int_\Theta f_{\theta,T}(t)\pi(\theta)\,\nu(d\theta)$.

Our concern here is with the behavior of $M_T$ as the amount of data grows. A simple example illustrates the asymptotic behavior of this distribution that we might expect to hold in more general situations.

Example 1 (Location normal). Suppose that $x = (x_1, \ldots, x_n)$ is a sample from a $N(\theta, 1)$ distribution where $\theta \in R^1$ is unknown. Then a minimal sufficient statistic is given by $T_n(x) = \bar{x}$, and $T_n(x)$ converges almost surely to the true value of $\theta$ as $n \rightarrow \infty$. Suppose we put a $N(\theta_0, \sigma_0^2)$ prior on $\theta$. The prior predictive distribution $M_{T_n}$ is then easily obtained from $\bar{x} = \theta + z$, where $z \sim N(0, 1/n)$ independent of $\theta \sim N(\theta_0, \sigma_0^2)$; namely, $M_{T_n}$ is the $N(\theta_0, \sigma_0^2 + 1/n)$ distribution. We see immediately that $M_{T_n}$ converges in distribution to the $N(\theta_0, \sigma_0^2)$ distribution as $n \rightarrow \infty$. Furthermore, $m_{T_n}(t)$ converges almost surely to $(2\pi)^{-1/2}\sigma_0^{-1}\exp\{-(t - \theta_0)^2/2\sigma_0^2\}$ as $n \rightarrow \infty$, uniformly for $t$ in a compact set. Simple computations show that these results do not depend on using a normal prior; namely, if we use a prior measure $\Pi$ with continuous density $\pi$, then $M_{T_n}$ converges in distribution to $\Pi$ as $n \rightarrow \infty$, and $m_{T_n}(t)$ converges almost surely to $\pi(t)$ as $n \rightarrow \infty$, uniformly for $t$ in a compact set.

So in Example 1 we can think of $m_T(T(x))$ as a consistent estimator of the prior evaluated at the true value of the parameter. The significance of this is that the value of $m_T(T(x))$ gives an indication of whether or not the prior has been poorly chosen, in the sense that the true value of $\theta$ may lie in a region where little prior probability has been assigned. Of course, we cannot tell this from the value $m_T(T(x))$ itself but need to calibrate this on some scale. In Evans and Moshonov (2006, 2007) the P-value

$M_T(m_T(t) \leq m_T(T(x))),$    (1)

and some variations of this, was proposed for checking for prior-data conflict.
Note that this P-value is a modification of a P-value proposed by Box (1980) for general model checking in Bayesian contexts. In Example 1, when using the normal prior, we see that (1) converges to $2(1 - \Phi(|\theta^* - \theta_0|/\sigma_0))$ as $n \rightarrow \infty$, where $\theta^*$ is the true value of $\theta$. So (1) is a consistent estimator of the P-value which measures whether the true value of the parameter lies in the tails of the prior.

In Section 2 we prove that (1) converges to $\Pi(\pi(\theta) \leq \pi(\theta^*))$, where $\theta^*$ is the true value of $\theta$, in fairly general circumstances. The P-value $\Pi(\pi(\theta) \leq \pi(\theta^*))$ will be small whenever the true value lies in a region of low prior probability, and so we have an instance of prior-data conflict. As such, (1) is seen to be an appropriate measure of prior-data conflict.

A criticism of (1) is that, in the case of continuous models at least, the P-value is not invariant under smooth transformations. In particular, suppose that $W : \mathcal{T} \rightarrow \mathcal{W}$ is 1-1 and smooth. Let $J_W(t)$ be the reciprocal of the Jacobian determinant of $W$ evaluated at $t$. Then, if instead of $T$ we use $W(T)$ as the minimal sufficient statistic, the P-value is

$M_W(m_W(w) \leq m_W(W(T(x)))) = M_T(m_T(t)J_W(t) \leq m_T(T(x))J_W(T(x)))$

and this will not equal (1) unless $J_W(t)$ is constant.

In Evans and Jang (2010) the general problem of computing P-values, based on the density of a discrepancy statistic, to assess whether or not the data came from a single fixed distribution, was considered, and an invariant P-value was proposed. For (1) this entails using instead the P-value

$M_T(m_T^*(t) \leq m_T^*(T(x))),$    (2)

where $m_T^*(t) = m_T(t)E(J_T^{-1}(X) \,|\, T(X) = t)$, $J_T(x) = |\det(dT(x)\,dT'(x))|^{-1/2}$, and $dT$ denotes the differential of $T$. The factor $E(J_T^{-1}(X) \,|\, T(X) = t)$ corrects for volume distortions due to the transformation $T$ and is independent of $\theta$ because $T$ is minimal sufficient. Note that $m_T^*(t)$ is the density of $M_T$ with respect to the support measure $(E(J_T^{-1}(X) \,|\, T(X) = t))^{-1}\mu_T(dt)$. In Example 1 we have that $J_{T_n}(x)$ is constant and so (1) equals (2). While the P-value (2) will generally differ from (1), it is often the case that the effect of $E(J_T^{-1}(X) \,|\, T(X) = t)$ is negligible. We establish a convergence result for (2) in Section 3.
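In the location-normal case both (1) and its limit are available in closed form, which makes the convergence easy to check numerically: $M_{T_n}$ is the $N(\theta_0, \sigma_0^2 + 1/n)$ distribution and its density is symmetric and unimodal, so (1) reduces to a two-sided normal tail probability. The following sketch (Python; the particular values $\theta^* = 2$, $\theta_0 = 0$, $\sigma_0 = 1$ are purely illustrative) evaluates (1) at $\bar{x} = \theta^*$, the almost-sure limit of $T_n$:

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prior_predictive_pvalue(tbar, theta0, sigma0, n):
    """P-value (1) for the location-normal model of Example 1.

    M_Tn is N(theta0, sigma0^2 + 1/n); since that density is symmetric and
    unimodal, M_Tn(m(t) <= m(tbar)) equals the two-sided tail probability
    beyond tbar."""
    sd = math.sqrt(sigma0 ** 2 + 1.0 / n)
    return 2.0 * (1.0 - normal_cdf(abs(tbar - theta0) / sd))

# With tbar -> theta*, the P-value approaches 2(1 - Phi(|theta* - theta0|/sigma0)).
theta_star, theta0, sigma0 = 2.0, 0.0, 1.0
limit = 2.0 * (1.0 - normal_cdf(abs(theta_star - theta0) / sigma0))
for n in (10, 1000, 100000):
    print(n, prior_predictive_pvalue(theta_star, theta0, sigma0, n))
print("limit:", limit)
```

A small limiting P-value here indicates that the true mean sits in the tails of the prior, which is exactly the prior-data conflict the check is designed to detect.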
While there are numerous discussions concerning asymptotics for a posterior analysis, for example, Walker (1969), Heyde and Johnstone (1979), and Chen (1985), there seem to be almost no discussions concerned with convergence issues associated with the prior predictive distribution. Such results also have implications for methods that choose the prior based on the prior predictive. This paper addresses some of these problems.

2 Convergence of the Basic P-value

In the Appendix we provide the proof of the following result.

Theorem 1. Suppose $\Theta$ is an open subset of a Euclidean space and assume

(i) $T_n \rightarrow \theta$ a.s. $P_\theta$ for every $\theta \in \Theta$;
(ii) $m_{T_n}(t) \rightarrow \pi(t)$ uniformly on compact subsets of $\Theta$;
(iii) $\pi$ is continuous and the prior distribution of $\pi(\theta)$ has no atoms.

Then $M_{T_n}(m_{T_n}(t) \leq m_{T_n}(T_n(x_n))) \rightarrow \Pi(\pi(\theta) \leq \pi(\theta^*))$ a.s. $P_{\theta^*}$, where $\theta^*$ is the true value of $\theta$.

Note that Theorem 1 implicitly assumes that the sampling model for $T_n$ is continuous. We will subsequently discuss how to handle the discrete case. To apply this result we need to establish (ii). We discuss several examples.

Example 2 (Scale-Gamma). Let $x = (x_1, \ldots, x_n)$ be a sample from a Gamma$(\alpha_0, \beta)$ distribution where the scale parameter $\beta > 0$ is unknown. Then the statistic $T_n(x) = (n\alpha_0)^{-1}\sum_{i=1}^n x_i \sim$ Gamma$(n\alpha_0, \beta/(n\alpha_0))$ is minimal sufficient and it converges almost surely to the true value of $\beta$. We prove the following result in the Appendix.

Lemma 2. If $T_n(x) \sim$ Gamma$(n\alpha_0, \beta/(n\alpha_0))$ and the prior $\pi$ on $\beta$ is continuous, then (ii) of Theorem 1 holds.

So if, in addition, the prior distribution on $(0, \infty)$ has no atoms, then Theorem 1 applies and we have the convergence of (1). Certainly these conditions apply to the commonly used priors on a scale parameter. The following example uses Example 2 in a problem of considerable importance for statistical practice.
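The convergence in Lemma 2 can also be examined numerically by computing $m_{T_n}(t) = \int_0^\infty f_{\beta,T_n}(t)\pi(\beta)\,d\beta$ by quadrature. A minimal sketch follows (Python; the Exponential(1) prior on $\beta$, the choice $\alpha_0 = 1$, the evaluation point $t = 2$, and the integration grid are all illustrative assumptions, not taken from the paper):

```python
import math

def gamma_scale_density(t, beta, shape):
    """Density of T_n ~ Gamma(shape, scale = beta/shape) at t, with shape = n*alpha0."""
    if t <= 0.0:
        return 0.0
    scale = beta / shape
    return math.exp((shape - 1.0) * math.log(t) - t / scale
                    - shape * math.log(scale) - math.lgamma(shape))

def m_Tn(t, shape, prior, lo=1e-6, hi=50.0, steps=20000):
    """Trapezoid-rule approximation of the prior predictive density of T_n at t."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        beta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gamma_scale_density(t, beta, shape) * prior(beta)
    return total * h

prior = lambda beta: math.exp(-beta)  # illustrative Exponential(1) prior on the scale
for n in (5, 50, 500):
    print(n, m_Tn(2.0, n * 1.0, prior))  # approaches prior(2.0) = exp(-2) as n grows
```

As $n$ grows, the sampling density of $T_n$ concentrates at $\beta$, so the quadrature values approach the prior density evaluated at $t$, which is the content of condition (ii).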
Example 3 (Normal linear regression). We consider first the situation where we have a sample $x = (x_1, \ldots, x_n)$ from a $N(\mu, \sigma^2)$ distribution with $\mu \in R^1$ and $\sigma > 0$ unknown. Then $T_n(x) = (T_{1n}(x), T_{2n}(x)) = (\bar{x}, s^2)$ is a minimal sufficient statistic and $T_n(x) \rightarrow (\mu, \sigma^2)$ almost surely as $n \rightarrow \infty$. We prove the following result in the Appendix.

Lemma 3. If $T_n(x) = (\bar{x}, s^2)$, where $\bar{x} \sim N(\mu, \sigma^2/n)$ independent of $s^2 \sim$ Gamma$((n-1)/2, 2\sigma^2/(n-1))$, and the prior on $(\mu, \sigma^2)$ is continuous, then (ii) of Theorem 1 holds.

The prior is commonly prescribed by first stating a prior $\pi_2$ for $\sigma^2$ and then a conditional prior $\pi_1(\cdot \,|\, \sigma^2)$ for $\mu$. As discussed in Evans and Moshonov (2006, 2007), it then makes sense to check $\pi_2$ first and, if $\pi_2$ passes, then check $\pi_1$. With this approach we can learn more about where the prior is deficient, if indeed there is a problem. Following that development, the check for $\pi_2$ is based on $T_{2n}(x) = s^2$ via the P-value $M_{T_{2n}}(m_{T_{2n}}(t_2) \leq m_{T_{2n}}(T_{2n}(x)))$. Now $T_{2n}(x) \sim$ Gamma$((n-1)/2, 2\sigma^2/(n-1))$. So, when $\pi_2$ satisfies the conditions of Theorem 1, Example 2 applies with $\alpha_0 = 1/2$ and

$M_{T_{2n}}(m_{T_{2n}}(t_2) \leq m_{T_{2n}}(T_{2n}(x))) \rightarrow \Pi_2(\pi_2(\sigma^2) \leq \pi_2(\sigma_*^2)).$

To check $\pi_1$ the relevant P-value to use is

$M_{T_{1n}}(m_{T_{1n}}(t_1 \,|\, T_{2n}(x) = s^2) \leq m_{T_{1n}}(T_{1n}(x) \,|\, T_{2n}(x) = s^2) \,|\, T_{2n}(x) = s^2).$    (3)

Now consider $m_{T_{1n}}(t_1 \,|\, T_{2n} = t_2) = m_{T_n}(t_1, t_2)/m_{T_{2n}}(t_2)$. Since $m_{T_n}(t_1, t_2) \rightarrow \pi(t_1, t_2)$ and $m_{T_{2n}}(t_2) \rightarrow \pi_2(t_2)$ uniformly on compact sets, we have that $m_{T_{1n}}(t_1 \,|\, T_{2n} = t_2) \rightarrow \pi(t_1, t_2)/\pi_2(t_2) = \pi_1(t_1 \,|\, t_2)$ uniformly on compact sets.
Furthermore, the measures $M_{T_{1n}}(\cdot \,|\, T_{2n} = s^2)$ converge in distribution to $\Pi_1(\cdot \,|\, \sigma_*^2)$, $m_{T_{1n}}(t_1 \,|\, T_{2n} = s^2)$ converges almost surely $P_{(\mu_*,\sigma_*^2)}$ to $\pi_1(t_1 \,|\, \sigma_*^2)$, and $m_{T_{1n}}(\bar{x} \,|\, T_{2n} = s^2)$ converges almost surely $P_{(\mu_*,\sigma_*^2)}$ to $\pi_1(\mu_* \,|\, \sigma_*^2)$. This implies the convergence almost surely $P_{(\mu_*,\sigma_*^2)}$ of (3) to $\Pi_1(\pi_1(\mu \,|\, \sigma_*^2) \leq \pi_1(\mu_* \,|\, \sigma_*^2) \,|\, \sigma_*^2)$.

For a normal linear regression model $y_n = X_n\beta + e$, where $X_n \in R^{n \times k}$ and $e \sim N_n(0, \sigma^2 I)$, we have that $T_n(y) = (T_{1n}(y), T_{2n}(y)) = (b, s^2)$, where $b = (X_n'X_n)^{-1}X_n'y$ and $s^2 = (n-k)^{-1}\|y - X_nb\|^2$. Under suitable conditions on the $X_n$ matrices we have that $T_n \rightarrow (\beta, \sigma^2)$ almost surely. The convergence results for this situation then proceed just as in the location-scale case.

The following result, with the proof provided in the Appendix, is sometimes useful in establishing condition (ii) of Theorem 1.

Theorem 4. Suppose $\Theta$ is an open subset of $R^k$ and, for any $\delta > 0$ and compact $K \subset \Theta$, there exist $c_1, c_2 > 0$ and $N > 0$ such that

$f_{\theta,T_n}(t) \leq c_1e^{-c_2n}$ whenever $t \in K$, $\|t - \theta\| > \delta$, and $n \geq N$.    (4)

Then the following are equivalent:

(a) for any prior with continuous density $\pi$, $m_{T_n}(t) \rightarrow \pi(t)$ uniformly for $t$ in any compact $K \subset \Theta$;
(b) for any compact $K \subset \Theta$ and $\delta > 0$,

$\int_\Theta f_{\theta,T_n}(t)I(\|t - \theta\| < \delta)\,d\theta \rightarrow 1$ as $n \rightarrow \infty$, uniformly for $t \in K$.    (5)

We now consider an example where the distribution of $T_n$ is on a discrete subset of $R^1$. In such a case we cannot expect condition (ii) of Theorem 1 to hold at values of $t$ where $\pi(t) > 0$ but $m_{T_n}(t) = 0$ for all $n$. Suppose, however, that $T_n$ has a lattice distribution with lattice step equal to $h$. Then for $kh < t \leq (k+1)h$ we define $m_{T_n}^{cont}(t) = m_{T_n}((k+1)h)/h$, and $m_{T_n}^{cont}(t) = 0$ otherwise, and treat $m_{T_n}^{cont}$ as a density with respect to length measure. Since $T_n(x)$ is always on the lattice, we see immediately that

$M_{T_n}^{cont}(m_{T_n}^{cont}(t) \leq m_{T_n}^{cont}(T_n(x))) = M_{T_n}(m_{T_n}(t) \leq m_{T_n}(T_n(x))).$

We can then apply Theorem 1 to $m_{T_n}^{cont}$ and this proves the convergence of (1). Note that $m_{T_n}^{cont}(t) = \int_\Theta f_{\theta,T_n}^{cont}(t)\,\Pi(d\theta)$, where $f_{\theta,T_n}^{cont}(t) = f_{\theta,T_n}((k+1)h)/h$ when $kh < t \leq (k+1)h$.

Example 4 (Binomial). Suppose that $x = (x_1, \ldots, x_n)$ is a sample from a Bernoulli$(\theta)$ distribution where $\theta \in (0, 1)$ is unknown. Then $T_n(x) = \bar{x}$ is minimal sufficient and converges almost surely to $\theta$. For $t \in \{0, 1/n, \ldots, 1\}$,

$f_{\theta,T_n}(t) = P_\theta(T_n = t) = \binom{n}{nt}\theta^{nt}(1 - \theta)^{n(1-t)}.$

In this case $\bar{x}$ has a discrete distribution on the lattice with step size equal to $1/n$. In the Appendix we prove the following result.
Lemma 5. If $nT_n(x) \sim$ Binomial$(n, \theta)$ and the prior $\pi$ is continuous on $(0, 1)$, then (ii) of Theorem 1 holds for $m_{T_n}^{cont}$.

Therefore, $m_{T_n}^{cont}(t)$ converges to $\pi(t)$ uniformly on each compact set and (1) converges provided the prior satisfies the conditions of Theorem 1. One interesting case where $\pi$ does not satisfy the conditions of Theorem 1 arises when $\theta \sim$ Uniform$(0, 1)$, as the prior distribution of $\pi(\theta)$ then has all of its mass at 1. In this case, however, we have that $m_{T_n}^{cont}(t) = n/(n+1) \rightarrow 1$ uniformly for all $t \in (0, 1)$ and, moreover, $M_{T_n}(m_{T_n}(t) \leq m_{T_n}(T_n(x))) = 1 = \Pi(\pi(\theta) \leq \pi(\theta^*))$, and so the convergence result is obvious.

3 Convergence of the Invariant P-value

We now consider the convergence of (2). As noted, this P-value is invariant under smooth transformations and will agree with (1) whenever $T$ is linear or the sampling model for $T$ is discrete. This applies in Examples 1, 2, and 4 but not in Example 3.

Example 5 (Normal linear regression). Consider the location-scale case. Clearly $T_{1n}(x) = \bar{x}$ is linear, and so the P-value for checking $\pi_1$ agrees with the invariant version. But $T_{2n}(x) = s^2$ is nonlinear, and so the P-value (1) for checking $\pi_2$ is not the same as the invariant version. In this case $dT_{2n}(x) = (2/(n-1))(x_1 - \bar{x}, \ldots, x_n - \bar{x})$, giving

$J_{T_{2n}}(x) = |\det(dT_{2n}(x)\,dT_{2n}'(x))|^{-1/2} = (\sqrt{n-1}/2)s^{-1}$

and so $E(J_{T_{2n}}^{-1}(X) \,|\, T_n(X) = (\bar{x}, s^2)) = (2/\sqrt{n-1})s$. Therefore, the invariant P-value is equal to

$M_{T_{2n}}(m_{T_{2n}}(t_2)t_2^{1/2} \leq m_{T_{2n}}(T_{2n}(x))(T_{2n}(x))^{1/2})$

and this converges almost surely to $\Pi_2(\pi_2(\sigma^2)(\sigma^2)^{1/2} \leq \pi_2(\sigma_*^2)(\sigma_*^2)^{1/2})$ (see Theorem 6).

The proof of the following result is virtually identical to that of Theorem 1.

Theorem 6. Suppose $\Theta$ is an open subset of a Euclidean space and assume

(i) $T_n \rightarrow \theta$ a.s. $P_\theta$ for every $\theta \in \Theta$;
(ii) $w_n(t) = E(J_{T_n}^{-1}(X) \,|\, T_n(X) = t)$ is continuous and $a_nw_n(t) \rightarrow w(t)$ for some sequence $\{a_n\}$;
(iii) $a_nm_{T_n}(t)w_n(t) \rightarrow \pi(t)w(t)$ uniformly on compact subsets of $\Theta$;
(iv) $\pi w$ is continuous and the prior distribution of $\pi(\theta)w(\theta)$ has no atoms.

Then $M_{T_n}(m_{T_n}^*(t) \leq m_{T_n}^*(T_n(x_n))) \rightarrow \Pi(\pi(\theta)w(\theta) \leq \pi(\theta^*)w(\theta^*))$ a.s. $P_{\theta^*}$, where $\theta^*$ is the true value of $\theta$.

Note that $\pi(\theta)w(\theta)$ is the density of $\Pi$ with respect to the support measure $(w(\theta))^{-1}\nu(d\theta)$. Also note that when $\theta$ is $k$-dimensional and $\sqrt{n}(T_n - \theta)$ is asymptotically normal then, in many cases, we can take $a_n = n^{k/2}$.

The developments in this paper have required that the minimal sufficient statistic be a consistent estimator of $\theta$. The existence of such a minimal sufficient statistic is guaranteed for exponential models. Suppose, however, that we reparameterize via the 1-1, smooth function $\Psi$; namely, $\tau = \Psi(\theta)$. Then we must replace $T_n$ by $\Psi(T_n)$ for the convergence results to hold as stated. If $\Psi$ is nonlinear, however, then (1) will typically depend on whether we use $T_n$ or $\Psi(T_n)$; namely, it will implicitly depend on the parameterization. Using (2) this dependence is avoided and the P-value is independent of the choice of the minimal sufficient statistic or, equivalently, the parameterization. The use of (2) seems more appropriate than (1) for this reason, although there is typically very little difference in the P-values obtained.

4 Conclusions

We have established convergence results for various prior predictive P-values that show directly that these are appropriate for checking for prior-data conflict, namely, assessing whether the true value of the parameter is in the tails of the prior. Essentially these results are restricted to situations where a version of the minimal sufficient statistic is a consistent estimate of the model parameter, and this means our results apply in the context of exponential models. Similar results can undoubtedly be established in other contexts, in particular for group models, and these are currently being developed.

More generally, convergence results for the prior predictive have implications for empirical Bayes methods. For example, suppose we have a family of priors $\{\pi_\lambda : \lambda \in A\}$ with corresponding prior predictives $m_{\lambda,T_n}$ for a minimal sufficient statistic $T_n$. Then the convergence of $m_{\lambda,T_n}(T_n(x))$ to $\pi_\lambda(\theta^*)$ has the implication that maximizing $m_{\lambda,T_n}(T_n(x))$ over $\lambda$ to select the prior is essentially finding the prior that has maximal value at the true value of the parameter. One might argue that it makes more sense to maximize $M_{\lambda,T_n}(m_{\lambda,T_n}(t) \leq m_{\lambda,T_n}(T_n(x)))$ over $\lambda$ as then, based on the convergence of this P-value to $\Pi_\lambda(\pi_\lambda(\theta) \leq \pi_\lambda(\theta^*))$, we are finding the prior for which the true value is least surprising. The implications of this are currently being investigated.

Appendix

Proof of Theorem 1

Let $\epsilon > 0$. Then there exists $\delta > 0$ such that $|\pi(t) - \pi(\theta^*)| < \epsilon/2$ whenever $t \in B_\delta(\theta^*)$, and there exists $N_1$ such that for all $n > N_1$, $T_n(x_n) \in B_\delta(\theta^*)$. Also there exists $N_2$ such that for all $n > N_2$ and all $t \in B_\delta(\theta^*)$, $|m_{T_n}(t) - \pi(t)| < \epsilon/2$. So, if $n > \max\{N_1, N_2\}$, then

$M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon) \leq M_{T_n}(m_{T_n}(t) \leq m_{T_n}(T_n(x_n))) \leq M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) + \epsilon).$

Now we prove that $M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon) \rightarrow \Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon)$. Let $\epsilon' > 0$ and let $C \subset \Theta$ be compact such that $\theta^* \in C$, $\Pi(\partial C) = 0$, and $\Pi(C) \geq 1 - \epsilon'/2$. By (i) and Slutsky's Theorem, $M_{T_n}$ converges in law to $\Pi$. Therefore $M_{T_n}(C) \rightarrow \Pi(C)$ and so there exists $N_3$ such that for all $n > N_3$, $M_{T_n}(C) > 1 - \epsilon'$. Therefore, for all $n > N_3$,

$M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon) - \epsilon' \leq M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon, C) \leq M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon)$

and we can make the LHS and RHS as close as we like by choosing $\epsilon'$ small.
Let $\epsilon'' > 0$. There exists $N_4$ such that for all $n > N_4$, $|m_{T_n}(t) - \pi(t)| < \epsilon''$ for all $t \in C$. When $n > \max\{N_3, N_4\}$,

$M_{T_n}(\pi(t) \leq \pi(\theta^*) - \epsilon - \epsilon'', C) \leq M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon, C) \leq M_{T_n}(\pi(t) \leq \pi(\theta^*) - \epsilon + \epsilon'', C)$

and the LHS converges to $\Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon - \epsilon'', C)$ while the RHS converges to $\Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon + \epsilon'', C)$. By choosing $\epsilon''$ small we can make these quantities as close to $\Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon, C)$ as we like. This proves that $M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon, C) \rightarrow \Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon, C)$ and this establishes that $M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) - \epsilon) \rightarrow \Pi(\pi(\theta) \leq \pi(\theta^*) - \epsilon)$. A similar argument shows that $M_{T_n}(m_{T_n}(t) \leq \pi(\theta^*) + \epsilon) \rightarrow \Pi(\pi(\theta) \leq \pi(\theta^*) + \epsilon)$ and this completes the proof.

Proof of Lemma 2

Suppose $K$ is a compact set in $(0, \infty)$. Then there are $0 < a < b < \infty$ such that $K \subset [a, b]$. Fix $\delta > 0$ satisfying $\delta < \min\{a/3, 1\}$. We prove (4) and (5) and then apply Theorem 4.

For $\beta > t + \delta$ we have $t/\beta \leq t/(t + \delta) \leq 1 - \delta/(b + \delta)$, and for $\beta < t - \delta$ we have $t/\beta \geq t/(t - \delta) \geq 1 + \delta/(b - \delta)$. Hence $|t/\beta - 1| > \delta_1 = \delta/(b + \delta)$ whenever $|t - \beta| > \delta$ and $t \in K$. Note that $\Gamma(n\alpha_0) \geq \sqrt{2\pi}(n\alpha_0)^{n\alpha_0 - 1/2}e^{-n\alpha_0}$. Since $ue^{-u+1}$ has peak value 1 at $u = 1$, $\epsilon_1 = \sup_{u:|u-1|>\delta_1}ue^{-u+1} < 1$. Also there exists $N_1 > 1$ such that $n^{1/2}\epsilon_1^{n\alpha_0/2} \leq 1$ for all $n \geq N_1$. Let $u = t/\beta$. Then for $|t - \beta| > \delta$ we get $|u - 1| > \delta_1$ and

$f_{\beta,T_n}(t) = \dfrac{(n\alpha_0)^{n\alpha_0}e^{-n\alpha_0}}{\Gamma(n\alpha_0)t}(ue^{-u+1})^{n\alpha_0} \leq a^{-1}(n\alpha_0/2\pi)^{1/2}\epsilon_1^{n\alpha_0} \leq a^{-1}(\alpha_0/2\pi)^{1/2}e^{-n2^{-1}\alpha_0\log(1/\epsilon_1)}.$

Hence (4) holds.

For (5), let $I_0 = \int_0^\infty f_{\beta,T_n}(t)I(|t - \beta| < \delta)\,d\beta$. Then $I_0 \leq \int_0^\infty [(n\alpha_0)^{n\alpha_0}/\Gamma(n\alpha_0)]u^{n\alpha_0-2}e^{-n\alpha_0u}\,du = n\alpha_0/(n\alpha_0 - 1)$. Also, for $t \in K$, $I(|t - \beta| < \delta) \geq I(|t/\beta - 1| < \delta_1) \geq I(|t/\beta - 1| < \delta_1n^{-1/2}\log n)$, where $\delta_1 = \delta/(b + \delta)$. Then

$I_0 \geq \dfrac{(n\alpha_0)^{n\alpha_0}e^{-n\alpha_0}}{\Gamma(n\alpha_0)}\int u^{n\alpha_0-2}e^{-n\alpha_0(u-1)}I(|u - 1| < \delta_1n^{-1/2}\log n)\,du.$

For $|u - 1| < \delta_1n^{-1/2}\log n$, a lower bound on the logarithm of the integrand is given by

$\log(u^{n\alpha_0-2}e^{-n\alpha_0(u-1)}) = -n\alpha_0(u - 1) + (n\alpha_0 - 2)\log(1 - (1 - u)) \geq -n\alpha_0(u - 1) - (n\alpha_0 - 2)(1 - u + (1 - u)^2/2 + |1 - u|^3) \geq -n\alpha_0(1 - u)^2/2 - (2 + \alpha_0(\log n)^2)|1 - u|.$

The change of variable $v = \sqrt{n\alpha_0}(u - 1)$ gives

$I_0 \geq \dfrac{(n\alpha_0)^{n\alpha_0-1/2}e^{-n\alpha_0}}{\Gamma(n\alpha_0)}\int e^{-v^2/2 - (2 + \alpha_0(\log n)^2)|v|/\sqrt{n\alpha_0}}I(|v| < \delta_1\alpha_0^{1/2}\log n)\,dv.$

By Stirling's formula, $(n\alpha_0)^{n\alpha_0-1/2}e^{-n\alpha_0}/\Gamma(n\alpha_0) \rightarrow (2\pi)^{-1/2}$, and the integral converges to $\int_{-\infty}^{\infty}e^{-v^2/2}\,dv$ by the Lebesgue dominated convergence theorem. Hence $I_0 \rightarrow 1$ as $n \rightarrow \infty$. Thus (5) holds and, by Theorem 4, condition (ii) of Theorem 1 holds.

Proof of Lemma 3

We prove (4) and (5). The density $f_{\mu,\sigma^2}(\bar{x}, s^2)$ of $T_n = (\bar{X}, S^2)$ is given by

$f_{\mu,\sigma^2}(\bar{x}, s^2) = \dfrac{((n-1)/2)^{(n-1)/2}e^{-(n-1)/2}}{(2\pi/n)^{1/2}\Gamma((n-1)/2)(s^2)^{3/2}}\left[\dfrac{s^2}{\sigma^2}\exp\left\{-\dfrac{n-1}{n}\left(\dfrac{s^2}{\sigma^2} - 1\right) - \dfrac{(\bar{x} - \mu)^2}{\sigma^2}\right\}\right]^{n/2}.$
Let $I_1$ denote the first factor and let $I_2$ denote the factor inside the brackets. To prove (4), fix a compact set $K$. Let $a = \inf\{s^2 : (\bar{x}, s^2) \in K\} > 0$ and $b = \sup\{s^2 : (\bar{x}, s^2) \in K\} < \infty$, and consider $0 < \delta < \min(a/3, 1)$. Stirling's formula gives $I_1/n \rightarrow 1/[\pi(2s^2)^{3/2}] \leq 1/[\pi(2a)^{3/2}]$. For $(\bar{x}, s^2) \in K$ and $\|(\bar{x}, s^2) - (\mu, \sigma^2)\| > \delta$, we have $|\bar{x} - \mu| > \delta/2$ or $|s^2 - \sigma^2| > \delta/2$. In the latter case there exists $\delta_1 > 0$ such that $|s^2/\sigma^2 - 1| > \delta_1$. Since $ve^{-(v-1)(n-1)/n}$ is unimodal with peak value $e^{-1/n}n/(n-1)$ at $v = n/(n-1)$, an upper bound for $I_2$ is obtained at $s^2/\sigma^2 = 1 + \delta_1$ or $s^2/\sigma^2 = 1 - \delta_1$, provided $n > 1 + 1/\delta_1$. So $I_2 \leq \max((1 - \delta_1)\exp(\delta_1(n-1)/n), (1 + \delta_1)\exp(-\delta_1(n-1)/n)) < 1$ for all sufficiently large $n$. If instead $|s^2/\sigma^2 - 1| \leq \delta_1$, then $|\bar{x} - \mu| > \delta/2$ and thus $(\bar{x} - \mu)^2/\sigma^2 = (\bar{x} - \mu)^2(s^2/\sigma^2)/s^2 \geq (\delta/2)^2(1 - \delta_1)/b$. So $I_2 \leq e^{-1/n}(n/(n-1))\exp(-\delta^2(1 - \delta_1)/4b) < 1$ for all sufficiently large $n$. Hence (4) holds.

The integration range $\|(\bar{x}, s^2) - (\mu, \sigma^2)\| < \delta$ contains the region where $|\bar{x} - \mu| < \delta/2$ and $|s^2 - \sigma^2| < \delta/2$, and this region in turn contains the region where $|\bar{x} - \mu|/\sigma < \delta_2 = \delta/(\delta + 2b)$ and $|s^2 - \sigma^2| < \delta/2$. Then

$I_3 = \int\int f_{\mu,\sigma^2}(\bar{x}, s^2)I(\|(\bar{x}, s^2) - (\mu, \sigma^2)\| < \delta)\,d\mu\,d\sigma^2$
$\geq \int\int (2\pi\sigma^2/n)^{-1/2}\exp(-(n/2\sigma^2)(\bar{x} - \mu)^2)I(|\bar{x} - \mu|/\sigma < \delta_2)$
$\times \dfrac{((n-1)/2\sigma^2)^{(n-1)/2}}{\Gamma((n-1)/2)}(s^2)^{(n-3)/2}e^{-(n-1)s^2/2\sigma^2}I(|s^2 - \sigma^2| < \delta/2)\,d\mu\,d\sigma^2.$

Using $v = \sqrt{n}(\bar{x} - \mu)/\sigma$, this integral separates into the product of two integrals; let $I_4$ and $I_5$ denote them. Then $I_4 = \Phi(\sqrt{n}\delta_2) - \Phi(-\sqrt{n}\delta_2) \rightarrow 1$ as $n \rightarrow \infty$, and $I_5 \rightarrow 1$ as $n \rightarrow \infty$ by Lemma 2 with $\alpha_0 = 1/2$. So $I_3 \rightarrow 1$ and (5) holds. Finally, condition (ii) of Theorem 1 holds by Theorem 4.

Proof of Theorem 4

Suppose (a) holds. If $\delta > 0$ and $K \subset \Theta$ is compact, then

$\int_\Theta f_{\theta,T_n}(t)\pi(\theta)I(\|t - \theta\| \geq \delta)\,d\theta \leq \int_\Theta c_1e^{-c_2n}I(\|t - \theta\| \geq \delta)\,\Pi(d\theta) \leq c_1e^{-c_2n} \rightarrow 0$

uniformly in $t \in K$. So if (a) holds, then $|\int_\Theta f_{\theta,T_n}(t)\pi(\theta)I(\|t - \theta\| < \delta)\,d\theta - \pi(t)| \rightarrow 0$ uniformly for $t \in K$. The set $K_\delta = \{\theta : \|\theta - \theta_0\| \leq \delta$ for some $\theta_0 \in K\}$ is also a compact subset of $\Theta$ when $\delta$ is small enough. Note that, for a given $\delta$, the convergence in (5) follows whenever this convergence holds for a smaller value of $\delta$. Let $\pi$ be a continuous density that is constant and positive on $K_\delta$; then (b) follows.
Now suppose (b) holds, $\Pi$ has a continuous density $\pi$, $K$ is a compact subset of $\Theta$, and $\epsilon > 0$. Then $K_\delta$ is also a compact subset of $\Theta$ for $\delta$ small enough. Since $\pi$ is uniformly continuous on $K_\delta$, there exists $\delta_0 > 0$ such that $|\pi(\theta_1) - \pi(\theta_2)| < \epsilon/4$ whenever $\theta_1, \theta_2 \in K_\delta$ and $\|\theta_1 - \theta_2\| < \delta_0$. From (b), there exists $L_1 > 0$ such that $|\int_\Theta f_{\theta,T_n}(t)I(\|t - \theta\| < \delta^*)\,d\theta - 1| < \epsilon/(4\sup_{t \in K}\pi(t))$ for all $n \geq L_1$ and $t \in K$, where $\delta^* = \min(\delta_0, \delta)$. Also, there exist $c_1, c_2 > 0$ and $L_2 > 0$ such that $f_{\theta,T_n}(t) \leq c_1e^{-c_2n}$ whenever $t \in K$, $\|t - \theta\| \geq \delta^*$, and $n \geq L_2$. Therefore, there exists $L_3$ such that $\int_\Theta f_{\theta,T_n}(t)\pi(\theta)I(\|t - \theta\| \geq \delta^*)\,d\theta \leq c_1e^{-c_2n} \leq \epsilon/4$ for all $n \geq L_3$. Finally, for $n \geq L = \max(L_1, L_2, L_3)$ and $t \in K$, we have that

$|m_{T_n}(t) - \pi(t)| \leq \int_\Theta f_{\theta,T_n}(t)\pi(\theta)I(\|t - \theta\| \geq \delta^*)\,d\theta + \int_\Theta f_{\theta,T_n}(t)|\pi(\theta) - \pi(t)|I(\|t - \theta\| < \delta^*)\,d\theta + \pi(t)\left|\int_\Theta f_{\theta,T_n}(t)I(\|t - \theta\| < \delta^*)\,d\theta - 1\right| \leq \epsilon/4 + (\epsilon/4)(1 + \epsilon/4) + \epsilon/4 < \epsilon$

and we see that (a) holds.

Proof of Lemma 5

Since $K$ is compact, there is an $a > 0$ such that $0 < a \leq t \leq 1 - a < 1$ for all $t \in K$. For $t \in K$, Stirling's formula implies that

$\log\binom{n}{nt} = -2^{-1}\log(2\pi nt(1 - t)) - n(t\log t + (1 - t)\log(1 - t)) + r(n, t)$

where $|r(n, t)| \leq (12n)^{-1} + (12nt + 1)^{-1} + (12n(1 - t) + 1)^{-1} < 1$. So we have that

$f_{\theta,T_n}(t) = (2\pi nt(1 - t))^{-1/2}\exp(r(n, t))\exp(ng(\theta, t))$

where $g(\theta, t) = t\log(\theta/t) + (1 - t)\log((1 - \theta)/(1 - t))$. Let $0 < \delta < a/2$. Note that $g(\theta, t)$ has maximum value 0 at $\theta = t$. Therefore, since $g(\theta, t)$ is continuous, we have that $b = -\sup_{t,\theta:|t-\theta|>\delta,\,t \in K}g(\theta, t) > 0$. Also there is an $N_1$ such that $n^{1/2}e^{-bn/2} \leq 1$ for $n \geq N_1$. So, when $t \in K$, $|t - \theta| > \delta$ and $n \geq N_1$,

$f_{\theta,T_n}^{cont}(t) \leq n^{1/2}(2\pi a(1 - a))^{-1/2}e\,e^{-nb} \leq (2\pi a(1 - a))^{-1/2}e\exp(-nb/2).$

Hence (4) holds. Let $I_0 = \int_0^1 f_{\theta,T_n}^{cont}(t)I(|t - \theta| < \delta)\,d\theta$. Then

$I_0 \leq \int_0^1 f_{\theta,T_n}^{cont}(t)\,d\theta = n\binom{n}{nt}\int_0^1 \theta^{nt}(1 - \theta)^{n(1-t)}\,d\theta = n/(n + 1).$

Thus, using (4), we get

$I_0 \geq \dfrac{n}{n + 1} - \int_0^1 f_{\theta,T_n}^{cont}(t)I(|t - \theta| > \delta)\,d\theta \geq \dfrac{n}{n + 1} - c_1e^{-c_2n} \rightarrow 1.$

Hence (5) holds and so condition (ii) of Theorem 1 holds by Theorem 4.
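The conclusion of Lemma 5 is easy to check numerically: under a Beta prior the marginal distribution of $nT_n$ is beta-binomial, so $m_{T_n}^{cont}$ is available in closed form at the lattice points. A minimal sketch (Python; the Beta(2,2) prior and the evaluation points are illustrative choices), which also reproduces the exact value $m_{T_n}^{cont}(t) = n/(n+1)$ under the Uniform(0,1) prior discussed after Lemma 5:

```python
import math

def log_beta(a, b):
    """log of the Beta function via lgamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def m_cont(t, n, a, b):
    """Continuized prior predictive density of T_n = xbar under a Beta(a, b) prior:
    the beta-binomial pmf at k = round(n*t), rescaled by the lattice step 1/n."""
    k = round(n * t)
    log_pmf = (math.lgamma(n + 1.0) - math.lgamma(k + 1.0) - math.lgamma(n - k + 1.0)
               + log_beta(k + a, n - k + b) - log_beta(a, b))
    return n * math.exp(log_pmf)

# Uniform(0,1) prior (a = b = 1): m_cont equals n/(n+1) at every lattice point.
print(m_cont(0.3, 10, 1.0, 1.0), 10.0 / 11.0)

# Beta(2,2) prior: m_cont(t) approaches the prior density 6 t (1 - t).
for n in (10, 100, 10000):
    print(n, m_cont(0.5, n, 2.0, 2.0))  # approaches 6 * 0.25 = 1.5
```

The rescaling by $n$ is exactly the continuization $m_{T_n}^{cont}(t) = m_{T_n}(t)/h$ with lattice step $h = 1/n$ used in Section 2.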
References

Box, G. E. P. (1980). Sampling and Bayes' inference in scientific modelling and robustness (with discussion). J. Roy. Statist. Soc. Ser. A 143 (4).

Chen, C. F. (1985). On asymptotic normality of limiting density functions with Bayesian implications. J. Roy. Statist. Soc. Ser. B 47 (3).

Evans, M. and Jang, G. H. (2010). Invariant P-values for model checking. Ann. Statist. 38 (1).

Evans, M. and Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Anal. 1 (4).

Evans, M. and Moshonov, H. (2007). Checking for prior-data conflict with hierarchically specified priors. In: Upadhyay, A., Singh, U., Dey, D. (Eds.), Bayesian Statistics and its Applications. Anamaya Publishers, New Delhi.

Heyde, C. C. and Johnstone, I. M. (1979). On asymptotic posterior normality for stochastic processes. J. Roy. Statist. Soc. Ser. B 41 (2).

Walker, A. M. (1969). On the asymptotic behaviour of posterior distributions. J. Roy. Statist. Soc. Ser. B 31.
More informationChapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models
Chapter 6. Maximum Likelihood Analysis of Dynamic Stochastic General Equilibrium (DSGE) Models Fall 22 Contents Introduction 2. An illustrative example........................... 2.2 Discussion...................................
More informationTime Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley
Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the
More informationh=1 exp (X : J h=1 Even the direction of the e ect is not determined by jk. A simpler interpretation of j is given by the odds-ratio
Multivariate Response Models The response variable is unordered and takes more than two values. The term unordered refers to the fact that response 3 is not more favored than response 2. One choice from
More informationBayesian Modeling of Conditional Distributions
Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings
More informationVolume 30, Issue 3. Monotone comparative statics with separable objective functions. Christian Ewerhart University of Zurich
Volume 30, Issue 3 Monotone comparative statics with separable objective functions Christian Ewerhart University of Zurich Abstract The Milgrom-Shannon single crossing property is essential for monotone
More informationVector fields Lecture 2
Vector fields Lecture 2 Let U be an open subset of R n and v a vector field on U. We ll say that v is complete if, for every p U, there exists an integral curve, γ : R U with γ(0) = p, i.e., for every
More informationMathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector
On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean
More informationThe Degree of the Splitting Field of a Random Polynomial over a Finite Field
The Degree of the Splitting Field of a Random Polynomial over a Finite Field John D. Dixon and Daniel Panario School of Mathematics and Statistics Carleton University, Ottawa, Canada fjdixon,danielg@math.carleton.ca
More information8 Periodic Linear Di erential Equations - Floquet Theory
8 Periodic Linear Di erential Equations - Floquet Theory The general theory of time varying linear di erential equations _x(t) = A(t)x(t) is still amazingly incomplete. Only for certain classes of functions
More informationLOGARITHMIC MULTIFRACTAL SPECTRUM OF STABLE. Department of Mathematics, National Taiwan University. Taipei, TAIWAN. and. S.
LOGARITHMIC MULTIFRACTAL SPECTRUM OF STABLE OCCUPATION MEASURE Narn{Rueih SHIEH Department of Mathematics, National Taiwan University Taipei, TAIWAN and S. James TAYLOR 2 School of Mathematics, University
More informationA Remark on Complete Convergence for Arrays of Rowwise Negatively Associated Random Variables
Proceedings of The 3rd Sino-International Symposium October, 2006 on Probability, Statistics, and Quantitative Management ICAQM/CDMS Taipei, Taiwan, ROC June 10, 2006 pp. 9-18 A Remark on Complete Convergence
More informationStochastic Processes (Master degree in Engineering) Franco Flandoli
Stochastic Processes (Master degree in Engineering) Franco Flandoli Contents Preface v Chapter. Preliminaries of Probability. Transformation of densities. About covariance matrices 3 3. Gaussian vectors
More informationECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University
ECONOMET RICS P RELIM EXAM August 24, 2010 Department of Economics, Michigan State University Instructions: Answer all four (4) questions. Be sure to show your work or provide su cient justi cation for
More informationECON2285: Mathematical Economics
ECON2285: Mathematical Economics Yulei Luo Economics, HKU September 17, 2018 Luo, Y. (Economics, HKU) ME September 17, 2018 1 / 46 Static Optimization and Extreme Values In this topic, we will study goal
More informationEstimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels.
Estimation of Dynamic Nonlinear Random E ects Models with Unbalanced Panels. Pedro Albarran y Raquel Carrasco z Jesus M. Carro x June 2014 Preliminary and Incomplete Abstract This paper presents and evaluates
More informationLocal disaggregation of demand and excess demand functions: a new question
Local disaggregation of demand and excess demand functions: a new question Pierre-Andre Chiappori Ivar Ekeland y Martin Browning z January 1999 Abstract The literature on the characterization of aggregate
More informationReal Analysis: Homework # 12 Fall Professor: Sinan Gunturk Fall Term 2008
Eduardo Corona eal Analysis: Homework # 2 Fall 2008 Professor: Sinan Gunturk Fall Term 2008 #3 (p.298) Let X be the set of rational numbers and A the algebra of nite unions of intervals of the form (a;
More informationParametric Inference on Strong Dependence
Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation
More informationEconS Advanced Microeconomics II Handout on Mechanism Design
EconS 503 - Advanced Microeconomics II Handout on Mechanism Design 1. Public Good Provision Imagine that you and your colleagues want to buy a co ee machine for your o ce. Suppose that some of you may
More informationRobust Con dence Intervals in Nonlinear Regression under Weak Identi cation
Robust Con dence Intervals in Nonlinear Regression under Weak Identi cation Xu Cheng y Department of Economics Yale University First Draft: August, 27 This Version: December 28 Abstract In this paper,
More informationWeek 2 Spring Lecture 3. The Canonical normal means estimation problem (cont.).! (X) = X+ 1 X X, + (X) = X+ 1
Week 2 Spring 2009 Lecture 3. The Canonical normal means estimation problem (cont.). Shrink toward a common mean. Theorem. Let X N ; 2 I n. Let 0 < C 2 (n 3) (hence n 4). De ne!! (X) = X+ 1 Then C 2 X
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationNotes on Time Series Modeling
Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationOn the synchronization of a class of electronic circuits that exhibit chaos
Chaos, Solitons and Fractals 13 2002) 1515±1521 www.elsevier.com/locate/chaos On the synchronization of a class of electronic circuits that exhibit chaos Er-Wei Bai a, *, Karl E. Lonngren a, J.C. Sprott
More informationLECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT
MARCH 29, 26 LECTURE 2 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT (Davidson (2), Chapter 4; Phillips Lectures on Unit Roots, Cointegration and Nonstationarity; White (999), Chapter 7) Unit root processes
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationSerial Correlation Robust LM Type Tests for a Shift in Trend
Serial Correlation Robust LM Type Tests for a Shift in Trend Jingjing Yang Department of Economics, The College of Wooster Timothy J. Vogelsang Department of Economics, Michigan State University March
More informationRank Estimation of Partially Linear Index Models
Rank Estimation of Partially Linear Index Models Jason Abrevaya University of Texas at Austin Youngki Shin University of Western Ontario October 2008 Preliminary Do not distribute Abstract We consider
More informationSIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012
SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 Revised March 2012 COWLES FOUNDATION DISCUSSION PAPER NO. 1815R COWLES FOUNDATION FOR
More informationIntroduction to Linear Algebra. Tyrone L. Vincent
Introduction to Linear Algebra Tyrone L. Vincent Engineering Division, Colorado School of Mines, Golden, CO E-mail address: tvincent@mines.edu URL: http://egweb.mines.edu/~tvincent Contents Chapter. Revew
More informationStochastic solutions of nonlinear pde s: McKean versus superprocesses
Stochastic solutions of nonlinear pde s: McKean versus superprocesses R. Vilela Mendes CMAF - Complexo Interdisciplinar, Universidade de Lisboa (Av. Gama Pinto 2, 1649-3, Lisbon) Instituto de Plasmas e
More informationDichotomy Of Poincare Maps And Boundedness Of Some Cauchy Sequences
Applied Mathematics E-Notes, 2(202), 4-22 c ISSN 607-250 Available free at mirror sites of http://www.math.nthu.edu.tw/amen/ Dichotomy Of Poincare Maps And Boundedness Of Some Cauchy Sequences Abar Zada
More informationEDRP lecture 7. Poisson process. Pawe J. Szab owski
EDRP lecture 7. Poisson process. Pawe J. Szab owski 2007 Counting process Random process fn t ; t 0g is called a counting process, if N t is equal total number of events that have happened up to moment
More informationMicroeconometrics: Clustering. Ethan Kaplan
Microeconometrics: Clustering Ethan Kaplan Gauss Markov ssumptions OLS is minimum variance unbiased (MVUE) if Linear Model: Y i = X i + i E ( i jx i ) = V ( i jx i ) = 2 < cov i ; j = Normally distributed
More informationApproximately Most Powerful Tests for Moment Inequalities
Approximately Most Powerful Tests for Moment Inequalities Richard C. Chiburis Department of Economics, Princeton University September 26, 2008 Abstract The existing literature on testing moment inequalities
More informationMAXIMUM LIKELIHOOD ESTIMATION AND UNIFORM INFERENCE WITH SPORADIC IDENTIFICATION FAILURE. Donald W. K. Andrews and Xu Cheng.
MAXIMUM LIKELIHOOD ESTIMATION AND UNIFORM INFERENCE WITH SPORADIC IDENTIFICATION FAILURE By Donald W. K. Andrews and Xu Cheng October COWLES FOUNDATION DISCUSSION PAPER NO. 8 COWLES FOUNDATION FOR RESEARCH
More informationStochastic Processes
Introduction and Techniques Lecture 4 in Financial Mathematics UiO-STK4510 Autumn 2015 Teacher: S. Ortiz-Latorre Stochastic Processes 1 Stochastic Processes De nition 1 Let (E; E) be a measurable space
More informationECONOMETRICS II (ECO 2401) Victor Aguirregabiria. Winter 2017 TOPIC 3: MULTINOMIAL CHOICE MODELS
ECONOMETRICS II (ECO 2401) Victor Aguirregabiria Winter 2017 TOPIC 3: MULTINOMIAL CHOICE MODELS 1. Introduction 2. Nonparametric model 3. Random Utility Models - De nition; - Common Speci cation and Normalizations;
More informationCS 540: Machine Learning Lecture 2: Review of Probability & Statistics
CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()
More informationIntroduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be
Wavelet Estimation For Samples With Random Uniform Design T. Tony Cai Department of Statistics, Purdue University Lawrence D. Brown Department of Statistics, University of Pennsylvania Abstract We show
More informationEconomics 620, Lecture 18: Nonlinear Models
Economics 620, Lecture 18: Nonlinear Models Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 18: Nonlinear Models 1 / 18 The basic point is that smooth nonlinear
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationClassical and Bayesian inference
Classical and Bayesian inference AMS 132 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 8 1 / 11 The Prior Distribution Definition Suppose that one has a statistical model with parameter
More informationEstimation under Ambiguity (Very preliminary)
Estimation under Ambiguity (Very preliminary) Ra aella Giacomini, Toru Kitagawa, y and Harald Uhlig z Abstract To perform a Bayesian analysis for a set-identi ed model, two distinct approaches exist; the
More informationA formal statistical test for the number of factors in. the approximate factor models
A formal statistical test for the number of factors in the approximate factor models Alexei Onatski Economics Department, Columbia University September 26, 2006 Abstract In this paper we study i.i.d. sequences
More informationMeasuring robustness
Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely
More informationOscillatory Mixed Di erential Systems
Oscillatory Mixed Di erential Systems by José M. Ferreira Instituto Superior Técnico Department of Mathematics Av. Rovisco Pais 49- Lisboa, Portugal e-mail: jferr@math.ist.utl.pt Sra Pinelas Universidade
More informationCentral Limit Theorem for Non-stationary Markov Chains
Central Limit Theorem for Non-stationary Markov Chains Magda Peligrad University of Cincinnati April 2011 (Institute) April 2011 1 / 30 Plan of talk Markov processes with Nonhomogeneous transition probabilities
More information1 Review of di erential calculus
Review of di erential calculus This chapter presents the main elements of di erential calculus needed in probability theory. Often, students taking a course on probability theory have problems with concepts
More informationNot Only What But also When A Theory of Dynamic Voluntary Disclosure
Not Only What But also When A Theory of Dynamic Voluntary Disclosure PRELIMINARY AND INCOMPLETE Ilan Guttman, Ilan Kremer and Andrzej Skrzypacz Stanford Graduate School of Business November 2011 1 Introduction
More information4. Duality and Sensitivity
4. Duality and Sensitivity For every instance of an LP, there is an associated LP known as the dual problem. The original problem is known as the primal problem. There are two de nitions of the dual pair
More informationTitleCoupled Nosé-Hoover Equations of Mo.
TitleCoupled Nosé-Hoover Equations of Mo Author(s) Fukuda, Ikuo; Moritsugu, Kei Citation Issue 15-8-18 Date Text Version author UL http://hdl.handle.net/119/56 DOI ights Osaka University Supplemental Material:
More informationNonparametric Trending Regression with Cross-Sectional Dependence
Nonparametric Trending Regression with Cross-Sectional Dependence Peter M. Robinson London School of Economics January 5, 00 Abstract Panel data, whose series length T is large but whose cross-section
More informationMath 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank
Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs
More informationLecture Notes - Dynamic Moral Hazard
Lecture Notes - Dynamic Moral Hazard Simon Board and Moritz Meyer-ter-Vehn October 27, 2011 1 Marginal Cost of Providing Utility is Martingale (Rogerson 85) 1.1 Setup Two periods, no discounting Actions
More informationEXTREMAL POLYNOMIALS ON DISCRETE SETS
EXTREMAL POLYNOMIALS ON DISCRETE SETS A. B. J. KUIJLAARS and W. VAN ASSCHE [Received 14 April 1997] 1. Introduction It is well known that orthonormal polynomials p n on the real line can be studied from
More informationINEQUALITIES OF LIPSCHITZ TYPE FOR POWER SERIES OF OPERATORS IN HILBERT SPACES
INEQUALITIES OF LIPSCHITZ TYPE FOR POWER SERIES OF OPERATORS IN HILBERT SPACES S.S. DRAGOMIR ; Abstract. Let (z) := P anzn be a power series with complex coe - cients convergent on the open disk D (; R)
More informationChecking for Prior-Data Conßict. by Michael Evans Department of Statistics University of Toronto. and
Checking for Prior-Data Conßict by Michael Evans Department of Statistics University of Toronto and Hadas Moshonov Department of Statistics University of Toronto Technical Report No. 413, February 24,
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationAn Analysis of the Difference of Code Lengths Between Two-Step Codes Based on MDL Principle and Bayes Codes
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 3, MARCH 2001 927 An Analysis of the Difference of Code Lengths Between Two-Step Codes Based on MDL Principle Bayes Codes Masayuki Goto, Member, IEEE,
More informationLecture Notes Part 7: Systems of Equations
17.874 Lecture Notes Part 7: Systems of Equations 7. Systems of Equations Many important social science problems are more structured than a single relationship or function. Markets, game theoretic models,
More information