Lecture 5


9. Minimal sufficient and complete statistics

We introduced the notion of sufficient statistics in order to have a function of the data that contains all information about the parameter. However, a sufficient statistic does not have to be any simpler than the data itself. As we have seen, the identity function is a sufficient statistic, so this choice does not simplify or summarize anything. A statistic is said to be minimal sufficient if it is as simple as possible in a certain sense. Here is a definition.

Definition 11. A sufficient statistic $T : \mathcal{X} \to \mathcal{T}$ is minimal sufficient if for any sufficient statistic $U : \mathcal{X} \to \mathcal{U}$ there is a measurable function $g : \mathcal{U} \to \mathcal{T}$ such that $T = g(U)$, $\mu_{X|\Theta}(\cdot \mid \theta)$-a.s., for all $\theta \in \Omega$.

How do we check whether a statistic is minimal sufficient? It can be inconvenient to check the condition in the definition against all sufficient statistics $U$.

Theorem 10. If there exist versions of $f_{X|\Theta}(x \mid \theta)$ for each $\theta$ and a measurable function $T : \mathcal{X} \to \mathcal{T}$ such that
$$T(x) = T(y) \iff y \in D(x), \qquad D(x) = \{ y \in \mathcal{X} : f_{X|\Theta}(y \mid \theta) = f_{X|\Theta}(x \mid \theta)\, h(x,y) \ \forall \theta, \text{ for some } h(x,y) > 0 \},$$
then $T$ is a minimal sufficient statistic.

Example 14. Let $\{X_n\}$ be IID $\mathrm{Exp}(\theta)$ given $\Theta = \theta$ and $X = (X_1, \dots, X_n)$. Put $T(x) = x_1 + \dots + x_n$. Let us show that $T$ is minimal sufficient. The ratio
$$\frac{f_{X|\Theta}(x \mid \theta)}{f_{X|\Theta}(y \mid \theta)} = \frac{\theta^n e^{-\theta \sum_{i=1}^n x_i}}{\theta^n e^{-\theta \sum_{i=1}^n y_i}}$$
does not depend on $\theta$ if and only if $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. In this case $h(x,y) = 1$, $D(x) = \{ y : \sum_{i=1}^n x_i = \sum_{i=1}^n y_i \}$, and $T$ is minimal sufficient.

Proof. Note first that the sets $D(x)$ form a partition of $\mathcal{X}$. Indeed, by putting $h(y,x) = 1/h(x,y)$ we see that $y \in D(x)$ implies $x \in D(y)$. Similarly, taking $h(x,x) = 1$, we see that $x \in D(x)$, and hence the different $D(x)$ form a partition. The condition says that the sets $D(x)$ coincide with the sets $T^{-1}\{T(x)\}$, and hence $D(x) \in \mathcal{B}$ for each $x$. By Bayes' theorem we have, for $y \in D(x)$,
$$\frac{d\mu_{\Theta|X}}{d\mu_\Theta}(\theta \mid x) = \frac{f_{X|\Theta}(x \mid \theta)}{\int_\Omega f_{X|\Theta}(x \mid \theta)\, \mu_\Theta(d\theta)} = \frac{h(x,y)\, f_{X|\Theta}(y \mid \theta)}{\int_\Omega h(x,y)\, f_{X|\Theta}(y \mid \theta)\, \mu_\Theta(d\theta)} = \frac{d\mu_{\Theta|X}}{d\mu_\Theta}(\theta \mid y).$$
That is, the posterior density is constant on $D(x)$. Hence it is a function of $T(x)$, and by Lemma 1, $T$ is sufficient. Let us check that $T$ is also minimal. Take $U : \mathcal{X} \to \mathcal{U}$ to be a sufficient statistic. If we show that $U(x) = U(y)$ implies $y \in D(x)$, then it follows that $U(x) = U(y)$ implies $T(x) = T(y)$, and hence that $T$ is a function of $U(x)$. Then $T$ is minimal. By the factorization theorem (Theorem 2, Lecture 6),
$$f_{X|\Theta}(x \mid \theta) = h(x)\, g(\theta, U(x)).$$
We can assume that $h(x) > 0$ because $P_\theta(\{x : h(x) = 0\}) = 0$. Hence $U(x) = U(y)$ implies
$$f_{X|\Theta}(y \mid \theta) = \frac{h(y)}{h(x)}\, h(x)\, g(\theta, U(x)) = \frac{h(y)}{h(x)}\, f_{X|\Theta}(x \mid \theta).$$
That is, $y \in D(x)$ with $h(x,y) = h(y)/h(x)$.
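The criterion of Theorem 10 is easy to probe numerically. The following sketch (not part of the original notes; it assumes NumPy and the rate parametrization of Example 14) checks that the log-likelihood ratio between two samples is constant in $\theta$ exactly when the samples have equal sums.

```python
import numpy as np

# Sketch of the Theorem 10 criterion for Example 14, assuming the rate
# parametrization f(x|theta) = theta^n exp(-theta * sum(x)). The log-
# likelihood ratio is -theta*(sum(x) - sum(y)), hence constant in theta
# if and only if sum(x) == sum(y).

def log_lik(x, theta):
    # log f(x | theta) for an IID Exp(theta) sample x
    return len(x) * np.log(theta) - theta * np.sum(x)

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=5)
y = rng.permutation(x)                    # same sum as x
z = rng.exponential(scale=1.0, size=5)    # different sum, almost surely

thetas = np.linspace(0.5, 5.0, 10)
diff_xy = np.array([log_lik(x, t) - log_lik(y, t) for t in thetas])
diff_xz = np.array([log_lik(x, t) - log_lik(z, t) for t in thetas])

print(np.ptp(diff_xy))   # ~0: ratio free of theta, so y lies in D(x)
print(np.ptp(diff_xz))   # > 0: ratio depends on theta, so z does not
```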

The next concept is that of a complete statistic.

Definition 12. Let $T : \mathcal{X} \to \mathcal{T}$ be a statistic and $\{\mu_{T|\Theta}(\cdot \mid \theta), \theta \in \Omega\}$ the family of conditional distributions of $T(X)$ given $\Theta = \theta$. The family $\{\mu_{T|\Theta}(\cdot \mid \theta), \theta \in \Omega\}$ is said to be complete if, for each measurable function $g$,
$$E_\theta[g(T)] = 0 \ \forall \theta \quad \text{implies} \quad P_\theta(g(T) = 0) = 1 \ \forall \theta.$$
The family is said to be boundedly complete if, for each bounded measurable function $g$, the same implication holds. A statistic $T$ is said to be complete if the family $\{\mu_{T|\Theta}(\cdot \mid \theta), \theta \in \Omega\}$ is complete, and boundedly complete if the family is boundedly complete.

One should note that completeness is a statement about the entire family $\{\mu_{T|\Theta}(\cdot \mid \theta), \theta \in \Omega\}$ and not only about the individual conditional distributions $\mu_{T|\Theta}(\cdot \mid \theta)$.

Example 15. Suppose that $T$ has a $\mathrm{Bin}(n, \theta)$ distribution with $\theta \in (0,1)$ and $g$ is a function such that $E_\theta[g(T)] = 0$ for all $\theta$. Then
$$0 = E_\theta[g(T)] = \sum_{k=0}^n g(k) \binom{n}{k} \theta^k (1-\theta)^{n-k} = (1-\theta)^n \sum_{k=0}^n g(k) \binom{n}{k} \Big( \frac{\theta}{1-\theta} \Big)^{k}.$$
If we put $r = \theta/(1-\theta)$ we see that this equals
$$(1-\theta)^n \sum_{k=0}^n g(k) \binom{n}{k} r^k,$$
which is a polynomial in $r$ of degree $n$. Since it is identically zero for all $r > 0$, it must be that $g(k)\binom{n}{k} = 0$ for each $k = 0, \dots, n$, i.e. $g(k) = 0$ for each $k = 0, \dots, n$. Since, for each $\theta$, $T$ is supported on $\{0, \dots, n\}$, it follows that $P_\theta(g(T) = 0) = 1$ for all $\theta$, so $T$ is complete.
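The polynomial argument of Example 15 has a linear-algebra shadow: the constraints $E_\theta[g(T)] = 0$ at $n+1$ distinct values of $\theta$ already form a nonsingular linear system in $(g(0), \dots, g(n))$, forcing $g$ to vanish. A small numerical illustration (not part of the notes; assumes NumPy):

```python
import numpy as np
from math import comb

# The constraints E_theta[g(T)] = 0 at n+1 distinct thetas read A g = 0,
# where A[j, k] = C(n,k) theta_j^k (1 - theta_j)^(n-k). A is nonsingular
# (a rescaled Vandermonde matrix in r_j = theta_j/(1-theta_j)), so g = 0.

n = 5
thetas = np.linspace(0.1, 0.9, n + 1)
A = np.array([[comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)]
              for t in thetas])

print(np.linalg.matrix_rank(A))             # n + 1: full rank
print(np.linalg.solve(A, np.zeros(n + 1)))  # the only solution: g == 0
```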

An important result for exponential families is the following.

Theorem 11. If the natural parameter space $\Omega$ of an exponential family contains an open set in $\mathbb{R}^k$, then $T(X)$ is a complete sufficient statistic.

Proof. We will give a proof for $k = 1$. For larger $k$ one can use induction. We know that the natural statistic $T$ has a density $c(\theta) e^{\theta t}$ with respect to $\nu_T$ (see Section 4.2, Lecture 4). Let $g$ be a measurable function such that $E_\theta[g(T)] = 0$ for all $\theta$. That is,
$$\int g(t)\, c(\theta) e^{\theta t}\, \nu_T(dt) = 0 \quad \forall \theta.$$
If we write $g^+$ and $g^-$ for the positive and negative parts of $g$, respectively, then this says
$$\int g^+(t)\, c(\theta) e^{\theta t}\, \nu_T(dt) = \int g^-(t)\, c(\theta) e^{\theta t}\, \nu_T(dt) \quad \forall \theta. \qquad (9.1)$$
Take a fixed value $\theta_0$ in the interior of $\Omega$. This is possible since $\Omega$ contains an open set. Put
$$Z_0 = \int g^+(t)\, c(\theta_0) e^{\theta_0 t}\, \nu_T(dt) = \int g^-(t)\, c(\theta_0) e^{\theta_0 t}\, \nu_T(dt)$$
and define the probability measures $P$ and $Q$ by
$$P(C) = Z_0^{-1} \int_C g^+(t)\, c(\theta_0) e^{\theta_0 t}\, \nu_T(dt), \qquad Q(C) = Z_0^{-1} \int_C g^-(t)\, c(\theta_0) e^{\theta_0 t}\, \nu_T(dt).$$
Then the equality (9.1) can be written
$$\int \exp\{t(\theta - \theta_0)\}\, P(dt) = \int \exp\{t(\theta - \theta_0)\}\, Q(dt) \quad \forall \theta.$$
With $u = \theta - \theta_0$ we see that this implies that the moment generating function of $P$, $M_P(u)$, equals the mgf of $Q$, $M_Q(u)$, in a neighborhood of $u = 0$. Hence, by uniqueness of the moment generating function, $P = Q$. It follows that $g^+(t) = g^-(t)$ $\nu_T$-a.e. and hence that $\mu_{T|\Theta}(\{t : g(t) = 0\} \mid \theta) = 1$ for all $\theta$. Hence, $T$ is a complete sufficient statistic.

Completeness of a statistic is also related to minimal sufficiency.

Theorem 12 (Bahadur's theorem). If $T$ is a finite-dimensional boundedly complete sufficient statistic, then it is minimal sufficient.

Proof. Let $U$ be an arbitrary sufficient statistic. We will show that $T$ is a function of $U$ by constructing the appropriate function. Put $T = (T_1(X), \dots, T_k(X))$ and $S_i(T) = [1 + e^{-T_i}]^{-1}$, so that $S_i$ is bounded and bijective. Let
$$X_i(u) = E_\theta[S_i(T) \mid U = u], \qquad Y_i(t) = E_\theta[X_i(U) \mid T = t];$$
since $U$ and $T$ are sufficient, these conditional expectations do not depend on $\theta$. We want to show that $S_i(T) = X_i(U)$ $P_\theta$-a.s. for all $\theta$. Then, since $S_i$ is bijective, we have $T_i = S_i^{-1}(X_i(U))$ and the claim follows. We show $S_i(T) = X_i(U)$ $P_\theta$-a.s. in two steps.

First step: $S_i(T) = Y_i(T)$ $P_\theta$-a.s. for all $\theta$. To see this, note that
$$E_\theta[Y_i(T)] = E_\theta[E_\theta[X_i(U) \mid T]] = E_\theta[X_i(U)] = E_\theta[E_\theta[S_i(T) \mid U]] = E_\theta[S_i(T)].$$
Hence, for all $\theta$, $E_\theta[Y_i(T) - S_i(T)] = 0$ and, since $S_i$ is bounded, so is $Y_i$, and bounded completeness implies $P_\theta(S_i(T) = Y_i(T)) = 1$ for all $\theta$.

Second step: $X_i(U) = Y_i(T)$ $P_\theta$-a.s. for all $\theta$. By step one we have $E_\theta[Y_i(T) \mid U] = X_i(U)$ $P_\theta$-a.s. So if we show that the conditional variance of $Y_i(T)$ given $U$ is zero, we are done. That is, we need to show $\mathrm{Var}_\theta(Y_i(T) \mid U) = 0$ $P_\theta$-a.s. By the usual rule for conditional variance (Theorem B.78, p. 634), applied twice,
$$\mathrm{Var}_\theta(Y_i(T)) = E_\theta[\mathrm{Var}_\theta(Y_i(T) \mid U)] + \mathrm{Var}_\theta(X_i(U)) = E_\theta[\mathrm{Var}_\theta(Y_i(T) \mid U)] + E_\theta[\mathrm{Var}_\theta(X_i(U) \mid T)] + \mathrm{Var}_\theta(Y_i(T)).$$
Hence the two nonnegative terms $E_\theta[\mathrm{Var}_\theta(Y_i(T) \mid U)]$ and $E_\theta[\mathrm{Var}_\theta(X_i(U) \mid T)]$ sum to zero, so each vanishes. In particular $\mathrm{Var}_\theta(Y_i(T) \mid U) = 0$ $P_\theta$-a.s., as we wanted.
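The conditional-variance rule invoked in the proof (Theorem B.78) is easy to check numerically. The following sketch (not from the notes; assumes NumPy, with an arbitrary toy model for $U$ and $Y$) verifies $\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid U)] + \mathrm{Var}(E[Y \mid U])$ by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of Var(Y) = E[Var(Y|U)] + Var(E[Y|U]) on a toy model:
# U uniform on {0,1,2} and Y | U = u distributed N(u, (1+u)^2).

rng = np.random.default_rng(1)
m = 10**6
u = rng.integers(0, 3, size=m)
y = rng.normal(loc=u, scale=1.0 + u, size=m)

# U is uniform on three points, so unweighted averages over k carry the
# correct weights for the outer expectation and variance.
cond_means = np.array([y[u == k].mean() for k in range(3)])
cond_vars = np.array([y[u == k].var() for k in range(3)])

print(y.var())                              # total variance, approx 16/3
print(cond_vars.mean() + cond_means.var())  # E[Var(Y|U)] + Var(E[Y|U])
```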

10. Ancillary statistics

As we have seen, a sufficient statistic contains all the information about the parameter. The opposite is when a statistic does not contain any information about the parameter.

Definition 13. A statistic $U : \mathcal{X} \to \mathcal{U}$ is called ancillary if the conditional distribution of $U$ given $\Theta = \theta$ is the same for all $\theta$.

Example 16. Let $X_1$ and $X_2$ be conditionally independent $N(\theta, 1)$ given $\Theta = \theta$. Then $U = X_2 - X_1$ is ancillary. Indeed, $U$ has a $N(0, 2)$ distribution, which does not depend on $\theta$.

Sometimes a statistic contains a coordinate that is ancillary.

Definition 14. If $T = (T_1, T_2)$ is a sufficient statistic and $T_2$ is ancillary, then $T_1$ is called conditionally sufficient given $T_2$.

Example 17. Let $X = (X_1, \dots, X_n)$ be conditionally IID $U(\theta - 1/2, \theta + 1/2)$ given $\Theta = \theta$. Then
$$f_{X|\Theta}(x \mid \theta) = \prod_{i=1}^n I_{[\theta - 1/2,\, \theta + 1/2]}(x_i) = I_{[\theta - 1/2,\, \infty)}(\min x_i)\, I_{(-\infty,\, \theta + 1/2]}(\max x_i).$$
The statistic $T = (T_1, T_2) = (\max x_i,\ \max x_i - \min x_i)$ is minimal sufficient and $T_2$ is ancillary. Note that
$$f_{X|\Theta}(y \mid \theta) = f_{X|\Theta}(x \mid \theta) \ \forall \theta \iff \max x_i = \max y_i \text{ and } \min x_i = \min y_i \iff T(x) = T(y).$$
Hence, by Theorem 10 (Lecture 7), $T$ is minimal sufficient. The conditional density of $(T_1, T_2)$ given $\Theta = \theta$ can be computed as (do this as an exercise)
$$f_{T_1, T_2 | \Theta}(t_1, t_2 \mid \theta) = n(n-1)\, t_2^{n-2}\, I_{[0,1]}(t_2)\, I_{[\theta - 1/2 + t_2,\, \theta + 1/2]}(t_1).$$
In particular, the marginal density of $T_2$ is
$$f_{T_2 | \Theta}(t_2 \mid \theta) = n(n-1)\, t_2^{n-2} (1 - t_2)\, I_{[0,1]}(t_2),$$
and this does not depend on $\theta$. Hence $T_2$ is ancillary. Note that the conditional distribution of $T_1$ given $T_2 = t_2$ and $\Theta = \theta$ is
$$f_{T_1 | T_2, \Theta}(t_1 \mid t_2, \theta) = \frac{1}{1 - t_2}\, I_{[\theta - 1/2 + t_2,\, \theta + 1/2]}(t_1).$$
That is, it is $U(\theta - 1/2 + t_2,\ \theta + 1/2)$. Hence, this distribution becomes more concentrated as $t_2$ becomes large. Although $T_2$ does not tell us anything about the parameter, it tells us something about the conditional distribution of $T_1$ given $\Theta$. The usual rule in classical statistics is, whenever possible, to perform inference conditional on an ancillary statistic. We can illustrate this in our example (see Example 18 below).
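As a quick sanity check on the ancillarity of $T_2$, the following Monte Carlo sketch (not part of the notes; assumes NumPy) draws samples for several values of $\theta$ and compares the mean of $T_2$ with $(n-1)/(n+1)$, the mean of the $\mathrm{Beta}(n-1, 2)$ law that the density above describes.

```python
import numpy as np

# Ancillarity of T2 = max X_i - min X_i in the U(theta-1/2, theta+1/2)
# model: its density n(n-1) t^(n-2)(1-t) on [0,1] is Beta(n-1, 2) for
# every theta, so the sample mean of T2 should be near (n-1)/(n+1)
# regardless of theta.

rng = np.random.default_rng(2)
n, m = 4, 10**6

for theta in (0.0, 3.0, -7.5):
    x = rng.uniform(theta - 0.5, theta + 0.5, size=(m, n))
    t2 = x.max(axis=1) - x.min(axis=1)
    print(theta, t2.mean(), (n - 1) / (n + 1))
```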

Example 18 (continued). Consider the above example with $n = 2$ and consider finding a 50% confidence interval for $\Theta$. The naive way to do it is to consider the interval $I_1 = [\min X_i, \max X_i] = [T_1 - T_2,\ T_1]$. This interval satisfies $P_\theta(\Theta \in I_1) = 1/2$, since there is probability 1/4 that both observations are above $\theta$ and probability 1/4 that both are below $\theta$.

If one performs the inference conditional on the ancillary $T_2$, we get a very different result. We can compute
$$P_\theta(T_1 - T_2 \le \Theta \le T_1 \mid T_2 = t_2) = P_\theta(\Theta \le T_1 \le \Theta + T_2 \mid T_2 = t_2) = \frac{1}{1 - t_2} \int_\theta^{\theta + t_2} I_{[\theta - 1/2 + t_2,\, \theta + 1/2]}(t_1)\, dt_1 = \frac{t_2}{1 - t_2}\, I_{[0, 1/2)}(t_2)$$
(for $t_2 \ge 1/2$ the conditional coverage probability equals 1). Hence, the level of confidence depends on $t_2$. In particular, we can construct an interval
$$I_2 = \big[ T_1 - \tfrac{1}{4}(1 + T_2),\ T_1 + \tfrac{1}{4} - \tfrac{3}{4} T_2 \big]$$
which has the property $P_\theta(\Theta \in I_2 \mid T_2 = t_2) = 1/2$. Indeed,
$$P_\theta(\Theta \in I_2 \mid T_2 = t_2) = P_\theta\big(\Theta - \tfrac{1}{4} + \tfrac{3}{4} T_2 \le T_1 \le \Theta + \tfrac{1}{4}(1 + T_2) \mid T_2 = t_2\big) = \frac{1}{1 - t_2} \int_{\theta - 1/4 + 3t_2/4}^{\theta + (1 + t_2)/4} I_{[\theta - 1/2 + t_2,\, \theta + 1/2]}(t_1)\, dt_1 = \frac{1}{2}.$$
Since this probability does not depend on $t_2$, it follows that $P_\theta(\Theta \in I_2) = 1/2$.

Let us compare the properties of $I_1$ and $I_2$. Suppose we observe a small value of $T_2$. This does not give us much information about $\Theta$, and this is reflected in $I_2$ being wide. On the contrary, $I_1$ is very short, which is counterintuitive. Similarly, if we observe a large $T_2$, then we know more about $\Theta$ and $I_2$ is short. However, this time $I_1$ is wide!
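The contrast between $I_1$ and $I_2$ is easy to see by simulation. The sketch below (an illustration, not from the notes; assumes NumPy) estimates the coverage of both intervals on narrow bands of $t_2$ values: the conditional coverage of $I_1$ tracks $t_2/(1-t_2)$, while that of $I_2$ stays at $1/2$.

```python
import numpy as np

# Conditional coverage in Example 18 with n = 2: on narrow bands of t2,
# the coverage of I1 tracks t2/(1-t2) (and 1 for t2 >= 1/2), while the
# coverage of I2 stays at 1/2 on every band.

rng = np.random.default_rng(3)
theta, m = 0.0, 10**6
x = rng.uniform(theta - 0.5, theta + 0.5, size=(m, 2))
t1 = x.max(axis=1)
t2 = t1 - x.min(axis=1)

cover1 = (t1 - t2 <= theta) & (theta <= t1)
cover2 = (t1 - (1 + t2) / 4 <= theta) & (theta <= t1 + 0.25 - 0.75 * t2)

for lo, hi in [(0.0, 0.1), (0.2, 0.3), (0.6, 0.7)]:
    band = (lo <= t2) & (t2 < hi)
    print((lo, hi), cover1[band].mean(), cover2[band].mean())
```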

Suppose $T$ is sufficient and $U$ is ancillary, and they are conditionally independent given $\Theta = \theta$. Then there is no benefit in conditioning on $U$. Indeed, in this case $f_{T|U,\Theta}(t \mid u, \theta) = f_{T|\Theta}(t \mid \theta)$, so conditioning on $U$ does not change anything. This situation appears when there is a boundedly complete sufficient statistic.

Theorem 13 (Basu's theorem). If $T$ is a boundedly complete sufficient statistic and $U$ is ancillary, then $T$ and $U$ are conditionally independent given $\Theta = \theta$. Furthermore, for every prior $\mu_\Theta$ they are independent (unconditionally).

Proof. For the first claim (to show conditional independence) we want to show that, for each measurable set $A \subset \mathcal{U}$,
$$\mu_{U|\Theta}(A \mid \theta) = \mu_{U|T,\Theta}(A \mid t, \theta) \quad \mu_{T|\Theta}(\cdot \mid \theta)\text{-a.e. } t, \ \forall \theta. \qquad (10.1)$$
Since $U$ is ancillary, $\mu_{U|\Theta}(A \mid \theta) = \mu_U(A)$ for all $\theta$. We also have
$$\mu_{U|\Theta}(A \mid \theta) = \int \mu_{U|T,\Theta}(A \mid t, \theta)\, \mu_{T|\Theta}(dt \mid \theta) = \int \mu_{U|T}(A \mid t)\, \mu_{T|\Theta}(dt \mid \theta),$$
where the second equality follows since $T$ is sufficient. Indeed, $\mu_{X|T,\Theta}(B \mid t, \theta) = \mu_{X|T}(B \mid t)$ and, since $U = U(X)$,
$$\mu_{U|T,\Theta}(A \mid t, \theta) = \mu_{X|T,\Theta}(U^{-1}A \mid t, \theta) = \mu_{X|T}(U^{-1}A \mid t) = \mu_{U|T}(A \mid t).$$
Combining these two we get
$$\int \big[\mu_U(A) - \mu_{U|T}(A \mid t)\big]\, \mu_{T|\Theta}(dt \mid \theta) = 0.$$
Considering the integrand as a function $g(t)$, we see that the above equation is the same as $E_\theta[g(T)] = 0$ for each $\theta$, and since $T$ is boundedly complete, $\mu_{T|\Theta}(\{t : g(t) = 0\} \mid \theta) = 1$ for all $\theta$. That is, (10.1) holds.

For the second claim we have, by conditional independence,
$$\mu_{U,T}(A \times B) = \int_\Omega \int_B \mu_{U|T}(A \mid t)\, \mu_{T|\Theta}(dt \mid \theta)\, \mu_\Theta(d\theta) = \int_\Omega \mu_U(A)\, \mu_{T|\Theta}(B \mid \theta)\, \mu_\Theta(d\theta) = \mu_U(A)\, \mu_T(B),$$
so $T$ and $U$ are independent.

Sometimes a combination of the recent results is useful for computing expected values in an unusual way.

Example 19. Let $X = (X_1, \dots, X_n)$ be conditionally IID $\mathrm{Exp}(\theta)$ given $\Theta = \theta$. Consider computing the expected value of
$$g(X) = \frac{X_n}{X_1 + \dots + X_n}.$$
To do this, note that $g(X)$ is an ancillary statistic. Indeed, if $Z = (Z_1, \dots, Z_n)$ are IID $\mathrm{Exp}(1)$, then $X \stackrel{d}{=} \theta^{-1} Z$ and we see that
$$P_\theta(g(X) \le x) = P_\theta\Big( \frac{1}{x} \le \frac{X_1 + \dots + X_{n-1}}{X_n} + 1 \Big) = P_\theta\Big( \frac{1}{x} \le \frac{Z_1 + \dots + Z_{n-1}}{Z_n} + 1 \Big).$$
Since the distribution of $Z$ does not depend on $\theta$, we see that $g(X)$ is ancillary. The natural statistic $T(X) = X_1 + \dots + X_n$ is complete (by the theorem just proved) and minimal sufficient. By Basu's theorem (Theorem 13), $T(X)$ and $g(X)$ are independent. Hence,
$$\theta^{-1} = E_\theta[X_n] = E_\theta[T(X)\, g(X)] = E_\theta[T(X)]\, E_\theta[g(X)] = n\theta^{-1}\, E_\theta[g(X)],$$
and we see that $E_\theta[g(X)] = n^{-1}$.
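A Monte Carlo sketch of Example 19 (illustration only; assumes NumPy and the rate parametrization in which $\mathrm{Exp}(\theta)$ has mean $1/\theta$): it checks $E_\theta[g(X)] = n^{-1}$ and that $T(X)$ and $g(X)$ are uncorrelated, consistent with the independence given by Basu's theorem.

```python
import numpy as np

# Example 19 by simulation: g(X) = X_n / (X_1 + ... + X_n) is ancillary
# and, by Basu's theorem, independent of T(X) = sum(X); E[g(X)] = 1/n.

rng = np.random.default_rng(4)
n, m, theta = 5, 10**6, 2.0
x = rng.exponential(scale=1 / theta, size=(m, n))
t = x.sum(axis=1)
g = x[:, -1] / t

print(g.mean(), 1 / n)            # both approx 0.2
print(np.corrcoef(t, g)[0, 1])    # approx 0, consistent with independence
```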
