Completeness

A minimal sufficient statistic achieves the maximum amount of data reduction while retaining all the information the sample has concerning $\theta$. On the other hand, the distribution of an ancillary statistic doesn't depend on $\theta$ at all. It may seem that minimal sufficient and ancillary statistics must be functionally independent, but this isn't necessarily the case.
We saw, for example, that for the $\mathrm{Unif}(\theta, \theta+1)$ distribution, the sample range $R = X_{(n)} - X_{(1)}$ is ancillary, yet $(X_{(1)}, X_{(n)})$ is minimal sufficient. For many important situations, however, our intuition is correct. Completeness is the condition needed for a minimal sufficient statistic to be independent of every ancillary statistic. The property of completeness will also be used to construct good estimators.
Definition: Let $\{ f(t \mid \theta) : \theta \in \Theta \}$ be a family of distributions for a statistic $T(\mathbf{X})$. The family is called complete if
\[ E_\theta\, g(T) = 0 \text{ for all } \theta \quad \text{implies} \quad P_\theta\big( g(T) = 0 \big) = 1 \text{ for all } \theta. \]
In this case we also say that $T$ is a complete statistic.
Completeness is a property of an entire family of probability distributions, indexed by $\theta$. For example, if $X$ has a $N(0, 1)$ distribution, then for the function $g(x) = x$ we have $E(g(X)) = 0$, but $\Pr(X = 0) = 0$, not $1$. However, this is one particular distribution, not a family indexed by $\theta$. We will see that for the family $N(\theta, 1)$, no function of $X$ has mean zero for all real $\theta$ unless that function is zero with probability $1$.
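A small simulation sketch of why completeness concerns a whole family (the sample sizes and the value $\theta = 2$ are arbitrary illustrative choices, not from the notes): under the single distribution $N(0,1)$ the function $g(x) = x$ has mean zero, but under other members of the $N(\theta, 1)$ family it does not, so $E_\theta\, g(X) = 0$ fails to hold for all $\theta$.

```python
import random

random.seed(0)

def mean_of_g(theta, reps=100_000):
    # Monte Carlo estimate of E_theta[g(X)] with g(x) = x, X ~ N(theta, 1)
    return sum(random.gauss(theta, 1) for _ in range(reps)) / reps

m0 = mean_of_g(0.0)   # approximately 0
m2 = mean_of_g(2.0)   # approximately 2: g is not mean-zero across the family
```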
For a complete family (equivalently, a complete statistic $T$), no non-constant function of $T$ can have an expectation that doesn't depend on $\theta$. (Show why this is true.)
Example. Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli$(p)$. We know that $T(\mathbf{X}) = \sum_i X_i$ is sufficient for the parameter $p$ ($0 < p < 1$). We claim that it is complete as well. Now $T(\mathbf{X}) \sim \mathrm{Bin}(n, p)$. We need to show that there does not exist a $g$ such that $E_p\, g(T) = 0$ for all $p$, except $g(T) = 0$ with probability one. Suppose that $E_p\, g(T) = 0$ for all $p$. Then
\[ 0 = E_p\, g(T) = \sum_{t=0}^{n} g(t) \binom{n}{t} p^t (1-p)^{n-t} = (1-p)^n \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{p}{1-p} \right)^t. \]
Since $(1-p)^n \neq 0$, this implies that
\[ \sum_{t=0}^{n} g(t) \binom{n}{t} r^t = 0 \quad \text{for all } r = \frac{p}{1-p}, \]
where $r$ ranges over $(0, \infty)$ as $p$ ranges over $(0, 1)$.
Then the $n$th-degree polynomial in $r$ above is zero for all $r > 0$, and thus all of its $n+1$ coefficients are zero. Since $\binom{n}{t} \neq 0$, it follows that $g(t) = 0$ for $t = 0, 1, \ldots, n$. Thus $T$ is complete.
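The coefficient argument can be checked computationally. The sketch below (the choice $n = 4$ and the particular grid of $p$ values are arbitrary) uses exact rational arithmetic: $E_p\, g(T)$ is a polynomial of degree $n$ in $p$, so forcing it to vanish at $n+1$ distinct values of $p$ gives a linear system in $g(0), \ldots, g(n)$ whose only solution is zero, which we confirm by showing the system matrix has full rank.

```python
from fractions import Fraction
from math import comb

n = 4
ps = [Fraction(k, 10) for k in range(1, n + 2)]  # n+1 distinct p's in (0, 1)
# M[i][t] = C(n, t) p_i^t (1 - p_i)^(n - t), so M @ g = (E_{p_i} g(T))_i
M = [[comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)] for p in ps]

def rank(mat):
    # exact Gaussian elimination over the rationals
    mat = [row[:] for row in mat]
    r = 0
    for c in range(len(mat[0])):
        piv = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if piv is None:
            continue
        mat[r], mat[piv] = mat[piv], mat[r]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                f = mat[i][c] / mat[r][c]
                mat[i] = [a - f * b for a, b in zip(mat[i], mat[r])]
        r += 1
    return r

full_rank = rank(M) == n + 1  # True: only g = 0 makes E_p g(T) = 0 at all n+1 points
```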
Theorem: Let $X_1, \ldots, X_n$ be i.i.d. observations from an exponential family given by
\[ f(x \mid \boldsymbol{\theta}) = h(x)\, c(\boldsymbol{\theta}) \exp\!\left( \sum_{j=1}^{k} w_j(\boldsymbol{\theta})\, t_j(x) \right), \]
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$. Then
\[ T(\mathbf{X}) = \left( \sum_{i=1}^{n} t_1(X_i), \ldots, \sum_{i=1}^{n} t_k(X_i) \right) \]
is complete if $\{ (w_1(\boldsymbol{\theta}), \ldots, w_k(\boldsymbol{\theta})) : \boldsymbol{\theta} \in \Theta \}$ contains an open set in $\mathbb{R}^k$. (The range of the $w_j$'s as $\boldsymbol{\theta}$ varies is $k$-dimensional.)
Example. Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$. Then
\[ f(x \mid \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2} \exp\!\left( -\frac{\mu^2}{2\sigma^2} \right) \exp\!\left( -\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\, x \right), \]
with $w_1(\mu, \sigma^2) = \mu/\sigma^2$, $t_1(x) = x$, and $w_2(\mu, \sigma^2) = -1/(2\sigma^2)$, $t_2(x) = x^2$. Since $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$, we have $-\infty < w_1 < \infty$ and $-\infty < w_2 < 0$. As the range of $(w_1(\mu, \sigma^2), w_2(\mu, \sigma^2))$ over values of $(\mu, \sigma^2)$ contains an open set in $\mathbb{R}^2$,
\[ T(\mathbf{X}) = \left( \sum_i X_i,\; \sum_i X_i^2 \right) \]
is a complete sufficient statistic.
Example. Let $X_1, \ldots, X_n$ be i.i.d. $N(\theta, \theta^2)$. Then
\[ f(x \mid \theta) = (2\pi\theta^2)^{-1/2}\, e^{-1/2} \exp\!\left( -\frac{x^2}{2\theta^2} + \frac{x}{\theta} \right), \]
with $w_1(\theta) = 1/\theta$, $t_1(x) = x$, and $w_2(\theta) = -1/(2\theta^2)$, $t_2(x) = x^2$. Since $w_2 = -w_1^2/2$, the set $\{(w_1(\theta), w_2(\theta))\}$ lies on a parabola, which does not contain an open set in $\mathbb{R}^2$, so the theorem does not apply. In fact, $T(\mathbf{X}) = \left( \sum_i X_i, \sum_i X_i^2 \right)$ is not complete in this case.
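As a quick sanity check (a sketch, not part of the original argument), the parabola relation between the two natural parameters of the $N(\theta, \theta^2)$ family can be verified numerically at a few arbitrary values of $\theta$:

```python
# For N(theta, theta^2): w1 = 1/theta and w2 = -1/(2 theta^2) satisfy
# w2 = -w1^2 / 2, so the parameter curve lies on a parabola in R^2
# and cannot contain a two-dimensional open set.
for theta in [0.5, 1.0, 2.0, 10.0, -3.0]:
    w1 = 1.0 / theta
    w2 = -1.0 / (2.0 * theta**2)
    assert abs(w2 - (-w1**2 / 2)) < 1e-12
```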
Note that $E(\bar{X}) = \theta$, and by Cochran's theorem, $(n-1)S^2/\theta^2$ has a chi-square distribution with $n-1$ degrees of freedom. It can be shown that $E(cS) = \theta$ for all $\theta$, where
\[ c = \sqrt{\frac{n-1}{2}}\; \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}, \]
and thus $E(\bar{X} - cS) = 0$ for all $\theta$. (We have produced a function of $T$ with mean zero for all $\theta$, yet the function isn't identically zero, so $T$ is not complete.)
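A Monte Carlo sketch of this counterexample (the choices $n = 5$, $\theta = 2$, and the replication count are arbitrary): with the constant $c$ above, the simulated average of $\bar{X} - cS$ should be near zero, even though the statistic itself is clearly not identically zero.

```python
import math
import random
import statistics

random.seed(1)
n, theta, reps = 5, 2.0, 200_000
# c = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), so that E(c*S) = theta
c = math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

total = 0.0
for _ in range(reps):
    x = [random.gauss(theta, theta) for _ in range(n)]  # N(theta, theta^2)
    # statistics.stdev uses the n-1 divisor, matching S above
    total += statistics.fmean(x) - c * statistics.stdev(x)

avg = total / reps  # close to zero for any theta
```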
Example. Let $X_1, \ldots, X_n$ be i.i.d. $N(\theta, \theta)$, $\theta > 0$. Here
\[ f(x \mid \theta) = (2\pi\theta)^{-1/2} \exp\!\left( -\frac{\theta}{2} \right) \exp\!\left( -\frac{x^2}{2\theta} + x \right), \]
so $k = 1$ with $w_1(\theta) = -1/(2\theta)$ and $t_1(x) = x^2$ (the term $x$ in the exponent has a coefficient free of $\theta$ and is absorbed into $h(x)$). Since the range $-\infty < w_1 < 0$ is an open set in $\mathbb{R}$, $\sum_i X_i^2$ is a complete sufficient statistic.
Example. Let $X_1, \ldots, X_n$ be i.i.d. $U(0, \theta)$, so that $f(x \mid \theta) = \theta^{-1} I(0 < x < \theta)$. Then $T(\mathbf{X}) = X_{(n)}$ is sufficient. Let $g(t)$ be a function such that $E_\theta\, g(T) = 0$ for all $\theta$. To find $E_\theta\, g(T)$ we compute $\int_0^\theta g(t)\, f_{X_{(n)}}(t \mid \theta)\, dt$. Now
\[ F_{X_{(n)}}(t) = P\big( X_{(n)} \le t \big) = P(X_1 \le t)^n = \left( \frac{t}{\theta} \right)^n, \quad 0 < t < \theta, \]
so that
\[ f_{X_{(n)}}(t) = \frac{n\, t^{n-1}}{\theta^n}, \quad 0 < t < \theta. \]
Then
\[ 0 = E_\theta\, g(T) = \int_0^\theta g(t)\, \frac{n\, t^{n-1}}{\theta^n}\, dt = n \theta^{-n} \int_0^\theta g(t)\, t^{n-1}\, dt, \]
so $\int_0^\theta g(t)\, t^{n-1}\, dt = 0$ for all $\theta$. Differentiating both sides of the last equation with respect to $\theta$ gives $g(\theta)\, \theta^{n-1} = 0$ for all $\theta > 0$ (Fundamental Theorem of Calculus), which implies that $g(\theta) = 0$ for all $\theta > 0$. Hence $T = X_{(n)}$ is a complete sufficient statistic. (Technically, this argument only applies to Riemann-integrable functions.)
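To see concretely why no single nonzero $g$ can work for every $\theta$, consider a hypothetical candidate $g(t) = t - c$ (my illustrative choice, not from the notes). Since $E_\theta\, X_{(n)} = \int_0^\theta t \cdot n t^{n-1}/\theta^n\, dt = n\theta/(n+1)$, we get $E_\theta\, g(X_{(n)}) = n\theta/(n+1) - c$, which vanishes at exactly one $\theta$, not at all $\theta$. The exact-arithmetic sketch below confirms this on a grid.

```python
from fractions import Fraction

def expected_max(n, theta):
    # E_theta[X_(n)] = n * theta / (n + 1) for i.i.d. Uniform(0, theta)
    return Fraction(n, n + 1) * theta

n, c = 5, Fraction(1)
grid = [Fraction(k, 5) for k in range(1, 26)]  # theta = 1/5, 2/5, ..., 5
roots = [theta for theta in grid if expected_max(n, theta) - c == 0]
# Only theta = c*(n+1)/n = 6/5 makes E_theta g(X_(n)) vanish
```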
Some facts:

1. If $T$ is complete and $S = \Psi(T)$ is a function of $T$, then $S$ is also complete.

Proof: If not, then there is a $g$ with $E_\theta\, g(S) = 0$ for all $\theta$, but $g(S)$ is not zero with probability $1$. But then, with $g^* = g \circ \Psi$, we have $E_\theta\, g^*(T) = E_\theta\, g(\Psi(T)) = 0$ for all $\theta$, while $g^*(T)$ is not zero with probability $1$. Thus $T$ cannot be complete, a contradiction.
2. A non-degenerate ancillary statistic cannot be complete.

Proof: If $S$ is ancillary, then its distribution does not depend on $\theta$, so for any $g$, $E_\theta\, g(S) = C$ is constant in $\theta$. Then $g^*(S) = g(S) - C$ has $E_\theta\, g^*(S) = 0$ for all $\theta$. Choosing $g$ so that $g(S)$ is not constant with probability $1$ (possible since $S$ is non-degenerate), $g^*(S)$ is not zero with probability $1$. Thus $S$ is not complete.
3. If a (non-degenerate) function of $T$ is ancillary, then $T$ cannot be complete.

Proof: Suppose that $\Psi(T)$ is ancillary, whereas $T$ is complete. Completeness of $T$ implies that $\Psi(T)$ must be complete as well (Fact 1). Since an ancillary statistic cannot be complete (Fact 2), we have a contradiction.
Note: This implies that no complete sufficient statistic exists for the $\mathrm{Unif}(\theta, \theta+1)$ family. This follows since $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is minimal sufficient, and thus a function of every sufficient statistic, while the function $X_{(n)} - X_{(1)}$ of $T$ is ancillary. ($T$, and hence $X_{(n)} - X_{(1)}$, is a function of every sufficient statistic, yet $X_{(n)} - X_{(1)}$ is ancillary, so no sufficient statistic is complete.)
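The ancillarity of the range here is easy to see directly: if $X_i = \theta + U_i$ with $U_i \sim \mathrm{Unif}(0,1)$, then $X_{(n)} - X_{(1)} = U_{(n)} - U_{(1)}$, which is free of $\theta$. A simulation sketch (sample size, seeds, and the two $\theta$ values are arbitrary choices) checks that the mean of the range does not move with $\theta$:

```python
import random

def mean_range(theta, n=5, reps=50_000, seed=None):
    # Monte Carlo estimate of E(X_(n) - X_(1)) for i.i.d. Unif(theta, theta+1)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [theta + rng.random() for _ in range(n)]
        total += max(x) - min(x)
    return total / reps  # theory: E(R) = (n-1)/(n+1), free of theta

m0 = mean_range(0.0, seed=2)
m5 = mean_range(5.0, seed=3)  # same distribution despite the shift
```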
4. An estimator $g(T)$ for a parameter $\Psi(\theta)$ (based on $T$ only) is called unbiased if $E_\theta\, g(T) = \Psi(\theta)$ for all $\theta$. If $T$ is complete, then only one unbiased estimator of $\Psi(\theta)$ based on $T$ is possible.

Proof: If there were two unbiased estimators $g(T)$ and $h(T)$, then $E_\theta\, g(T) = E_\theta\, h(T) = \Psi(\theta)$, so for $g^*(t) = g(t) - h(t)$ we have $E_\theta\, g^*(T) = 0$ for all $\theta$, and completeness gives $g(T) = h(T)$ with probability $1$. In Chapter 7, we will show that this estimator based on a complete sufficient statistic is the best possible unbiased estimator in a certain sense.
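Tying this back to the Bernoulli example (a sketch with my own choices of $n$ and $p$ values): for $T \sim \mathrm{Bin}(n, p)$, the estimator $g(t) = t/n$ is unbiased for $p$, and since $T$ was shown complete, it is the only unbiased estimator of $p$ that is a function of $T$. The exact check below verifies unbiasedness at several rational $p$:

```python
from fractions import Fraction
from math import comb

def E(g, n, p):
    # exact E_p[g(T)] for T ~ Bin(n, p)
    return sum(g(t) * comb(n, t) * p**t * (1 - p)**(n - t)
               for t in range(n + 1))

n = 6
checks = [E(lambda t: Fraction(t, n), n, p) == p
          for p in [Fraction(1, 4), Fraction(1, 3), Fraction(2, 3)]]
```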