March 10, 2017

THE EXPONENTIAL CLASS OF DISTRIBUTIONS

Abstract. We will introduce a class of distributions that contains many of the discrete and continuous distributions we are familiar with. This class will help to explain why the sample sum is often a sufficient and complete statistic.

1. Introduction

Let Θ ⊆ R be an open interval that is possibly infinite. We say that a family of pdfs {f_θ}_{θ∈Θ} is of exponential class if

    f(x; θ) = h(x) exp(η(θ)k(x) − A(θ))

for some functions η and A which depend on θ, and some functions k and h ≥ 0 which depend only on x and not on θ. Thus we are assuming that the support of f_θ does not depend on θ, and the family has common support S. Notice that since f is a pdf, if it is a density for a continuous random variable, then right away we know that

    A(θ) = log( ∫_S h(x) exp(η(θ)k(x)) dx ),

and similarly in the discrete case

    A(θ) = log( Σ_{x∈S} h(x) exp(η(θ)k(x)) ).

By taking

    s(x) := log(h(x))

and

    q(θ) := −A(θ),

we can also write, as some other texts do,

    f(x; θ) = exp( η(θ)k(x) + s(x) + q(θ) ) 1[x ∈ S].

Furthermore, we say that an exponential family is regular if η is a non-constant continuous function of θ; in the continuous case, we also require that k′(x) is not identically zero and that h is a continuous function; in the discrete case, we require that k(x) is not a constant function. Let us remark that if k is a constant function, then exp(η(θ)k(x) − A(θ)) is equal to the same constant for all θ, and there is really only one pdf in the family. Many familiar families of pdfs, both discrete and continuous, are of (regular) exponential class.
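The normalization identities above lend themselves to a quick numerical check. The sketch below (my own, not from the notes) uses the Binomial(n, p) family, assuming the factorization h(x) = C(n, x), η(p) = log(p/(1 − p)), k(x) = x, and A(p) = −n log(1 − p):

```python
from math import comb, exp, log

def check_binomial_normalizer(n, p):
    """Check A(p) = log(sum over the support of h(x)*exp(eta(p)*k(x)))."""
    eta = log(p / (1 - p))       # eta(p) = log(p/(1-p))
    A = -n * log(1 - p)          # A(p) = -n*log(1-p)
    # h(x) = C(n, x), k(x) = x, common support S = {0, 1, ..., n}
    logZ = log(sum(comb(n, x) * exp(eta * x) for x in range(n + 1)))
    return A, logZ

A, logZ = check_binomial_normalizer(n=10, p=0.3)
print(abs(A - logZ) < 1e-9)  # True
```

The same check works for any discrete family of exponential class: sum h(x) exp(η(θ)k(x)) over the support and compare the logarithm of that sum with A(θ).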
Exercise 1. Fix an integer n ≥ 1. Let p ∈ (0, 1). Show that the Binomial family given by

    f_p(x) = C(n, x) p^x (1 − p)^{n−x},  x ∈ {0, 1, ..., n},

where C(n, x) denotes the binomial coefficient, is of exponential class.

Solution. Write

    f_p(x) = C(n, x) exp( x log(p) + (n − x) log(1 − p) )
           = C(n, x) exp( log(p/(1 − p)) x + n log(1 − p) ).

Take h(x) = C(n, x), η(p) = log(p/(1 − p)), k(x) = x, and A(p) = −n log(1 − p).

Recall that a continuous random variable X is said to have a gamma distribution with parameters α > 0 and β > 0 if it has a pdf given by

    f(x; α, β) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β} if x > 0, and 0 otherwise.

Exercise 2. Fix α > 0. Show that the family of gamma distributions given by {f(·; α, β)}_{β>0} is of exponential class.

Solution. Set h(x) = (1/Γ(α)) x^{α−1} 1[x > 0], η(β) = −1/β, k(x) = x, and A(β) = α log(β).

Theorem 3. Let X = (X_1, ..., X_n) be a random sample from a family of regular exponential class, where

    f(x_1; θ) = h(x_1) exp(η(θ)k(x_1) − A(θ)).

Then the sum given by

    T = t(X) := Σ_{i=1}^n k(X_i)

is a sufficient and complete statistic; in particular, the family of pdfs corresponding to T is also of regular exponential class.

Proof of Theorem 3 (sufficiency). Observe that

    L(x; θ) = exp( η(θ)t(x) − nA(θ) ) Π_{i=1}^n h(x_i).

Hence set

    g(t(x); θ) := exp( η(θ)t(x) − nA(θ) )
and

    H(x) := Π_{i=1}^n h(x_i),

and the result follows from the Neyman factorization theorem.

Proposition 4. Let X be a random variable with pdf from a regular exponential class given by

    f(x; θ) = h(x) exp(η(θ)k(x) − A(θ)).

Then, provided all the derivatives exist,

    E_θ k(X) = A′(θ)/η′(θ)

and

    Var_θ(k(X)) = (1/η′(θ)³) ( η′(θ)A″(θ) − A′(θ)η″(θ) ).

Proof of Proposition 4 (continuous case). We use the same trick as we did with Fisher information: differentiate, with respect to θ, the identity

    1 = ∫ f(x; θ) dx.   (1)

We can bring the derivative inside the integral since the support of f is independent of θ. This gives

    0 = ∫ f(x; θ) [ η′(θ)k(x) − A′(θ) ] dx.   (2)

Note that E_θ k(X) = ∫ k(x)f(x; θ) dx. Some rearranging, using (1), gives the desired result for the expectation. For the variance, we differentiate the identity (2) with respect to θ and obtain

    0 = ∫ ( f(x; θ) [ η″(θ)k(x) − A″(θ) ] + f(x; θ) [ η′(θ)k(x) − A′(θ) ]² ) dx.

We recognize from (2) that

    E_θ [η′(θ)k(X) − A′(θ)] = 0,

so that

    0 = ∫ f(x; θ) [ η″(θ)k(x) − A″(θ) ] dx + Var_θ(η′(θ)k(X) − A′(θ))
      = ∫ f(x; θ) [ η″(θ)k(x) − A″(θ) ] dx + η′(θ)² Var_θ k(X).
Some algebra and the previous identity give

    η′(θ)² Var_θ k(X) = A″(θ) − η″(θ) E_θ k(X) = A″(θ) − η″(θ) A′(θ)/η′(θ),

from which the desired result follows.

Proposition 5 (Additive property). Let X = (X_1, ..., X_n) be a random sample from a family of regular exponential class, where

    f(x_1; θ) = h(x_1) exp(η(θ)k(x_1) − A(θ)).

Let

    T = t(X) := Σ_{i=1}^n k(X_i).

Then the pdf of T has the form

    g(t; θ) = r(t) exp( η(θ)t − nA(θ) ),

where r(t) does not depend on θ, so that in particular, the family of pdfs corresponding to T is of regular exponential class.

Proof of Proposition 5 (discrete case). Suppose x is such that t(x) = t; then

    P(X = x) = Π_{i=1}^n f(x_i; θ) = exp[η(θ)t − nA(θ)] Π_{i=1}^n h(x_i).

For each t, let S_t := {x : t(x) = t}. Set

    r(t) := Σ_{x∈S_t} Π_{i=1}^n h(x_i).

We have that

    P(T = t) = r(t) exp[η(θ)t − nA(θ)].

The proof of Proposition 5 in the continuous case is a bit harder.

Proposition 6. A family of pdfs that is of regular exponential class of the form

    f(x; θ) = h(x) exp(η(θ)x − A(θ))

is complete. Thus in Proposition 6 we require that k(x) = x.
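The formulas of Proposition 4 can be sanity-checked numerically against the gamma family of Exercise 2, whose mean αβ and variance αβ² are known. The sketch below (the helper names d1 and d2 are my own) uses finite-difference derivatives of η(β) = −1/β and A(β) = α log(β):

```python
from math import log

def d1(f, x, h=1e-5):
    # central finite-difference first derivative
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    # central finite-difference second derivative
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

alpha, beta = 3.0, 2.0
eta = lambda b: -1.0 / b       # eta(beta) = -1/beta (Exercise 2)
A = lambda b: alpha * log(b)   # A(beta) = alpha*log(beta)

mean = d1(A, beta) / d1(eta, beta)   # A'/eta'
var = (d1(eta, beta) * d2(A, beta)
       - d1(A, beta) * d2(eta, beta)) / d1(eta, beta) ** 3

print(abs(mean - alpha * beta) < 1e-6)       # True: E[X] = alpha*beta
print(abs(var - alpha * beta ** 2) < 1e-3)   # True: Var(X) = alpha*beta^2
```

Here k(x) = x, so Proposition 4 recovers the gamma mean and variance directly.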
Sketch Proof of Proposition 6 (continuous case). Suppose that the family is given by

    f(x; θ) = h(x) exp(η(θ)x − A(θ)),

where θ ∈ Θ. Let u be such that

    ∫ u(x)h(x) exp(η(θ)x − A(θ)) dx = 0

for all θ ∈ Θ. Let v(x) = u(x)h(x); then we have that

    ∫ v(x) exp(η(θ)x) dx = 0

for all θ ∈ Θ. Recall that η is a continuous function, and as a consequence of the intermediate value property, continuous functions map closed intervals to closed intervals. Since η is not constant, we have that

    v̂(s) := ∫ v(x) exp(sx) dx = 0

for all s ∈ [a, b], where a < b. Here, we appeal to the theory of Laplace transforms to obtain that v = 0; since h is non-zero on the support of f, we can conclude that u = 0.

Proof of Theorem 3. We already proved the sufficiency of T; the rest follows from Propositions 5 and 6, and the fact that the pdf of T has the form required by Proposition 6.

2. An example from the Beta family

Exercise 7 (Beta(θ, 1)). Let X = (X_1, ..., X_n) be a random sample, where X_1 has pdf

    f(x_1; θ) := 1[x_1 ∈ (0, 1)] θ x_1^{θ−1},

where θ > 0. Show that

    G := (X_1 X_2 ··· X_n)^{1/n}

is a complete and sufficient statistic for θ.

Solution. We can rewrite f as

    f(x_1; θ) = 1[x_1 ∈ (0, 1)] exp[(θ − 1) log(x_1) + log(θ)].

Thus we have η(θ) = θ − 1 and k(x_1) = log(x_1), and by Theorem 3, we know that

    T = Σ_{i=1}^n log(X_i)

is sufficient and complete. Recall that any 1-1 function of a sufficient statistic is again sufficient, so G = exp(T/n) is also sufficient. It is also clear that a 1-1 function of a complete statistic must also be complete.
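A small simulation supports this solution. The sketch below (the parameter choices are mine) samples from Beta(θ, 1) by the inverse-CDF method and checks that −log(X_i) has mean 1/θ, the fact underlying the exercises that follow:

```python
import random
from math import exp, log

random.seed(0)
theta, N = 2.0, 200_000

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then U**(1/theta)
# has pdf theta * x**(theta - 1) on (0, 1), i.e. the Beta(theta, 1) law.
xs = [random.random() ** (1 / theta) for _ in range(N)]

# If X ~ Beta(theta, 1), then -log(X) is exponential with mean 1/theta,
# so the sample mean of -log(X_i) should be close to 1/theta = 0.5.
mean_y = sum(-log(x) for x in xs) / N
print(abs(mean_y - 1 / theta) < 0.01)  # True

# G = (X_1 ... X_n)^(1/n) = exp(T/n), where T = sum of log(X_i)
G = exp(sum(log(x) for x in xs) / N)
print(0.0 < G < 1.0)  # True
```

Note that G = exp(−mean of the −log(X_i)), so G concentrates near exp(−1/θ) for large samples.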
Exercise 8. Referring to Exercise 7, show that the mle for θ is given by

    Z := −n / Σ_{i=1}^n log(X_i).

Exercise 9. Referring to Exercise 8, let Y_i := −log(X_i). Show that Y_i ∼ Γ(1, 1/θ).

Exercise 10. Show that EZ = θ · n/(n − 1).

Exercise 11. Referring to Exercise 7, find the MVUE for θ.

Exercise 12. Let W ∼ Γ(α, β). Let a > 0. Assume that α > a. Show that

    E W^{−a} = Γ(α − a) / (β^a Γ(α)).

Exercise 13. Show that the variance of the MVUE in Exercise 11 is θ²/(n − 2).

Exercise 14. Referring to Exercise 7, show that the Fisher information of X_1 is given by I(θ) = 1/θ². Thus the MVUE is not efficient.

3. Another example

Exercise 15. Let X = (X_1, ..., X_n) be a random sample from the distribution

    f_θ(x_1) = (θ / (1 + x_1)^{1+θ}) 1[x_1 > 0].

(a) Find the Cramér-Rao lower bound for an unbiased estimator of θ.
(b) Find the Cramér-Rao lower bound for an unbiased estimator of 1/θ.
(c) Find the mle for θ.
(d) Show that there is an efficient estimator of 1/θ.
(e) Show that there does not exist an efficient estimator for θ.

Solution. (a) We have that for x_1 > 0,

    l(x_1; θ) = log(θ) − (1 + θ) log(1 + x_1);
    l′(x_1; θ) = 1/θ − log(1 + x_1);
    l″(x_1; θ) = −1/θ².

Hence I(θ) = 1/θ², and if Y is an unbiased estimator for θ, then

    Var_θ(Y) ≥ θ²/n.

(b) If Z is an unbiased estimator for g(θ) = 1/θ, then we have that

    Var_θ(Z) ≥ (g′(θ))² / (nI(θ)) = 1/(nθ²).
(c) If x ∈ (0, ∞)^n, we have that

    l(x; θ) = n log(θ) − (1 + θ) Σ_{i=1}^n log(1 + x_i);
    l′(x; θ) = n/θ − Σ_{i=1}^n log(1 + x_i);

setting this to 0 and solving for θ, we obtain that the mle is given by

    n ( Σ_{i=1}^n log(1 + X_i) )^{−1}.

(d) Let

    W := Σ_{i=1}^n log(1 + X_i).

We want to compute the distribution of W. Let Y_i := log(1 + X_i). We have that

    P(Y_1 ≤ z) = P(X_1 ≤ e^z − 1) = ∫_0^{e^z − 1} θ/(1 + x_1)^{1+θ} dx_1.

The fundamental theorem of calculus and the chain rule give that the pdf of Y_1 is given by

    f_{Y_1}(z) = (θ / e^{z(1+θ)}) e^z = θ e^{−zθ};

thus the Y_i are independent exponential random variables with mean 1/θ, and we know that W ∼ Γ(n, 1/θ). In particular, we have that E(W) = n/θ and Var(W) = n/θ². Hence from the previous calculations, we know that W/n is an efficient estimator for 1/θ.

Another way to proceed would be to note that

    f_θ(x_1) = (θ / (1 + x_1)^{1+θ}) 1[x_1 > 0] = 1[x_1 > 0] exp[−(1 + θ) log(1 + x_1) + log(θ)];

hence we are dealing with a family of exponential class with η(θ) = −(1 + θ), k(x_1) = log(1 + x_1), and A(θ) = −log(θ). By our theory, we know that W is a complete and sufficient statistic; in particular, we can compute the mean and variance of Y_1 using Proposition 4.
(e) By the Lehmann-Scheffé theorem, it suffices to find an unbiased estimator for θ that is a function of W, since our theory on the exponential family gives that W is both sufficient and complete. By a previous calculation in Section 2, we already know what to do: just consider

    Z := (n − 1)/W,

which is (n − 1)/n times the mle; we know that Z is unbiased, and

    Var_θ(Z) = θ²/(n − 2).

Since Z is the MVUE and its variance exceeds the Cramér-Rao lower bound θ²/n, no unbiased estimator of θ attains the bound; that is, there is no efficient estimator for θ.

End of Midterm 2 coverage
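The moment formula of Exercise 12, which drives the unbiasedness and variance claims for Z above, can be verified by direct numerical integration. The following sketch (function names are my own) uses α = n = 10, β = 1/θ with θ = 2, and a = 1, for which the exact value is E[1/W] = θ/(n − 1) = 2/9:

```python
from math import exp, gamma

def gamma_pdf(x, alpha, beta):
    # shape-scale parametrization from the notes: mean alpha*beta
    return x ** (alpha - 1) * exp(-x / beta) / (beta ** alpha * gamma(alpha))

def neg_moment(a, alpha, beta, hi=100.0, steps=400_000):
    # Riemann-sum approximation of E[W^(-a)] for W ~ Gamma(alpha, beta);
    # the integrand vanishes at both 0 and hi for these parameters.
    h = hi / steps
    return sum((i * h) ** (-a) * gamma_pdf(i * h, alpha, beta)
               for i in range(1, steps)) * h

# Exercise 12 with alpha = n = 10, beta = 1/theta (theta = 2), a = 1:
# E[1/W] should equal Gamma(n-1)/(beta*Gamma(n)) = theta/(n-1) = 2/9.
alpha, beta, a = 10.0, 0.5, 1.0
exact = gamma(alpha - a) / (beta ** a * gamma(alpha))
approx = neg_moment(a, alpha, beta)
print(abs(approx - exact) < 1e-6)  # True
```

Taking a = 2 in the same check recovers the second moment behind the variance θ²/(n − 2) of Exercise 13.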