SUFFICIENT STATISTICS



1. Introduction

Let $X = (X_1, \ldots, X_n)$ be a random sample from $f_\theta$, where $\theta \in \Theta$ is unknown. We are interested in using $X$ to estimate $\theta$. In the simple case where $X_i \sim \mathrm{Bern}(p)$, we found that the sample mean was an efficient estimator for $p$. Thus, if we observe a finite sequence of coin flips, then in order to have an efficient estimate of the probability $p$ that heads occurs in a single flip, we need only count the number of times we see heads (and divide by the total number of flips); we need not worry about the order in which the heads and tails occurred. Note that two sequences with the same number of heads, such as 1000 and 0001 (writing 1 for heads and 0 for tails), lead to the same estimate when we use the sample mean.

In what follows, we want to study the following question: do we get any additional information about $p$ by making use of the order in which the heads and tails occurred? The sample mean does not make use of the order, and it does give us an efficient estimator, so, in short, the answer in this case is no. Thus in this example, it appears that we can greatly simplify and reduce the amount of data without affecting our ability to find good estimators.

2. Sufficient statistics

Let $X = (X_1, \ldots, X_n)$ be a random sample from $f_\theta$, where $\theta \in \Theta$ is unknown. Recall that $T$ is a statistic if $T = T(X) = u(X)$ for some deterministic function $u$. We will assume that $u$ does not depend on $\theta$. Some examples that you are familiar with are when $T$ is the sample mean, the sample variance, and the maximum. Let us remark that although in many important examples $T$ is a one-dimensional point estimator, it need not be; for example, $T(X) = X$ is a statistic. We say that $T$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T$ does not depend on $\theta$.
In the case where the random variables involved are not discrete, even this definition requires somewhat advanced mathematics, since we might have that $P(T = t) = 0$, in which case it is not immediate how one can make sense of $P(X \in \cdot \mid T = t)$. We will first discuss the discrete case, and then we will extend our discussion to the continuous case.
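The definition of sufficiency can be checked by brute force in the coin-flip setting of the introduction. The following is a minimal Python sketch, not part of the original notes: the function name `conditional_given_sum`, the sample size `n = 4`, and the two values of $p$ are illustrative assumptions. It enumerates all binary sequences and compares the conditional distribution of $X$ given the sample sum for two different parameter values.

```python
from itertools import product
from math import isclose

def conditional_given_sum(p, n):
    """Return {x: P(X = x | T = sum(x))} for n i.i.d. Bern(p) flips."""
    # Joint probability of each binary sequence x.
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    # Probability of each value t of the sample sum T.
    p_sum = {}
    for x, prob in joint.items():
        p_sum[sum(x)] = p_sum.get(sum(x), 0.0) + prob
    # Conditional probability of x given the observed value of T.
    return {x: prob / p_sum[sum(x)] for x, prob in joint.items()}

n = 4                                  # illustrative sample size
cond_a = conditional_given_sum(0.2, n) # illustrative parameter values
cond_b = conditional_given_sum(0.7, n)

# The two conditional distributions agree up to floating-point error.
same = all(isclose(cond_a[x], cond_b[x]) for x in cond_a)
print(same)
```

The check is exhaustive only for small `n`, since the number of sequences grows as $2^n$; it illustrates the definition and is no substitute for a proof.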

3. The discrete case

Exercise 1. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim \mathrm{Bern}(p)$. Show that the sample sum given by $T = X_1 + \cdots + X_n$ is a sufficient statistic for $p$.

Solution. Let $x \in \{0, 1\}^n$ and $t \in \{0, 1, \ldots, n\}$. We need to show that $P(X = x \mid T = t)$ does not depend on $p$. In fact, you already did this computation in the first homework! By definition,
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)}.$$
We may assume that $t = t(x) = x_1 + \cdots + x_n$; otherwise, $P(X = x, T = t) = 0$. Thus $\{X = x\} = \{X = x\} \cap \{T = t\}$, and $P(X = x, T = t) = P(X = x)$. We have that
$$P(X = x) = L(x; p) = \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i} = p^t (1 - p)^{n - t}.$$
We also know that $T \sim \mathrm{Bin}(n, p)$, so that
$$P(T = t) = \binom{n}{t} p^t (1 - p)^{n - t}.$$
Hence we obtain that
$$P(X = x \mid T = t) = \frac{1}{\binom{n}{t}},$$
which does not depend on $p$.

Exercise 2. Discuss why you should expect that the final answer we obtained in Exercise 1 is $P(X = x \mid T = t) = 1 / \binom{n}{t}$.

In the discrete setting, we have that $T$ is sufficient for $\theta$ if and only if for all $x$ and $t = t(x)$, we have
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)} = \frac{P(X = x)}{P(T = t)} = H(x),$$
for some function $H(x)$ which does not depend on $\theta$.

Exercise 3. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is a discrete random variable that is uniformly distributed in $\{1, 2, \ldots, \theta\}$. Show that $M = \max\{X_1, \ldots, X_n\}$ is a sufficient statistic for $\theta$.

Solution. Let $m \in \{1, 2, \ldots, \theta\}$. Note that
$$\{M \le m\} = \bigcap_{i=1}^n \{X_i \le m\}$$
and
$$\{M = m\} = \{M \le m\} \setminus \{M \le m - 1\}.$$
Hence
$$P(M = m) = \theta^{-n} \left( m^n - (m - 1)^n \right).$$
Let $x \in \{1, 2, \ldots, \theta\}^n$ and $m = \max\{x_1, \ldots, x_n\}$. Then
$$P(X = x \mid M = m) = \frac{P(X = x, M = m)}{P(M = m)} = \frac{P(X = x)}{P(M = m)} = \frac{\theta^{-n}}{\theta^{-n} \left( m^n - (m - 1)^n \right)} = \frac{1}{m^n - (m - 1)^n},$$
which does not depend on $\theta$, so we are done.

Exercise 4. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is a Poisson random variable with mean $\lambda$. Show that the sample sum given by $T = X_1 + \cdots + X_n$ is a sufficient statistic for $\lambda$.

In order to have some more examples to discuss, recall that $X$ is a geometric random variable with parameter $p \in (0, 1)$ if $P(X = k) = p(1 - p)^{k - 1}$ for $k = 1, 2, \ldots$. Thus $X$ is the number of Bernoulli($p$) trials required to get a success. Here, $EX = 1/p$. Let us remark that sometimes geometric random variables are defined so that $P(X = k) = p(1 - p)^k$ for $k = 0, 1, 2, \ldots$; in this case $X$ is the number of failures before a success, and $EX = (1 - p)/p$. Before we find a sufficient statistic for $p$, we do a couple of preliminary exercises.

Exercise 5. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is a geometric random variable with parameter $p \in (0, 1)$ and mean $1/p$. Show that the mle for $p$ is given by $1/\bar{X}$.
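The claim of Exercise 5 can be sanity-checked numerically before it is proved. The following is a minimal Python sketch under stated assumptions: the data `xs`, the grid resolution, and the function name `geometric_log_likelihood` are hypothetical choices made for illustration. It compares the candidate $1/\bar{x}$ against a grid search over $p \in (0, 1)$.

```python
from math import log

def geometric_log_likelihood(p, xs):
    """Log-likelihood of i.i.d. geometric data on {1, 2, ...},
    where P(X = k) = p * (1 - p) ** (k - 1)."""
    n, total = len(xs), sum(xs)
    return n * log(p) + (total - n) * log(1 - p)

xs = [3, 1, 4, 1, 5]          # hypothetical observations
p_hat = len(xs) / sum(xs)     # candidate mle: 1 / (sample mean)

# Compare against a fine grid of alternative values of p in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: geometric_log_likelihood(p, xs))
print(p_hat, best_on_grid)
```

Note that the log-likelihood depends on the data only through `len(xs)` and `sum(xs)`, which is consistent with the sample sum being the natural candidate for a sufficient statistic.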

Exercise 6. Referring to Exercise 5, let $T = X_1 + \cdots + X_n$. Show that for $k = n, n + 1, \ldots$, we have
$$P(T = k) = \binom{k - 1}{n - 1} p^n (1 - p)^{k - n}.$$

Solution. Note that $T$ is the number of trials required to get $n$ successes. By counting we obtain the required formula: the last ($k$th) trial is a success, and you are left with $k - 1$ trials, of which $n - 1$ must be successes.

Exercise 7. Referring to Exercise 5, show that the sample sum given by $T = X_1 + \cdots + X_n$ is a sufficient statistic for $p$.

Solution. Let $x \in \{1, 2, 3, 4, \ldots\}^n$ and $t = x_1 + \cdots + x_n$. We have that
$$P(X = x \mid T = t) = \frac{P(X = x)}{P(T = t)} = \frac{\prod_{i=1}^n p(1 - p)^{x_i - 1}}{\binom{t - 1}{n - 1} p^n (1 - p)^{t - n}} = \frac{1}{\binom{t - 1}{n - 1}},$$
which does not depend on $p$.

4. The continuous case

In the continuous case, as in the case of likelihoods, we work with the density functions instead of the probabilities directly. Let $X = (X_1, \ldots, X_n)$ be a random sample from $f_\theta$, where $\theta \in \Theta$ is unknown. Let $T = u(X)$ be a statistic with density function $q(t)$ (which in general depends on $\theta$). Then $T$ is a sufficient statistic for $\theta$ if for all $x$ and $t = t(x)$, we have
$$\frac{L(x; \theta)}{q(t(x))} = \frac{\prod_{i=1}^n f(x_i; \theta)}{q(t(x))} = H(x),$$
for some function $H$ which does not depend on $\theta$.

Exercise 8. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim \mathrm{Unif}(0, \theta)$, where $\theta$ is unknown. Show that $M = \max\{X_1, \ldots, X_n\}$ is a sufficient statistic for $\theta$.

Exercise 9. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim N(\mu, 1)$, where $\mu$ is unknown. Show that the sample mean is a sufficient statistic for $\mu$.

Solution. Luckily, we know the distribution of $\bar{X}$; we have that $\bar{X} \sim N(\mu, 1/n)$. However, even with this piece of knowledge, this is a tricky exercise. First, we need the following observation. Note that $\sum_{i=1}^n (x_i - \bar{x}) = 0$. Thus
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^2 = \sum_{i=1}^n \left( (x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2 \right) = \sum_{i=1}^n \left( (x_i - \bar{x})^2 + (\bar{x} - \mu)^2 \right) = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2.$$
With this algebra in hand, we have that
$$\frac{\prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-(x_i - \mu)^2 / 2}}{\frac{\sqrt{n}}{\sqrt{2\pi}} e^{-n(\bar{x} - \mu)^2 / 2}} = \frac{e^{-\frac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2}}{\sqrt{n} \, (2\pi)^{(n - 1)/2}},$$
which does not depend on $\mu$.

5. Fisher-Neyman factorization

We saw in the previous exercises that proving that a statistic is sufficient from the definition can be quite challenging. The following factorization theorem makes life easier.

Theorem 10. Let $X = (X_1, \ldots, X_n)$ be a random sample from the pdf $f_\theta$, where $\theta \in \Theta$ is unknown. A statistic $T$ is sufficient for $\theta$ if and only if there exist nonnegative functions $g(t; \theta)$ and $h(x)$ (where $h$ does not depend on $\theta$) such that for all points $x$ and all $\theta \in \Theta$, we have
$$L(x; \theta) = \prod_{i=1}^n f(x_i; \theta) = g(t(x); \theta) \, h(x).$$

Clearly, by definition, a factorization holds if $T$ is sufficient, so one direction of the proof is trivial. It is also immediate from Theorem 10 that a one-to-one function of a sufficient statistic is again sufficient. Let us also remark that in Theorem 10, $g(t; \theta)$ does not have to be the density function for $T(X)$, and in the discrete case, we do not require that $g(t; \theta) = P(T = t)$. The factorization of Theorem 10 is not unique. The utility of Theorem 10 lies in the fact that we do not need to identify the distribution of $T$. Before we prove the non-trivial direction of Theorem 10, let us apply it to Exercise 9.

Exercise 11. Apply Theorem 10 to solve Exercise 9.

Solution (Solution to Exercise 9). The difference here is that we still need the somewhat tricky algebra, but we no longer need to know that a sum of independent normals is again normal. We have
$$L(x; \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-(x_i - \mu)^2 / 2} = (2\pi)^{-n/2} \, e^{-\frac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2} \, e^{-\frac{n}{2} (\bar{x} - \mu)^2}.$$
Thus we choose $g(\bar{x}; \mu) = e^{-\frac{n}{2} (\bar{x} - \mu)^2}$ and $h(x) = (2\pi)^{-n/2} \, e^{-\frac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2}$.

Exercise 12. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim N(0, \theta)$, where the variance $\theta$ is unknown. Show that $T = \sum_{i=1}^n X_i^2$ is a sufficient statistic for $\theta$.

Exercise 13. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Set $\theta = (\mu, \sigma^2)$. Let $T = (\bar{X}, S^2)$, where $\bar{X}$ is the usual sample mean, and $S^2$ is the usual sample variance. Show that $L(x; \theta) = g(t(x); \theta) \, h(x)$ for some functions $g$ and $h$, so that $T$ is a sufficient statistic for $\theta$.

Exercise 14. Apply Theorem 10 to solve Exercise 4.

Solution. Let $x \in \{0, 1, 2, \ldots\}^n$ and $t = t(x) = x_1 + \cdots + x_n$. We have that
$$P(X = x) = \prod_{i=1}^n \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{\lambda^t e^{-n\lambda}}{\prod_{i=1}^n x_i!}.$$
Thus we choose $g(t; \lambda) = \lambda^t e^{-n\lambda}$ and $h(x) = 1 / \prod_{i=1}^n x_i!$.

Exercise 15. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_1$ is a real-valued continuous random variable with a pdf given by
$$f(x_1; \theta) = h(x_1) \, c(\theta) \, e^{w(\theta) u(x_1)}.$$
Show that $T = \sum_{i=1}^n u(X_i)$ is a sufficient statistic for $\theta$.
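One consequence of the factorization in the solution to Exercise 14 is that, for two Poisson samples $x$ and $y$ with the same sum, the likelihood ratio equals $h(x)/h(y)$ for every $\lambda$. The following is a minimal Python sketch of that check; the two samples and the values of $\lambda$ are hypothetical, and the function names are chosen here for illustration only.

```python
from math import exp, factorial, isclose, prod

def poisson_likelihood(xs, lam):
    """Joint pmf of i.i.d. Poisson(lam) observations xs."""
    return prod(exp(-lam) * lam ** x / factorial(x) for x in xs)

def h(xs):
    """The factor h(x) = 1 / prod(x_i!) from the factorization."""
    return 1 / prod(factorial(x) for x in xs)

# Two hypothetical samples with the same sum t = 6.
x = (0, 2, 4)
y = (1, 2, 3)

# The ratio L(x; lam) / L(y; lam) is the same for every lam tried.
ratios = [poisson_likelihood(x, lam) / poisson_likelihood(y, lam)
          for lam in (0.5, 1.0, 3.0)]
print(ratios, h(x) / h(y))
```

The factor $g(t; \lambda) = \lambda^t e^{-n\lambda}$ cancels in the ratio because both samples share the same $t$ and $n$, leaving only the part that is free of $\lambda$.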

Proof of Theorem 10 (discrete case). Let $t = t(x)$. We have by assumption that
$$\frac{P(X = x)}{P(T = t)} = \frac{g(t; \theta) \, h(x)}{P(T = t)}.$$
Let us remark that we do not have that $g(t; \theta) = P(T = t)$. Of course, $P(T = t) = P_\theta(T = t)$ depends on $\theta$, and the claim is that the $\theta$'s in $g(t; \theta)$ cancel out the $\theta$'s in $P(T = t)$. To see why, let $A := \{y : t(y) = t(x)\}$. Of course, $x \in A$, but there could be other elements; think of $t$ as the sample sum: if $t(x) = t$, then for any permutation $y$ of $x$, we have $t(y) = t$. Thus,
$$P(T = t) = P(X \in A) = \sum_{y \in A} P(X = y) = g(t; \theta) \sum_{y \in A} h(y).$$
Hence
$$\frac{P(X = x)}{P(T = t)} = \frac{h(x)}{\sum_{y \in A} h(y)},$$
which does not depend on $\theta$.

The proof in the continuous case is more technical; your text has a proof of a special case of the continuous case. The above proof is similar to the proof of the following elementary fact.

Theorem 16. Let $X$ be a discrete random variable with pdf $f$. If $g : \mathbb{R} \to \mathbb{R}$, then
$$E g(X) = \sum_x g(x) f(x),$$
whenever the sum is absolutely convergent.

Proof. We have that $E g(X) = \sum_y y \, P(g(X) = y)$. Suppose $X$ takes values on the set $A$. Let $A_y := \{x \in A : g(x) = y\}$. Note that the sets $A_y$ partition the set $A$. Thus
$$P(g(X) = y) = P(X \in A_y) = \sum_{x \in A_y} f(x)$$
and
$$E g(X) = \sum_y \sum_{x \in A_y} y f(x) = \sum_y \sum_{x \in A_y} g(x) f(x) = \sum_{x \in A} g(x) f(x).$$

End of Midterm coverage
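The regrouping step in the proof of Theorem 16 can be traced on a small example. The following is a minimal Python sketch, assuming a hypothetical random variable $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and the function $g(x) = x^2$; neither choice comes from the notes. It computes $E g(X)$ once by summing over the values $y$ of $g(X)$ and once by summing over the values $x$ of $X$, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}.
f = {x: Fraction(1, 5) for x in range(-2, 3)}

def g(x):
    """Illustrative choice of g; it is not one-to-one on the support of X."""
    return x * x

# Sum over values y of g(X): P(g(X) = y) is the sum of f over A_y.
dist_gx = defaultdict(Fraction)
for x, prob in f.items():
    dist_gx[g(x)] += prob
lhs = sum(y * prob for y, prob in dist_gx.items())

# Sum over values x of X directly.
rhs = sum(g(x) * prob for x, prob in f.items())
print(lhs, rhs)
```

The dictionary `dist_gx` plays the role of the partition into the sets $A_y$: each key $y$ collects the total mass of the points $x$ with $g(x) = y$.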
