4 Invariant Statistical Decision Problems

Cynthia Jennings

4.1 Invariant decision problems

Let $G$ be a group of measurable transformations of the sample space $\mathcal{X}$ into itself, with composition as the group operation. A group must contain the identity transformation $e$ and, for every $g \in G$, an inverse $g^{-1}$ with $g^{-1} g = e$. Consequently, every transformation in $G$ is one-to-one.

Definition 30. The family of distributions $P_\theta$, $\theta \in \Theta$, is said to be invariant under the group $G$ if for every $g \in G$ and every $\theta \in \Theta$ there exists a unique $\theta' \in \Theta$ such that the distribution of $g(X)$ is $P_{\theta'}$ whenever the distribution of $X$ is $P_\theta$. This unique $\theta'$ is denoted by $\bar{g}(\theta)$.

In other words, for every real-valued integrable function $\phi$,
$$E_\theta\, \phi(g(X)) = E_{\bar{g}(\theta)}\, \phi(X).$$

Definition 31. A parameter $\theta$ is said to be identifiable if distinct values of $\theta$ correspond to distinct distributions. If the family of distributions is invariant under $G$, the uniqueness of $\theta'$ in Definition 30 implies that $\theta$ is identifiable.

Lemma 8. If a family of distributions $P_\theta$, $\theta \in \Theta$, is invariant under $G$, then $\bar{G} = \{\bar{g} : g \in G\}$ is a group of transformations of $\Theta$ into itself.

Definition 32. A decision problem, consisting of the game $(\Theta, \mathcal{A}, L)$ and the distributions $P_\theta$ over $\mathcal{X}$, is said to be invariant under the group $G$ if the family of distributions is invariant and the loss function is invariant under $G$, in the sense that for every $g \in G$ and $a \in \mathcal{A}$ there exists a unique $a' \in \mathcal{A}$ such that $L(\theta, a) = L(\bar{g}(\theta), a')$ for all $\theta \in \Theta$. This unique $a'$ is denoted by $\tilde{g}(a)$.

Lemma 9. If a decision problem is invariant under a group $G$, then $\tilde{G} = \{\tilde{g} : g \in G\}$ is a group of transformations of $\mathcal{A}$ into itself.

Example 14. Consider the shift group in the normal estimation problem with $L(\theta, a) = (a - \theta)^2$. Here $g_c(x) = x + c$, so $\bar{g}_c(\theta) = \theta + c$ and $\tilde{g}_c(a) = a + c$.
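The identity $E_\theta\, \phi(g(X)) = E_{\bar{g}(\theta)}\, \phi(X)$ and the loss invariance of Definition 32 can be checked numerically for the shift group of Example 14. The sketch below is only an illustration under stated assumptions: unit variance, a trapezoid-rule approximation of the normal expectation, and an arbitrary test function $\phi(x) = \cos x + x^2$; the helper names (`expect`, `g`, `g_bar`, `g_tilde`) are hypothetical.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def expect(phi, mean, sd=1.0, half_width=12.0, steps=4000):
    """Trapezoid-rule approximation of E[phi(X)] for X ~ N(mean, sd^2),
    integrating over mean +/- half_width * sd."""
    lo = mean - half_width * sd
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * phi(x) * normal_pdf(x, mean, sd)
    return total * h

def g(c):
    """g_c on the sample space: x -> x + c."""
    return lambda x: x + c

def g_bar(c):
    """Induced map on the parameter space: theta -> theta + c."""
    return lambda theta: theta + c

def g_tilde(c):
    """Induced map on the action space: a -> a + c."""
    return lambda a: a + c

def loss(theta, a):
    """Squared-error loss of Example 14."""
    return (a - theta) ** 2

def phi(x):
    # Arbitrary integrable test function (an assumption of this sketch).
    return math.cos(x) + x ** 2

theta, c, a = 0.3, 1.7, -0.4
lhs = expect(lambda x: phi(g(c)(x)), theta)   # E_theta phi(g_c(X))
rhs = expect(phi, g_bar(c)(theta))            # E_{g_bar_c(theta)} phi(X)
print(lhs, rhs)
print(loss(theta, a), loss(g_bar(c)(theta), g_tilde(c)(a)))
```

Both expectations can also be compared with the closed form $E\cos X = e^{-1/2}\cos\mu$ and $E X^2 = \mu^2 + 1$ for $X \sim N(\mu, 1)$, here with $\mu = \theta + c = 2$.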

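Example 15. Assume $X \sim B(n, \theta)$ is binomial. Let $L(\theta, a) = W(\theta - a)$ for some even function $W$, and let $G = \{e, g\}$, where $g(x) = n - x$. The distribution of $g(X)$ is $B(n, 1 - \theta)$. Thus $\bar{g}(\theta) = 1 - \theta$ and $\tilde{g}(a) = 1 - a$; the loss is invariant because $W((1 - \theta) - (1 - a)) = W(a - \theta) = W(\theta - a)$ by evenness.

Example 16. Let $X \sim N(\theta_1 \mathbf{1}, \theta_2^2 I)$ and $\Theta = \{\theta = (\theta_1, \theta_2) : \theta_2 > 0\}$. Let $\mathcal{A} = \mathbb{R}$ and $L(\theta, a) = (a - \theta_1)^2 / \theta_2^2$. Consider transformations of the form $g_{b,c}(x) = bx + c\mathbf{1}$, where $b \neq 0$. Then $\bar{g}_{b,c}(\theta) = (b\theta_1 + c, |b|\theta_2)$ and $\tilde{g}_{b,c}(a) = ba + c$.

Example 17. Consider again the situation in Example 16, but now with $\mathcal{A} = \{0, 1\}$ and
$$L(\theta, 0) = \begin{cases} 1 & \text{if } \theta_1 > 0, \\ 0 & \text{if } \theta_1 \le 0, \end{cases} \qquad L(\theta, 1) = \begin{cases} 1 & \text{if } \theta_1 \le 0, \\ 0 & \text{if } \theta_1 > 0. \end{cases}$$
For $g_b(x) = bx$ with $b > 0$, we have $\bar{g}_b(\theta) = b\theta$ and $\tilde{g}_b(a) = a$.

4.2 Invariant decision rules

Definition 33. Given an invariant decision problem, a non-randomized decision rule $d \in D$ is said to be invariant under $G$ if for all $x \in \mathcal{X}$ and all $g \in G$,
$$d(g(x)) = \tilde{g}(d(x)).$$
A randomized decision rule is invariant if it is a mixture of invariant decision rules.

Theorem 14. The risk of an invariant decision rule is constant on the orbits of the group $\bar{G}$; that is, $R(\bar{g}(\theta), d) = R(\theta, d)$ for every $\theta \in \Theta$ and $g \in G$.

4.3 Location and scale parameters

Definition 34. A real parameter $\theta \in \Theta$ is said to be a location parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x - \theta$ only.

Lemma 10. For the location parameter:
1. $\theta$ is a location parameter for the distribution of $X$ if and only if the distribution of $X - \theta$ is independent of $\theta$.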

2. If the distributions of $X$ are absolutely continuous with densities $f_\theta(x)$, then $\theta$ is a location parameter if and only if $f_\theta(x) = f(x - \theta)$ for some density function $f$.

Example 18. The normal mean (known variance), the Cauchy $\alpha$ (known $\beta$), and the $\theta$ in $U(\theta, \theta + 1)$ are all examples of location parameters.

Definition 35. A real parameter $\theta \in \Theta$, with $\theta > 0$, is said to be a scale parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x/\theta$ only.

Lemma 11. For the scale parameter:
1. $\theta$ is a scale parameter for the distribution of $X$ if and only if the distribution of $X/\theta$ is independent of $\theta$.
2. If the distributions of $X$ are absolutely continuous with densities $f_\theta(x)$, then $\theta$ is a scale parameter if and only if $f_\theta(x) = (1/\theta) f(x/\theta)$ for some density function $f$.

Example 19. The $\theta$ in the normal $N(\theta\mu, \theta^2)$ (known $\mu$), the Cauchy $\beta$ (with $\alpha = 0$), the $\theta$ in $U(0, \theta)$, and the $\beta$ in a Gamma distribution (known $\alpha$) are all examples of scale parameters.

One can combine both definitions to get a location-scale family with parameters $(\mu, \sigma)$, in which $F_{\mu,\sigma}(x)$ is a function of $(x - \mu)/\sigma$ only. Note that the distribution of $X - \mu$ is independent of $\mu$, but the distribution of $X/\sigma$ is not independent of $\sigma$ (it still depends on $\mu/\sigma$).

Lemma 12. If every non-randomized invariant rule is an equalizer rule (that is, has constant risk), then the non-randomized invariant decision rules form an essentially complete class among all randomized invariant rules.

Assume $\Theta = \mathcal{A} = \mathbb{R}$ and let $L(\theta, a) = L(a - \theta)$.

Theorem 15. In the problem of estimating a location parameter with loss $L(\theta, a) = L(a - \theta)$, if $E_0 L(X - b)$ exists and is finite for some $b$, and if there exists $b_0$ such that
$$E_0 L(X - b_0) = \inf_b E_0 L(X - b), \qquad (1)$$
where the infimum is over those $b$ for which the expectation exists, then $d_0(x) = x - b_0$ is a best invariant rule. It has constant risk equal to (1).

Example 20. If $L(\theta, a) = (a - \theta)^2$ and $X$ has finite variance, then $b_0 = E_0(X)$. If $L(\theta, a) = |a - \theta|$ and $X$ has a finite first moment, then $b_0$ is a median of $X$ under $P_0$.
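Theorem 15 reduces the search for a best invariant rule to a one-dimensional minimization over $b$. The sketch below carries out that minimization by grid search for a made-up discrete $P_0$ (the support points and probabilities are hypothetical) and checks Example 20: squared loss should give the mean of $P_0$, absolute loss a median, and the rule $d_0(x) = x - b_0$ should have the same risk at every $\theta$. Grid search is a swapped-in numerical device, not part of the theorem.

```python
# Hypothetical P_0: the distribution of X when theta = 0 (support points, probabilities).
values = [-2.0, -0.5, 0.0, 1.0, 4.0]
probs = [0.10, 0.20, 0.30, 0.25, 0.15]

def expected_loss(loss, b):
    """E_0 L(X - b) for the discrete P_0 above."""
    return sum(p * loss(v - b) for v, p in zip(values, probs))

def best_shift(loss, grid):
    """Grid-search approximation to b_0 = argmin_b E_0 L(X - b), as in (1)."""
    return min(grid, key=lambda b: expected_loss(loss, b))

def risk(loss, b0, theta):
    """Risk of d_0(x) = x - b_0 when X = theta + Y with Y ~ P_0."""
    return sum(p * loss((theta + v) - b0 - theta) for v, p in zip(values, probs))

def squared(z):
    return z * z

grid = [k / 100 for k in range(-300, 301)]
b_sq = best_shift(squared, grid)   # Example 20: expected to be the mean of P_0
b_abs = best_shift(abs, grid)      # Example 20: expected to be a median of P_0
mean = sum(p * v for v, p in zip(values, probs))
print(b_sq, mean, b_abs)

# Theorem 15: d_0 has the same risk at every theta, equal to the infimum in (1).
for theta in (-3.0, 0.0, 2.5):
    print(theta, risk(squared, b_sq, theta), expected_loss(squared, b_sq))
```

Here the median of $P_0$ is unique ($0$), so absolute loss picks a single grid point; for a distribution with a flat stretch of medians the grid search would return just one of them.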

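Definition 33 and Theorem 14 can be illustrated on Example 15 with exact rational arithmetic. The sketch assumes $W(t) = t^2$ and $n = 7$, and compares two hypothetical estimators: $d(x) = (x + 1)/(n + 2)$, which satisfies $d(n - x) = 1 - d(x)$ and is therefore invariant, and $x/(n + 1)$, which is not. The orbits of $\bar{G}$ are the pairs $\{\theta, 1 - \theta\}$, so Theorem 14 predicts $R(\theta, d) = R(1 - \theta, d)$ for the invariant rule; the check at $\theta = 1/5$ shows that the non-invariant rule does not share this property.

```python
from fractions import Fraction
from math import comb

def pmf(n, theta, x):
    """P_theta(X = x) for X ~ B(n, theta)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def W(t):
    # Even function in L(theta, a) = W(theta - a); squared error is an assumption here.
    return t * t

def risk(n, theta, d):
    """R(theta, d) = E_theta W(theta - d(X)), computed exactly."""
    return sum(pmf(n, theta, x) * W(theta - d(x)) for x in range(n + 1))

n = 7

def g(x):            # the non-identity element of G
    return n - x

def g_bar(theta):    # induced map on Theta
    return 1 - theta

def g_tilde(a):      # induced map on the action space
    return 1 - a

def d_inv(x):        # hypothetical invariant rule
    return Fraction(x + 1, n + 2)

def d_other(x):      # hypothetical non-invariant rule
    return Fraction(x, n + 1)

theta = Fraction(1, 5)
# Invariance of the family: g(X) ~ B(n, 1 - theta) when X ~ B(n, theta).
same_law = all(pmf(n, theta, g(k)) == pmf(n, g_bar(theta), k) for k in range(n + 1))
# Invariance of the loss: W(theta - a) = W(g_bar(theta) - g_tilde(a)) since W is even.
a = Fraction(2, 3)
same_loss = W(theta - a) == W(g_bar(theta) - g_tilde(a))
print(same_law, same_loss)

print(risk(n, theta, d_inv), risk(n, g_bar(theta), d_inv))
print(risk(n, theta, d_other), risk(n, g_bar(theta), d_other))
```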
Example 21. Let $P_\theta(X = \theta + 1) = 1 - P_\theta(X = \theta - 1) = 1/2$ and let
$$L(\theta, a) = \begin{cases} |a - \theta| & \text{if } |a - \theta| \le 1, \\ 1 & \text{if } |a - \theta| > 1. \end{cases}$$
Then the best invariant rules are not admissible.

Example 22. In the multidimensional normal location estimation problem, the vector of means is the best invariant estimator. However, it is not admissible when the dimension $k$ is higher than 2, since the estimator
$$d(X) = \left(1 - \frac{k - 2}{\|X\|^2}\right) X$$
is better. Nevertheless, the best invariant estimate is typically minimax.

Theorem 16. If an equalizer rule is extended Bayes, it is also a minimax rule.

This leads to:

Theorem 17. Under the conditions of Theorem 15, if $L$ is bounded below and if for every $\varepsilon > 0$ there exists $N$ such that
$$E_0\, L(X - b)\, I\{|X| \le N\} \ge \inf_b E_0 L(X - b) - \varepsilon \quad \text{for all } b,$$
then the best invariant rule is minimax.

An example of a location parameter is the normal mean:

Theorem 18. If $X_1, \dots, X_n$ is a sample from a normal distribution with mean $\theta$ and known variance $\sigma^2$, then $\bar{X}$ is a best invariant estimate and a minimax estimate of $\theta$, provided that the loss function is a nondecreasing function of $|a - \theta|$ and that $E_0 L(\bar{X})$ exists and is finite.

4.4 Estimation of a distribution function

Let $X_1, \dots, X_n$ be a sample from a continuous distribution $F$. We estimate $F$ by $\hat{F}$, which is continuous from the right. Two commonly used loss functions are
$$L_1(F, \hat{F}) = \sup_x |F(x) - \hat{F}(x)|, \qquad (2)$$

and
$$L_2(F, \hat{F}) = \int (F(x) - \hat{F}(x))^2 \, F(dx). \qquad (3)$$
A sufficient statistic is the vector of order statistics $(X_{(1)}, \dots, X_{(n)})$. Let us reduce the collection of rules further by considering only invariant decision rules. Consider the group of transformations
$$G = \{g_\psi : g_\psi(x_{(1)}, \dots, x_{(n)}) = (\psi(x_{(1)}), \dots, \psi(x_{(n)}))\},$$
where $\psi$ ranges over all continuous, strictly increasing functions of the real line onto itself. Then, for both losses, $\bar{g}_\psi(F)(x) = F(\psi^{-1}(x))$ and $\tilde{g}_\psi(\hat{F})(x) = \hat{F}(\psi^{-1}(x))$.

As part of the homework assignment you should prove that the invariant decision rules have the form
$$\hat{F}(x) = \sum_{i=0}^{n} u_i \, I_{[X_{(i)}, X_{(i+1)})}(x),$$
where $-\infty = X_{(0)} < X_{(1)} < \dots < X_{(n)} < X_{(n+1)} = \infty$. It turns out that for the second loss function and for invariant rules,
$$R(F, \hat{F}) = \sum_{i=0}^{n} E \int_{F(X_{(i)})}^{F(X_{(i+1)})} (t - u_i)^2 \, dt.$$
This risk is minimized by $u_i = (i + 1)/(n + 2)$.
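The last claim can be checked with exact arithmetic. Since $F$ is continuous, $U_{(i)} = F(X_{(i)})$ are the order statistics of a uniform sample (with $U_{(0)} = 0$, $U_{(n+1)} = 1$), and by Fubini's theorem
$$E \int_{U_{(i)}}^{U_{(i+1)}} (t - u_i)^2 \, dt = \int_0^1 (t - u_i)^2 \binom{n}{i} t^i (1 - t)^{n-i} \, dt,$$
because $\binom{n}{i} t^i (1 - t)^{n-i}$ is the probability that exactly $i$ observations fall at or below $t$. Each term is therefore a quadratic in $u_i$, and the risk does not depend on $F$. The sketch below (helper names hypothetical, $n = 5$ chosen arbitrarily) evaluates these integrals with the Beta-integral identity $\int_0^1 t^p (1 - t)^q \, dt = p!\,q!/(p + q + 1)!$ and compares the minimizing levels with the empirical distribution function, $u_i = i/n$.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(p, q):
    """Exact integral of t^p (1 - t)^q over [0, 1] for integers p, q >= 0."""
    return Fraction(factorial(p) * factorial(q), factorial(p + q + 1))

def moments(n, i):
    """m_k = integral of t^k * C(n, i) t^i (1 - t)^(n - i) over [0, 1], for k = 0, 1, 2."""
    return [comb(n, i) * beta_int(i + k, n - i) for k in range(3)]

def term_risk(n, i, u):
    """E integral_{U_(i)}^{U_(i+1)} (t - u)^2 dt = m_2 - 2 u m_1 + u^2 m_0."""
    m0, m1, m2 = moments(n, i)
    return m2 - 2 * u * m1 + u * u * m0

def total_risk(n, levels):
    """Risk R(F, F_hat) of the invariant rule with levels u_0, ..., u_n."""
    return sum(term_risk(n, i, u) for i, u in enumerate(levels))

def optimal_levels(n):
    """Each quadratic term is minimized at u_i = m_1 / m_0."""
    return [moments(n, i)[1] / moments(n, i)[0] for i in range(n + 1)]

n = 5
best = optimal_levels(n)
edf = [Fraction(i, n) for i in range(n + 1)]   # levels of the empirical d.f.
print(best)
print(total_risk(n, best), total_risk(n, edf))
```

Working the identity by hand gives $m_0 = 1/(n + 1)$ and $m_1 = (i + 1)/((n + 1)(n + 2))$, so $m_1/m_0 = (i + 1)/(n + 2)$, in agreement with the notes.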

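Example 21 can be checked directly. For an invariant rule $d_b(x) = x - b$, the risk is $\tfrac12 \min(|1 - b|, 1) + \tfrac12 \min(|1 + b|, 1)$ at every $\theta$, which is smallest, with value $1/2$, at $b = \pm 1$. The sketch below confirms this by grid search and compares the best invariant rule $d(x) = x - 1$ with a non-invariant rule $d^*$ chosen here purely for illustration (it is not taken from the notes): $d^*(x) = x - 1$ for $x > 0$ and $d^*(x) = x + 1$ for $x \le 0$. Working through the two cases shows that $d^*$ has risk $0$ for $\theta \in (-1, 1]$ and $1/2$ elsewhere, so it dominates both best invariant rules.

```python
def loss(theta, a):
    """Example 21: |a - theta| if |a - theta| <= 1, and 1 otherwise."""
    return min(abs(a - theta), 1.0)

def risk(rule, theta):
    """X equals theta + 1 or theta - 1, each with probability 1/2."""
    return 0.5 * loss(theta, rule(theta + 1)) + 0.5 * loss(theta, rule(theta - 1))

def shift_rule(b):
    """Invariant rule d_b(x) = x - b."""
    return lambda x: x - b

def d_star(x):
    """Hypothetical non-invariant rule: subtract 1 when x > 0, add 1 otherwise."""
    return x - 1 if x > 0 else x + 1

# The risk of d_b does not depend on theta, so evaluating at theta = 0 suffices.
grid = [k / 100 for k in range(-300, 301)]
inv_risk = {b: risk(shift_rule(b), 0.0) for b in grid}
best_value = min(inv_risk.values())
best_bs = [b for b, r in inv_risk.items() if r == best_value]
print(best_value, best_bs)

# d_star versus the best invariant rule d(x) = x - 1.
for theta in (-2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(theta, risk(d_star, theta), risk(shift_rule(1.0), theta))
```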