Statistics 210B, Spring 1998 Class Notes P.B. Stark stark@stat.berkeley.edu www.stat.berkeley.eduèçstarkèindex.html January 29, 1998 Second Set of Notes 1 More on Testing and Conædence Sets See Lehmann, TSH, Ch. 3, 4, 5. Deænition 1 SèXè isaconædence level 1, æ conædence set for the parameter çèp ç è if P ç fsèxè 3 çèçèg ç1, æ 8ç 2 æ: è1è Deænition 2 Accuracy of Conædence Sets. The accuracy at ç èæè of the conædence set SèXè for the parameter çèæè is P ç fsèxè 3 çèp æ èg;çèçè 6= çèæè: è2è That is, it is the probability that the conædence set contains çèæè, when that is not the true value of ç. The accuracy is the same as the ëfalse coverage" probability that appeared on the right hand side of the Ghosh-Pratt identity. Typically, when there are nuisance parameters, there is no conædence set SèXè that minimizes 2 for all æ such that çèçè 6= çèæè èthere is no 1
uniformly most accurate conædence setè. However, if one restricts the class of conædence sets in ways that are sometimes reasonable, there are then optimal sets in the restricted class. Deænition 3 A 1, æ conædence set SèXè for the parameter çèçè is unbiased if for all ç, P ç fsèxè 3 çèæèg ç1, æ 8æ s.t. çèçè 6= çèæè: è3è That is, a conædence set is unbiased if the probability ofcovering a false value of ç is smaller than the conædence level 1, æ. In some sense, a biased conædence set treats some parameter values ç specially. For example, a biased 95è conædence set for the mean ç of a unit variance normal is f0gëëx, 1:96;X + 1:96ë. The coverage probability is 95è for all ç 2 R except ç = 0, which is in the set with probability one. The analogous property of a test is that there is no alternative value of ç for which the probability of rejection is less than the level æ of the test: Deænition 4 A level æ test æ of H against the alternative K is unbiased if æ æ èp ç è ç æ 8ç 2 H è4è and æ æ èpçè ç æ 8ç 2 K: è5è If the second inequality is strict èéè, the test is strictly unbiased. There exist UMP unbiased tests in many problems for which there is no UMP test, in particular, when ç èçè 6= ç èwhen there are nuisance parameters on which the distribution of X depends, but which are irrelevant to the truth of the hypothesisè. Lemma 1 èlehmann, TSH, Lemma 4.1.1è Suppose æ ç R, and let æ be the common boundary of fç : P ç 2 Hg and fç : P ç 2 Kg. IfP æ is such that the power function æ æ èp æ è is a continuous function of æ for every æ, and if æ 0 is UMP among all level æ tests such that æ æ èæè =æ 8æ 2 æ; è6è then æ 0 is UMP unbiased. 2
This lemma reduces questions about UMP unbiased tests to their behavior on the boundary between the null and alternative, provided the level is a continuous function of the parameter. In one-parameter exponential families, we saw that there exist UMP tests of one-sided hypotheses about the parameter. For two-sided hypotheses H : æ 1 ç ç ç æ 2 versus K : çéæ 1 or çéæ 2, there exist UMP unbiased tests; the form of their decision functions is èlehmann, TSH, 4.2è çèxè = 8é é: 1; T èxè éc 1 ;Tèxè éc 2 a i T èxè =c i ;i=1; 2 0 c 1 étèxè éc 2 è; è7è with c 1 and c 2 chosen s.t. EP æ1 çèxè =EP æ1 çèxè =æ: è8è Example. èlehmann, TSH, x4.2è Suppose P æ,æ=ë0; 1ë is distributions of the number of successes X in a sequence of a æxed number n of independent trials, each with probability ç of success. èi.e., X ç Binèn; çè.è This is a one-parameter exponential family. Consider the null hypothesis H : P ç = Binèn; æè, for æxed æ 2 ë0; 1ë. The null hypothesis is of the form 7 with æ 1 = æ 2 = æ, and T èxè =x, so the constraint 8 reduces to EP æ çèxè ==1, X c 2,1 x=c 1 +1 nc x æ x è1,æè n,x,a 1n C c1 æ c 1 è1,æè n,c 1,a 2n C c2 æ c 2 è1,æè n,c 2 =1,æ Note again that in practice, typically one would choose æ so that a 1 = a 2 = 0, rather than use a randomized test. è9è Deænition 5 A family of conædence level 1, æ conædence sets SèXè for çèçè is uniformly most accurate unbiased if for all ç 2 æ, it minimizes the probabilities P ç fsèxè 3 çèæèg 8æ s.t. çèæè 6= çèçè: è10è Conædence sets derived by inverting uniformly most powerful unbiased tests are uniformly most accurate unbiased. 3
2 Equivariant and Invariant Procedures Deænition 6 A group is a set G and an operation æ : GæG!G that satisæes for every g, h, k in G 1. g æ èh æ kè =ègæhè æ k èassociativityè 2. There exists a unique e 2Gsuch that e æ g = g æ e = g for every g 2Gèexistence of an identity elementè 3. For each g 2G, there exists g,1 2Gs.t. g æ g,1 = g,1 æ g = e. Typically, the dot in the notation will be omitted, so we shall write gh èmultiplicationè in place of g æ h. Another symbol commonly used for the group operation is +. Also, while formally a group is the pair èg; æè, G is commonly referred to as the group, with the group operation æ understood from context. Deænition 7 A group of transformations on the set X is a collection G of transformations g : X!Xand an operation æ : GæG!Gsuch that èg; æè isagroup. For example, let X = R n, and for each æ 2 R n, let g æ : X!X ç! ç + æ èvector addition in R n è è11è Let + denote the group operation, and deæne g æ + g ç = g æ+ç. With these deænitions, èg = fg æ : æ 2 R n g; +è is a group of transformations on X. The identity element of the group is g 0, the inverse of g æ is g,æ, and associativity of the group operation + follows from the associativity ofvector addition on R n. This group is called the translation group on R n. Deænition 8 Invariance ofdecision procedures. A statistical decision problem èa set P æ of distributions on an outcome space X, a loss function L, and a set of possible decisions Dè is invariant under the group G of transformations of the outcome space X if 4
1. The family P æ is closed under G, in the sense that for any æ 2 æ and any g 2G, there exists ç 2 æ such that if X ç P æ, gx ç P ç, with ç 2 æ, and the mapping çg :æ! æ, æ 7! ç is one-to-one and onto èçgæ =æè. 2. For each g 2G, there isatransformation hègè :D!Dsuch that hèg 1 g 2 è=hèg 1 èhèg 2 è, and Lèçgæ; hègèdè =Lèæ;dè for all æ, g, and d. Example. Suppose æ = R m, P æ = fnèç; Iè : ç 2 æg is the set multivariate normal distributions with independent, unit variance components, çèçè =ç, D = R m, and that Lèç; æè =kç, æk 2. Let G be the translation group on R m. Clearly, the set P æ is closed under G: if X ç P ç, then g æ X ç P ç+æ. The set of transformations on P æ induced by the action of G on X is çg, whose elements çgèg æ è=çg æ map P ç to P ç+æ.èçg is also a group, with the group operation deæned by çg æ +çg ç =çgèg æ+ç èè. If we deæne hèg ç è:d!d, æ 7! æ + ç, then hèg ç g ç èèæè =hèg ç èhèg ç èæ, and Lèçg ç ç; hèg ç èæè =Lèç; æè, as required by the conditions of equivariance. When the decision problem is invariant under G, it is reasonable to consider only invariant decision rules æ : X! D, for which æègxè = hègèæèxè. Lehmann ètsh, 1.5è draws a distinction between invariant and equivariant decision rules: for the former, hègèd = d for all d, while for the latter, æègxè varies with g. In the invariant case, the decision problem is unchanged under X 7! gx. Deænition 9 Suppose we are testing H against K. Given a transformation g on X,if fp çgç : P ç 2 Hg = H and fp çgç : P ç 2 Kg = K, we say the problem of testing H against K is invariant under the transformation g. The set of transformations under which a testing problem is invariant isalways a group, with the group operation deæned in the natural way; the induced set G ç of transformations on P æ also form a group èg ç is a homomorphism of Gè. Deænition 10 A function M : X!Yis maximal invariant with respect to the G of transformations on a set X if 1. Mèxè =Mègxè for all g 2GèM is invariant under Gè and 5
2. Mèxè =Mèyè è x = gy for some g 2G. A test is invariant if and only if it depends on x only through a maximal invariant M èxè. For example, suppose the observation X is an iid sample of size n from some common distribution. The distribution of X is clearly invariant under permutations of its components, so we could work instead with the set of order statistics without losing any information about ç. The order statistics are in fact a maximal invariant of the permutation group, so invariant tests regarding ç need depend on X only through its order statistics. Any test that treated the components of X diæerently would have anad hoc æavor. Ceteris paribus, it makes sense to base tests on a maximal invariant function of the data X èwith respect to Gè if the testing problem is invariant under G. Subject to some measurability considerations, the set of all invariant tests is characterized by the set of all decision functions çèxè =hèmèxèè where Mèxè is a maximal invariant. Basing tests on suæcient statistics reduces the outcome space X.Invariant tests reduce not only the outcome space X, but also the parameter space æ: Theorem 1 èsee Lehmann, TSH, 6.3 Th.3.è If M èxè is maximal invariant with respect to G and if çèçè is maximal invariant with respect to the induced group ç G, then the distribution of MèXè depends on ç only through çèçè. The utility of this theorem results from the fact that Mèxè and çèçè can turn out to be real-valued, even when the dimensions of X and æ are large, with the distribution of MèXè having monotone likelihood ratio in çèçè. That makes it possible to ænd UMP invariant tests using the one-dimensional optimality theory we saw earlier. Deænition 11 A test with decision function ç is almost invariant with respect to G if for all g 2G, çègxè =çèxè except on a P æ -null set N g that can depend on g. Remark. The power function of a test that is almost invariant under G is invariant under the induced group G. ç The converse is not true in general. Remark. Unbiasedness and invariance are not equivalent in general, in that UMP unbiased tests can exist when UMP almost-invariant ones do not, and vice versa. However, Lehmann 6
è6.6, Th. 7è shows that if in a given testing problem, there exists a UMP unbiased test with decision function ç æ that is unique up to sets of measure zero, and there also exists a UMP almost-invariant test w.r.t. some group G, then the UMP almost-invariant test is also unique up to sets of measure zero, and the two tests are the same a.e. Deænition 12 Equivariant Conædence Set. Suppose that the set of distributions P æ on X is preserved under the group G, and let çg be the group of transformations on æ induced by the action of G on X. Suppose that the action of çg on the component çèçè of the more general parameter ç depends only on çèçè; that is, çèçgèçèè = çèçgèæèè if çèçè =çèæè. For each g 2G, let ~gs = fçèçgèçèè : çèçè 2 Sg. IfSèxè is such that ~gsèxè =Sègxè 8x 2X;g 2G; è12è we say S is equivariant under G. Equivariant conædence sets result from inverting invariant tests. 7