Stat 643 Results


Sufficiency (and Ancillarity and Completeness)

Roughly, T(X) is sufficient for {P_θ} (or for θ) provided the conditional distribution of X given T(X) is the same for each θ.

Example. Let X = (X₁, X₂, ..., Xₙ), where the Xᵢ are iid Bernoulli(p) and T(X) = Σᵢ₌₁ⁿ Xᵢ. You may show that P[X = x | T = t] is the same for each p.

In some sense, a sufficient statistic carries all the information in X. (A variable with the same distribution as X can be constructed from T(X) and some randomization whose form does not depend upon θ.) The $64 question is how to recognize one. The primary tool is the Factorization Theorem in its various incarnations. This says roughly that T is sufficient iff f_θ(x) = g_θ(T(x))h(x).

Result 1 (Page 20 Lehmann TSH) Suppose that X is discrete. Then T(X) is sufficient for θ iff ∃ nonnegative functions g_θ(·) and h(·) such that P_θ[X = x] = g_θ(T(x))h(x). (You can and should prove this yourself.)

The Stat 643 version of Result 1 is considerably more general and relies on measure-theoretic machinery. Basically, one can easily push Result 1 as far as families of distributions dominated by a σ-finite measure.

The following standard notation will be used throughout these notes:

(𝒳, ℬ, P_θ)  a probability space (for fixed θ ∈ Θ)
E_θ(g(X)) = ∫ g dP_θ
T: (𝒳, ℬ) → (𝒯, ℱ)  a statistic
ℬ₀ = ℬ(T) = {T⁻¹(F) : F ∈ ℱ}  the σ-algebra generated by T

Definition 1. T (or ℬ₀) is sufficient for 𝒫 = {P_θ} if for every B ∈ ℬ ∃ a ℬ₀-measurable function Y_B such that Y_B = E_θ(I_B | ℬ₀) a.s. P_θ ∀θ.
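The Bernoulli sufficiency example above can be checked by brute force. The sketch below (helper names are mine, not from the notes) computes P_p[X = x | T = t] exactly for several rational p and confirms it is always 1/C(n, t), free of p.

```python
# Numeric check of the Bernoulli sufficiency example: for X = (X_1,...,X_n)
# iid Bernoulli(p) and T = sum(X_i), the conditional probability
# P_p[X = x | T = t] equals 1/C(n, t) no matter what p is.
from fractions import Fraction
from itertools import product
from math import comb

def conditional_prob(x, p):
    """P_p[X = x | T = sum(x)], computed from the joint and marginal pmfs."""
    n, t = len(x), sum(x)
    joint = p ** t * (1 - p) ** (n - t)                   # P_p[X = x]
    marginal = comb(n, t) * p ** t * (1 - p) ** (n - t)   # P_p[T = t]
    return joint / marginal

n = 4
for x in product((0, 1), repeat=n):
    probs = {conditional_prob(x, Fraction(num, 10)) for num in (1, 3, 7)}
    assert probs == {Fraction(1, comb(n, sum(x)))}        # same for every p
```

Exact `Fraction` arithmetic is used so that "the same for each p" is an identity, not a floating-point coincidence.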

(Note that E_θ(I_B | ℬ₀) = P_θ(B | ℬ₀).) This definition says that there is a single random variable that will serve as a conditional probability of B given ℬ₀ for all θ in the parameter space.

Definition 2. 𝒫 is dominated by a σ-finite measure μ if P_θ ≪ μ ∀θ.

Then assuming that 𝒫 is dominated by μ, standard notation is to let f_θ = dP_θ/dμ, and it is these objects (R-N derivatives) that factor nicely iff T is sufficient. To prove this involves some nontrivial technicalities.

Definition 3. 𝒫 has a countable equivalent subset if ∃ a sequence of positive constants {cᵢ} with Σᵢ₌₁^∞ cᵢ = 1 and a corresponding sequence {θᵢ} such that λ = Σᵢ₌₁^∞ cᵢP_θᵢ is equivalent to 𝒫 in the sense that λ(B) = 0 iff P_θ(B) = 0 ∀θ.

Notation: Assuming that 𝒫 has a countable equivalent subset, let λ be a choice of Σ cᵢP_θᵢ as in Definition 3.

Theorem 2 (Page 575 Lehmann TSH) 𝒫 is dominated by a σ-finite measure iff it has a countable equivalent subset.

Theorem 3 (Page 54 Lehmann TSH) Suppose that 𝒫 is dominated by a σ-finite measure and T: (𝒳, ℬ) → (𝒯, ℱ). Then T is sufficient for 𝒫 iff ∃ nonnegative ℱ-measurable functions g_θ such that ∀θ
dP_θ/dλ(x) = g_θ(T(x)) a.s. λ.
(This last equality is equivalent to P_θ(B) = ∫_B g_θ(T(x)) dλ(x) ∀B ∈ ℬ.)

Theorem 4 (Page 55 Lehmann TSH) (The Factorization Theorem) Suppose that 𝒫 ≪ μ, where μ is σ-finite. Then T is sufficient for 𝒫 iff ∃ a nonnegative ℬ-measurable function h and nonnegative ℱ-measurable functions g_θ such that ∀θ
dP_θ/dμ(x) = f_θ(x) = g_θ(T(x))h(x) a.s. μ.

Standard/elementary applications of Theorem 4 are to cases where:
1. μ is Lebesgue measure on ℝⁿ, 𝒫 is some parametric family of n-dimensional distributions, and the f_θ are n-dimensional probability densities, or
2. μ is counting measure on some countable set, 𝒫 is some parametric family of discrete distributions and the f_θ are probability mass functions.
But it can also be applied much more widely (e.g. to cases where μ is the sum of n-dimensional Lebesgue measure and counting measure on some countable subset of ℝⁿ).

Example. Theorem 4 can be used to show that for 𝒳 = ℝⁿ, X = (X₁, X₂, ..., Xₙ), ℬ = ℬⁿ the n-dimensional Borel σ-algebra, π = (π₁, π₂, ..., πₙ) a generic permutation of the first n positive integers, π(X) = (X_π₁, X_π₂, ..., X_πₙ), μ the n-dimensional Lebesgue measure and 𝒫 = {P : P ≪ μ and f = dP/dμ satisfies f(π(x)) = f(x) a.s. μ for all permutations π}, the order statistic T(X) = (X₍₁₎, X₍₂₎, ..., X₍ₙ₎) is sufficient.

Actually, it is an important fact (that does not follow from the factorization theorem) that one can establish the sufficiency of the order statistic even more broadly. With 𝒫 = {P on ℝⁿ such that P(B) = P(π(B)) for all permutations π}, direct argument shows that the order statistic is sufficient. (You should try to prove this. For B ∈ ℬ, try Y_B(x) equal to the fraction of all permutations π that place π(x) in B.)

(Obvious) Result 5. If T(X) is sufficient for 𝒫 and 𝒫₀ is a subset of 𝒫, T(X) is sufficient for 𝒫₀.

A natural question is "How far can one go in the direction of 'reducing' X without throwing something away?" The notion of minimal sufficiency is aimed in the direction of answering this question.

Definition 4. A sufficient statistic T: (𝒳, ℬ) → (𝒯, ℱ) is minimal sufficient (for θ or for 𝒫) provided for every sufficient statistic S: (𝒳, ℬ) → (𝒮, 𝒱) ∃ a (measurable?) function h: 𝒮 → 𝒯 such that T = h∘S a.e. 𝒫. (It is probably equivalent or close to equivalent that ℬ(T) ⊂ ℬ(S).)

There are a couple of ways of identifying a minimal sufficient statistic.

Theorem 6 (Like page 92 of Schervish; other development in Lehmann TPE) Suppose that 𝒫 is dominated by a σ-finite measure μ and T: (𝒳, ℬ) → (𝒯, ℱ) is sufficient for 𝒫. Suppose that there exist versions of the R-N derivatives dP_θ/dμ(x) = f_θ(x) such that
[the existence of k(x, y) such that f_θ(y) = f_θ(x)k(x, y) ∀θ] implies T(x) = T(y).
Then T is minimal sufficient. (A trivial generalization of Theorem 6 is to suppose that the condition only holds for a set 𝒳₀ ⊂ 𝒳 with P_θ probability 1 for all θ.)

The condition in Theorem 6 is essentially that T(x) = T(y) iff f_θ(y)/f_θ(x) is free of θ. Note that sufficiency of T(X) means that T(x) = T(y) implies [the existence of k(x, y) such that f_θ(y) = f_θ(x)k(x, y) ∀θ]. So the condition in Theorem 6 could be phrased as an "iff" condition.

Example. Let X = (X₁, X₂, ..., Xₙ), where the Xᵢ are iid Bernoulli(p) and T(X) = Σᵢ₌₁ⁿ Xᵢ. You may use Theorem 6 to show that T is minimal sufficient.

Lehmann emphasizes a second method of establishing minimality based on the following two theorems.

Theorem 7 (Page 41 of Lehmann TPE) Suppose that 𝒫 = {P₀, P₁, ..., P_k} is finite and let {fᵢ} be the R-N derivatives with respect to some dominating σ-finite measure (like, e.g., Σᵢ Pᵢ). Suppose that all k+1 derivatives fᵢ are positive on all of 𝒳. Then
T(X) = (f₁(X)/f₀(X), f₂(X)/f₀(X), ..., f_k(X)/f₀(X))
is minimal sufficient. (Again it is a trivial generalization to observe that the minimality also holds if there is a set 𝒳₀ ⊂ 𝒳 with P_θ probability 1 for all θ where the k+1 derivatives fᵢ are positive on all of 𝒳₀.)

Theorem 8 (Page 42 of Lehmann TPE) Suppose that 𝒫 is a family of mutually absolutely continuous distributions and that 𝒫₀ ⊂ 𝒫. If T is sufficient for 𝒫 and minimal sufficient for 𝒫₀, then it is minimal sufficient for 𝒫.

The way that Theorems 7 and 8 are used together is to look at a few θ's, invoke Theorem 7 to establish minimal sufficiency (for the finite family) of a vector made up of a few likelihood ratios, and then use Theorem 8 to infer minimality for the whole family.
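The Theorem 6 criterion can be verified exhaustively in the Bernoulli example above. For that model the ratio f_p(y)/f_p(x) = (p/(1−p))^(T(y)−T(x)) is strictly monotone in p unless T(x) = T(y), so checking constancy on a small grid of p values suffices. The helper names below are mine, not from the notes.

```python
# Theorem 6 criterion for iid Bernoulli(p), n = 4: the likelihood ratio
# f_p(y)/f_p(x) is free of p exactly when T(x) = T(y).
from fractions import Fraction
from itertools import product

def likelihood(x, p):
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)

def ratio_is_free_of_p(x, y, ps=(Fraction(1, 5), Fraction(1, 2), Fraction(4, 5))):
    # For this family the ratio is monotone in p when T differs, so equality
    # on a finite grid already implies constancy in p.
    ratios = {likelihood(y, p) / likelihood(x, p) for p in ps}
    return len(ratios) == 1

for x in product((0, 1), repeat=4):
    for y in product((0, 1), repeat=4):
        assert ratio_is_free_of_p(x, y) == (sum(x) == sum(y))
```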

Example. Let X = (X₁, X₂, ..., Xₙ), where the Xᵢ are iid Beta(α, β). Using for P₀, P₁ and P₂ three fixed members of the Beta family, Theorems 7 and 8 can be used to show that (∏ᵢ₌₁ⁿ Xᵢ, ∏ᵢ₌₁ⁿ (1 − Xᵢ)) is minimal sufficient for the Beta family.

Definition 5. A statistic T: (𝒳, ℬ) → (𝒯, ℱ) is said to be ancillary for 𝒫 if the distribution of T (i.e., the measure P_θ^T on (𝒯, ℱ) defined by P_θ^T(F) = P_θ(T⁻¹(F))) does not depend upon θ (i.e. is the same for all θ).

Definition 6. A statistic T: (𝒳, ℬ) → (ℝ, ℬ¹) is said to be first order ancillary for 𝒫 if the mean of T (i.e., E_θ(T(X)) = ∫ T dP_θ) does not depend upon θ (i.e. is the same for all θ).

Example. Let X = (X₁, X₂, ..., Xₙ), where the Xᵢ are iid N(θ, 1). The sample variance of the Xᵢ's is ancillary.

Lehmann argues that, roughly speaking, models with nice (low dimensional) minimal sufficient statistics T are ones where no nontrivial function of T is ancillary (or first order ancillary). This line of thinking brings up the notion of completeness.

Notation: Continuing to let T: (𝒳, ℬ) → (𝒯, ℱ) and P_θ^T be the measure on (𝒯, ℱ) defined by P_θ^T(F) = P_θ(T⁻¹(F)), let 𝒫^T = {P_θ^T}.

Definition 7a. 𝒫^T (or T) is complete (boundedly complete) for 𝒫 or θ if
i) U: (𝒳, ℬ₀) → (ℝ, ℬ¹) is ℬ₀-measurable (and bounded), and
ii) E_θ U(X) = 0 ∀θ
imply that U = 0 a.s. P_θ ∀θ.

By virtue of Lehmann's Theorem (see Cressie's Probability Summary) this is equivalent to the next definition.

Definition 7b. 𝒫^T (or T) is complete (boundedly complete) for 𝒫 or θ if
i) h: (𝒯, ℱ) → (ℝ, ℬ¹) is ℱ-measurable (and bounded), and
ii) E_θ h(T(X)) = 0 ∀θ
imply that h∘T = 0 a.s. P_θ ∀θ.

These definitions say that there is no nontrivial function of T that is an unbiased estimator of 0. This is equivalent to saying that there is no nontrivial function of T that is first order ancillary. Notice also that completeness implies bounded completeness, so that bounded completeness is a slightly weaker condition than completeness.
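The N(θ, 1) example above is a location family: Xᵢ = θ + Zᵢ with Zᵢ iid N(0, 1), so any statistic invariant under a common shift of the data has a distribution free of θ. The minimal sketch below (my own illustration) checks the shift invariance of the sample variance numerically, which is the mechanism behind its ancillarity.

```python
# Shift invariance of the sample variance: adding theta to every coordinate
# of a N(0,1) sample produces a N(theta,1) sample with the same sample
# variance, so the distribution of the sample variance is free of theta.
import random
from statistics import variance

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(10)]     # a fixed N(0,1) draw

for theta in (-3.0, 0.0, 2.5):
    shifted = [theta + zi for zi in z]               # the same draw under theta
    assert abs(variance(shifted) - variance(z)) < 1e-8
```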

Example. Let X = (X₁, X₂, ..., Xₙ), where the Xᵢ are iid Bernoulli(p) and T(X) = Σᵢ₌₁ⁿ Xᵢ. You may use the fact that polynomials can be identically 0 on an interval only if their coefficients are all 0 to show that T(X) is complete for p ∈ [0, 1].

Theorem 9 (Page 99 Schervish) (Basu's Theorem) If T is a boundedly complete sufficient statistic and U is ancillary, then for all θ the random variables U and T are independent (according to P_θ). (This theorem has some uses in the identification of UMVUEs.)

(Obvious) Result 10. If T is complete for 𝒫 and 𝒫₀ ⊂ 𝒫, T need not be complete for 𝒫₀.

Theorem 11. If 𝒫₀ ⊂ 𝒫 is equivalent to 𝒫 in the sense that [P_θ(B) = 0 for all P_θ ∈ 𝒫₀] implies that [P_θ(B) = 0 for all P_θ ∈ 𝒫], then T (boundedly) complete for 𝒫₀ implies that T is (boundedly) complete for 𝒫.

Completeness of a statistic says that it is not carrying any ancillary information, stuff that in and of itself is of no help in inference about θ. That is reminiscent of minimal sufficiency.

Theorem 12 (Bahadur's Theorem) Suppose that T: (𝒳, ℬ) → (𝒯, ℱ) is sufficient and boundedly complete.
i) (Schervish) If (𝒯, ℱ) = (ℝᵏ, ℬᵏ), then T is minimal sufficient.
ii) If there is any minimal sufficient statistic, T is minimal sufficient.

Example. Suppose that 𝒫 ≪ μ (where μ is σ-finite) and
dP_θ/dμ(x) = c(θ) exp{ Σᵢ₌₁ᵏ ηᵢ(θ)Tᵢ(x) }.
Provided, for example, {(η₁(θ), ..., η_k(θ)) : θ ∈ Θ} contains an open rectangle in ℝᵏ, Theorem 1 page 142 Lehmann TSH implies that T(X) = (T₁(X), T₂(X), ..., T_k(X)) is complete. It is clearly sufficient by the factorization criterion. So by part i of Theorem 12, T is minimal sufficient.
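The polynomial argument in the Binomial completeness example above can be made concrete: E_p h(T) = Σ_t h(t) C(n,t) p^t (1−p)^(n−t) is a polynomial in p, and the map from (h(0), ..., h(n)) to its coefficient vector is lower triangular with nonzero diagonal, hence invertible, so the zero polynomial forces h = 0. A sketch (the matrix construction is mine):

```python
# Completeness of T ~ Binomial(n, p): the coefficient-of-p^j matrix of the
# map h |-> E_p h(T) is lower triangular with diagonal C(n, t) != 0.
from fractions import Fraction
from math import comb

def coeff_matrix(n):
    """M[j][t] = coefficient of p^j in C(n,t) p^t (1-p)^(n-t)."""
    M = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for t in range(n + 1):
        for j in range(t, n + 1):
            # expand (1-p)^(n-t) binomially; the p^(j-t) term contributes
            M[j][t] = Fraction(comb(n, t) * comb(n - t, j - t) * (-1) ** (j - t))
    return M

n = 5
M = coeff_matrix(n)
# Lower triangular with nonzero diagonal => invertible => only h = 0 gives
# E_p h(T) = 0 for all p in an interval.
assert all(M[j][t] == 0 for j in range(n + 1) for t in range(j + 1, n + 1))
assert all(M[t][t] == comb(n, t) for t in range(n + 1))
```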

Decision Theory

This is a formal framework for posing problems of finding good statistical procedures as optimization problems.

Basic Elements of a (Statistical) Decision Problem:

X — the data, formally a mapping from some underlying space into (𝒳, ℬ), with distribution P_θ
Θ — the parameter space; often one needs to do integrations over Θ, so let 𝒞 be a σ-algebra on Θ and assume that for B ∈ ℬ the mapping taking θ to P_θ(B), from (Θ, 𝒞) to (ℝ, ℬ¹), is measurable
𝒜 — the decision/action space, the class of possible decisions or actions; we will also need to average over (random) actions, so let ℰ be a σ-algebra on 𝒜
L(θ, a) — the loss function, L: Θ × 𝒜 → [0, ∞), which needs to be suitably measurable
δ(x) — a decision rule, δ: (𝒳, ℬ) → (𝒜, ℰ); for data X, δ(X) is the decision taken on the basis of observing X

The fact that L(θ, δ(X)) is random prevents any sensible comparison of decision rules until one has averaged out over the distribution of X, producing something called a risk function.

Definition 8. R(θ, δ) ≡ E_θ L(θ, δ(X)) = ∫ L(θ, δ(x)) dP_θ(x) maps Θ → [0, ∞) and is called the risk function (of the decision rule δ).

Definition 9. δ is at least as good as δ′ if R(θ, δ) ≤ R(θ, δ′) ∀θ.

Definition 10. δ is better than δ′ if R(θ, δ) ≤ R(θ, δ′) ∀θ, with strict inequality for at least one θ.

Definition 11. δ and δ′ are risk equivalent if R(θ, δ) = R(θ, δ′) ∀θ. (Note that risk equivalence need not imply equality of decision rules.)

Definition 12. δ is best in a class of decision rules Δ if i) δ ∈ Δ and ii) δ is at least as good as any other δ′ ∈ Δ.

Typically, only smallish classes of decision rules will have best elements. (For example, in some point estimation problems there are best unbiased estimators, where no estimator is best overall. Here the class Δ is the class of unbiased estimators.) An alternative to looking for best elements in small classes of rules is to somehow reduce the function of θ, R(θ, δ), to a single number and look for rules minimizing the number. (Reduction by

averaging amounts to looking for Bayes rules. Reduction by taking a maximum value amounts to looking for minimax rules.)

Definition 13. δ is inadmissible in Δ if ∃ δ′ ∈ Δ which is better than δ (which dominates δ).

Definition 14. δ is admissible in Δ if it is not inadmissible in Δ (if there is no δ′ ∈ Δ which is better).

On the face of it, it would seem that one would never want to use an inadmissible decision rule. But, as it turns out, there can be problems where all rules are inadmissible. (More in this direction later.)

Largely for technical reasons, it is convenient/important to somewhat broaden one's understanding of what constitutes a decision rule. To this end, as a matter of notation, let 𝒟 = {δ(x)} = the class of (nonrandomized) decision rules. Then one can define two fairly natural notions of randomized decision rules.

Definition 15. Suppose that for each x ∈ 𝒳, φ_x is a distribution on (𝒜, ℰ). Then φ_x is called a behavioral decision rule. The idea is that having observed X = x, one then randomizes over the action space according to the distribution φ_x in order to choose an action. (This should remind you of what is done in some theoretical arguments in hypothesis testing.)

Notation: Let 𝒟* = {φ_x} = the class of behavioral decision rules.

It is possible to think of 𝒟 as a subset of 𝒟*. For δ ∈ 𝒟, let φ_x^δ be a point mass distribution on 𝒜 concentrated at δ(x). Then φ^δ ∈ 𝒟* and is clearly equivalent to δ.

The risk of a behavioral decision rule is (abusing notation and reusing the symbol R(·,·) in this context as well as that of Definition 8) clearly
R(θ, φ) = ∫_𝒳 ∫_𝒜 L(θ, a) dφ_x(a) dP_θ(x).

A somewhat less intuitively appealing notion of randomization for decision rules is next. Let 𝒢 be a σ-algebra on 𝒟 that contains the singleton sets.

Definition 16. A randomized decision function (or decision rule) ψ is a probability measure on (𝒟, 𝒢). The idea here is that prior to observation, one chooses an element of 𝒟 by use of ψ, say δ (δ with distribution ψ becomes a random object), and then having observed X = x decides δ(x).

Notation: Let 𝒟** = {ψ} = the class of randomized decision rules.

It is possible to think of 𝒟 as a subset of 𝒟** by identifying δ ∈ 𝒟 with ψ^δ placing mass 1 on δ. ψ^δ is clearly equivalent to δ.

The risk of a randomized decision rule is (once again abusing notation and reusing the symbol R(·,·)) clearly
R(θ, ψ) = ∫_𝒟 R(θ, δ) dψ(δ) = ∫_𝒟 ∫_𝒳 L(θ, δ(x)) dP_θ(x) dψ(δ),
assuming that R(θ, ψ) is properly measurable.

The behavioral decision rules are probably more appealing/natural than the randomized decision rules. On the other hand, the randomized decision rules are often easier to deal with in proofs. So a reasonably important question is "When are 𝒟* and 𝒟** equivalent in the sense of generating the same set of risk functions?"

Result 13 (Some properly qualified version of this is true. See Ferguson.) If 𝒜 is a complete separable metric space with ℰ the Borel σ-algebra, and appropriate conditions hold on the P_θ, then 𝒟* and 𝒟** are equivalent in terms of generating the same set of risk functions.

It is also of interest to know when the extra complication of considering randomized rules is really not necessary. One simple result in that direction concerns cases where L is convex in its second argument.

Lemma 14 (See page 151 of Schervish, page 40 of Berger and page 78 of Ferguson) Suppose that 𝒜 is a convex subset of ℝᵏ and φ_x is a behavioral decision rule. Define a nonrandomized decision rule δ by
δ(x) = ∫_𝒜 a dφ_x(a).
(In the case that k > 1, interpret δ(x) as vector-valued and the integral as a vector of integrals over the coordinates of a ∈ 𝒜.) Then
i) If L(θ, ·): 𝒜 → [0, ∞) is convex, then R(θ, δ) ≤ R(θ, φ) ∀θ.
ii) If L(θ, ·): 𝒜 → [0, ∞) is strictly convex, R(θ, φ) < ∞ and P_θ{x | φ_x is nondegenerate} > 0, then R(θ, δ) < R(θ, φ).
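Lemma 14 can be seen at work in a toy problem. Below, with squared error loss L(θ, a) = (θ − a)², a behavioral rule that randomizes 50/50 between actions 0 and 1 is beaten at every θ by the nonrandomized rule using the mean action 1/2. The two-point sample space and all function names are invented for illustration.

```python
# Lemma 14 in miniature: averaging out the randomization of a behavioral rule
# cannot increase risk under convex loss, and strictly decreases it here.
from fractions import Fraction

def loss(theta, a):
    return (theta - a) ** 2

def risk_behavioral(theta, phi):
    """phi[x] is a dict action -> probability; X ~ Bernoulli(theta)."""
    px = {0: 1 - theta, 1: theta}
    return sum(px[x] * sum(w * loss(theta, a) for a, w in phi[x].items())
               for x in (0, 1))

def risk_nonrandomized(theta, delta):
    px = {0: 1 - theta, 1: theta}
    return sum(px[x] * loss(theta, delta[x]) for x in (0, 1))

half = Fraction(1, 2)
phi = {0: {Fraction(0): half, Fraction(1): half},
       1: {Fraction(0): half, Fraction(1): half}}      # randomizes at each x
delta = {x: sum(a * w for a, w in phi[x].items()) for x in (0, 1)}  # mean action

for theta in (Fraction(1, 4), half, Fraction(3, 4)):
    assert risk_nonrandomized(theta, delta) < risk_behavioral(theta, phi)
```

The gap here is exactly 1/4 at every θ, the variance of the discarded randomization.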

(Strict convexity of a function g means that for α ∈ (0, 1),
g(αx + (1−α)y) < αg(x) + (1−α)g(y)
for x ≠ y in the convex domain of g. See Berger pages 38 through 40. There is an extension of Jensen's inequality that says that if g is strictly convex and the distribution of X is not concentrated at a point, then Eg(X) > g(E(X)).)

Corollary 15. Suppose that 𝒜 is a convex subset of ℝᵏ and φ_x is a behavioral decision rule. Define a nonrandomized decision rule δ by
δ(x) = ∫_𝒜 a dφ_x(a).
i) If L(θ, ·): 𝒜 → [0, ∞) is convex ∀θ, then δ is at least as good as φ.
ii) If L(θ, ·): 𝒜 → [0, ∞) is convex ∀θ and for some θ₀ the function L(θ₀, ·): 𝒜 → [0, ∞) is strictly convex, R(θ₀, φ) < ∞ and P_θ₀{x | φ_x is nondegenerate} > 0, then δ is better than φ.

Corollary 15 says roughly that in the case of a convex loss function, one need not worry about considering randomized decision rules.

(Finite Dimensional) Geometry of Decision Theory

A very helpful device for understanding some of the basics of decision theory is to consider the geometry associated with cases where Θ is finite. So until further notice, Θ = {θ₁, θ₂, ..., θ_k}, and for ψ ∈ 𝒟** consider the vector of risks (R(θ₁, ψ), ..., R(θ_k, ψ)). As a matter of notation, let
𝒮 = {y = (y₁, y₂, ..., y_k) ∈ ℝᵏ | yᵢ = R(θᵢ, ψ) ∀i, and ψ ∈ 𝒟**}.
𝒮 is the set of all randomized risk vectors.

Theorem 16. 𝒮 is convex.

In fact, it turns out to be the case, and is more or less obvious from Theorem 16, that if
𝒮₀ = {y | yᵢ = R(θᵢ, δ) ∀i, and δ ∈ 𝒟},
then 𝒮 is the convex hull of 𝒮₀, i.e. the smallest convex set containing 𝒮₀.

Definition 17. For x ∈ ℝᵏ, the lower quadrant of x is Q_x = {y | yᵢ ≤ xᵢ ∀i}.

Theorem 17. y ∈ 𝒮 (or the decision rule giving rise to y) is admissible iff Q_y ∩ 𝒮 = {y}.

Definition 18. The lower boundary of 𝒮 is λ(𝒮) = {y | Q_y ∩ 𝒮̄ = {y}}, for 𝒮̄ the closure of 𝒮.

Definition 19. 𝒮 is closed from below if λ(𝒮) ⊂ 𝒮.

Notation: Let A(𝒮) stand for the class of admissible rules (or the risk vectors corresponding to them) in 𝒮.

Theorem 18. If 𝒮 is closed from below, then A(𝒮) = λ(𝒮).

Theorem 18′. If 𝒮 is closed, then A(𝒮) = λ(𝒮).

Complete Classes of Decision Rules

Now drop the finiteness assumption on Θ.

Definition 20. A class of decision rules 𝒞 ⊂ 𝒟* is a complete class if for any φ ∉ 𝒞, ∃ a φ′ ∈ 𝒞 such that φ′ is better than φ.

Definition 21. A class of decision rules 𝒞 ⊂ 𝒟* is an essentially complete class if for any φ ∉ 𝒞, ∃ a φ′ ∈ 𝒞 such that φ′ is at least as good as φ.

Definition 22. A class of decision rules 𝒞 ⊂ 𝒟* is a minimal complete class if it is complete and is a subset of any other complete class.

Notation: Let A(𝒟*) be the set of admissible rules in 𝒟*.

Theorem 19. If a minimal complete class 𝒞 exists, then 𝒞 = A(𝒟*).

Theorem 20. If A(𝒟*) is complete, then it is minimal complete.

Theorems 19 and 20 are the properly qualified versions of the rough (and not quite true) statement "A(𝒟*) equals the minimal complete class."

Sufficiency and Decision Theory

We worked very hard on the notion of sufficiency, with the goal in mind of establishing that one doesn't need X if one has T(X). That hard work ought to have some implications for decision theory. Presumably, in most cases one can get by with basing a decision rule on a sufficient statistic.

Result 21 (See page 120 of Ferguson, page 36 of Berger and page 151 of Schervish) Suppose a statistic T is sufficient for 𝒫 = {P_θ}. Then if φ is a behavioral decision rule, there exists another behavioral decision rule φ′ that is a function of T and has the same risk function as φ.

Note that "φ′ a function of T" means that T(x) = T(y) implies that φ′_x and φ′_y are the same measures on 𝒜. Note also that Result 21 immediately implies that the class of decision rules in 𝒟* that are functions of T is essentially complete. In terms of risk, one need only consider rules that are functions of sufficient statistics.

The construction used in the proof of Result 21 doesn't aim to improve on the original (behavioral) decision rule by moving to a function of the sufficient statistic. But in convex loss cases, there is a construction that produces rules depending on a sufficient statistic and potentially improving on an initial (nonrandomized) decision rule. This is the message of Theorem 24. But to prove Theorem 24, a lemma of some independent interest is needed. That lemma and an immediate corollary are stated before Theorem 24.

Lemma 22. Suppose that 𝒜 ⊂ ℝᵏ is convex and δ₁(x) and δ₂(x) are two nonrandomized decision rules. Then δ(x) = ½(δ₁(x) + δ₂(x)) is also a decision rule. Further,
i) if L(θ, a) is convex in a and R(θ, δ₁) = R(θ, δ₂), then R(θ, δ) ≤ R(θ, δ₁), and
ii) if L(θ, a) is strictly convex in a, R(θ, δ₁) = R(θ, δ₂) < ∞ and P_θ(δ₁(X) ≠ δ₂(X)) > 0, then R(θ, δ) < R(θ, δ₁).

Corollary 23. Suppose 𝒜 ⊂ ℝᵏ is convex and δ₁(x) and δ₂(x) are two nonrandomized decision rules with identical risk functions. If L(θ, a) is convex in a ∀θ, then the existence of θ₀ such that L(θ₀, a) is strictly convex in a, R(θ₀, δ₁) = R(θ₀, δ₂) < ∞ and P_θ₀(δ₁(X) ≠ δ₂(X)) > 0 implies that δ₁ and δ₂ are inadmissible.

Theorem 24 (See Berger page 41, Ferguson page 121, TPE page 50, Schervish page 152) (Rao-Blackwell Theorem) Suppose that 𝒜 ⊂ ℝᵏ is convex and δ is a nonrandomized decision function with E_θ|δ(X)| < ∞ ∀θ. Suppose further that T is sufficient for θ and, with ℬ₀ = ℬ(T), let
δ′(x) ≡ E[δ | ℬ₀](x).
Then δ′ is a nonrandomized decision rule. Further,
i) if L(θ, a) is convex in a, then R(θ, δ′) ≤ R(θ, δ), and
ii) for any θ for which L(θ, a) is strictly convex in a, R(θ, δ) < ∞ and P_θ(δ(X) ≠ δ′(X)) > 0, one has that R(θ, δ′) < R(θ, δ).

Bayes Rules

The Bayes approach to decision theory is one means of reducing risk functions to numbers so that they can be compared in a straightforward fashion.

Notation: Let G be a distribution on (Θ, 𝒞). G is usually (for philosophical reasons) called the prior distribution.

Definition 23. The Bayes risk of φ ∈ 𝒟* with respect to the prior G is (abusing notation again and once more using the symbol R(·,·))
R(G, φ) = ∫_Θ R(θ, φ) dG(θ).

Notation: Again abusing notation somewhat, R(G) ≡ inf_{φ ∈ 𝒟*} R(G, φ).

Definition 24. φ is said to be Bayes with respect to G (or to be a Bayes rule with respect to G) provided R(G, φ) = R(G).

Definition 25. φ is said to be ε-Bayes with respect to G provided R(G, φ) ≤ R(G) + ε.

Bayes rules are often admissible, at least if the prior involved spreads its mass around sufficiently.

Theorem 25. If Θ is finite (or countable), G is a prior distribution with G({θ}) > 0 ∀θ, and φ is Bayes with respect to G, then φ is admissible.
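The Rao-Blackwell construction can be worked out exactly in the Bernoulli case under squared error loss: start from the crude unbiased rule δ(x) = x₁ and condition on the sufficient statistic T = Σxᵢ, which gives E[X₁ | T = t] = t/n, the sample mean. The risks below are computed exactly (the function names are mine).

```python
# Rao-Blackwell in the iid Bernoulli(p) model with squared error loss:
# conditioning delta(x) = x_1 on T = sum(x) gives T/n, with smaller risk.
from fractions import Fraction
from itertools import product

n = 4

def pmf(x, p):
    t = sum(x)
    return p ** t * (1 - p) ** (n - t)

def risk(delta, p):
    return sum(pmf(x, p) * (delta(x) - p) ** 2 for x in product((0, 1), repeat=n))

def delta(x):
    return x[0]                       # unbiased for p, but wasteful

def rb(x):
    return Fraction(sum(x), n)        # E[delta | T], the sample mean

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    assert risk(rb, p) < risk(delta, p)            # uniform improvement
    assert risk(rb, p) == p * (1 - p) / n          # variance of the mean
```

The improvement is from risk p(1−p) down to p(1−p)/n, at every p.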

Theorem 26. Suppose Θ ⊂ ℝᵏ is such that every neighborhood of any θ ∈ Θ has nonempty intersection with the interior of Θ. Suppose further that R(θ, φ) is continuous in θ for all φ ∈ 𝒟*. Let G be a prior distribution that has support Θ, in the sense that every open ball that is a subset of Θ has positive G probability. Then if R(G) < ∞ and φ is a Bayes rule with respect to G, φ is admissible.

Theorem 27. If every Bayes rule with respect to G has the same risk function, they are all admissible.

Corollary 28. In a problem where Bayes rules exist and are unique, they are admissible.

A converse of Theorem 25 (for finite Θ) is given in Theorem 30. Its proof requires the use of a piece of k-dimensional analytic geometry (called the separating hyperplane theorem) that is stated next.

Theorem 29. Let 𝒮₁ and 𝒮₂ be two disjoint convex subsets of ℝᵏ. Then ∃ a p ∈ ℝᵏ such that p ≠ 0 and p′x ≤ p′y ∀x ∈ 𝒮₁ and y ∈ 𝒮₂.

Theorem 30. If Θ is finite and φ is admissible, then φ is Bayes with respect to some prior G.

For more general Θ, admissible rules are usually Bayes or limits of Bayes rules. See Section 2.10 of Ferguson or page 546 of Berger.

As the next result shows, Bayesians don't need randomized decision rules, at least for purposes of achieving minimum Bayes risk, R(G).

Result 31 (See also page 147 of Schervish) Suppose that ψ ∈ 𝒟** is Bayes versus G and R(G) < ∞. Then there exists a nonrandomized rule δ ∈ 𝒟 that is also Bayes versus G.

There remain the questions of 1) when Bayes rules exist and 2) what they look like when they do exist. These issues are (to some extent) addressed next.

Theorem 32. If Θ is finite, 𝒮 is closed from below and G assigns positive probability to each θ ∈ Θ, there is a rule Bayes versus G.

Definition 26. A formal nonrandomized Bayes rule versus a prior G is a rule δ(x) such that ∀x ∈ 𝒳, δ(x) is an a ∈ 𝒜 minimizing
∫_Θ L(θ, a) f_θ(x) dG(θ) / ∫_Θ f_θ(x) dG(θ)
(the posterior expected loss).
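Under squared error loss the minimizer in Definition 26 is the posterior mean, since ∫(θ − a)² dG(θ|x) is minimized at a = E[θ | x]. The sketch below computes the formal Bayes rule exactly for Binomial data and a two-point prior on p (the prior and all names are invented for illustration), and checks the minimization claim directly at nearby actions.

```python
# Formal Bayes rule under squared error loss = posterior mean, computed
# exactly for x successes in n = 3 Bernoulli(p) trials and a 2-point prior.
from fractions import Fraction
from math import comb

prior = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
n = 3

def posterior_weights(x):
    return {p: g * comb(n, x) * p ** x * (1 - p) ** (n - x)
            for p, g in prior.items()}

def bayes_rule(x):
    """Posterior mean of p given x successes."""
    w = posterior_weights(x)
    return sum(p * wp for p, wp in w.items()) / sum(w.values())

def posterior_expected_loss(a, x):
    w = posterior_weights(x)
    return sum(wp * (p - a) ** 2 for p, wp in w.items()) / sum(w.values())

a_star = bayes_rule(2)
for a in (a_star - Fraction(1, 8), a_star + Fraction(1, 8)):
    assert posterior_expected_loss(a_star, 2) < posterior_expected_loss(a, 2)
```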

Definition 27. If G is a σ-finite measure on Θ, a formal nonrandomized generalized Bayes rule versus G is a rule δ(x) such that ∀x ∈ 𝒳, δ(x) is an a ∈ 𝒜 minimizing
∫_Θ L(θ, a) f_θ(x) dG(θ).

Minimax Decision Rules

An alternative to the Bayesian reduction of R(θ, φ) to a number by averaging according to the distribution G is to instead reduce R(θ, φ) to a number by maximization over θ.

Definition 28. A decision rule φ ∈ 𝒟* is said to be minimax if
sup_θ R(θ, φ) = inf_{φ′ ∈ 𝒟*} sup_θ R(θ, φ′).

Definition 29. If a decision rule φ has constant (in θ) risk, it is called an equalizer rule.

It should make some sense that if one is trying to find a minimax rule by, so to speak, pushing down the highest peak in the profile R(θ, φ), doing so will tend to drive one towards rules with flat R(θ, φ) profiles.

Theorem 33. If φ is an equalizer rule and is admissible, then it is minimax.

Theorem 34. Suppose that {φᵢ} is a sequence of decision rules, each φᵢ Bayes against a prior Gᵢ. If R(Gᵢ, φᵢ) → C and φ is a decision rule with R(θ, φ) ≤ C ∀θ, then φ is minimax.

Corollary 35. If φ is Bayes versus G and R(θ, φ) ≤ R(G) ∀θ, then φ is minimax.

Corollary 35′. If φ is an equalizer rule and is Bayes with respect to a prior G, then it is minimax.

A reasonable question to ask, in the light of Corollaries 35 and 35′, is "Which G should I be looking for in order to produce a Bayes rule that might be minimax?" The answer to this question is "Look for a least favorable prior."

Definition 30. A prior distribution G is said to be least favorable if R(G) = sup_{G′} R(G′).

Intuitively, one might think that if an intelligent adversary were choosing θ, he/she might well generate it according to a least favorable G, thereby maximizing the minimum possible risk. In response, a decision maker ought to use φ Bayes versus G. Perhaps then, this Bayes rule might tend to minimize the worst that could happen over choice of θ?

Theorem 36. If φ is Bayes versus G and R(θ, φ) ≤ R(G) ∀θ, then G is least favorable.
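A standard example behind Theorem 33 and Corollary 35′ (not worked out in these notes, so take it as an illustration): with X ~ Binomial(n, p) and squared error loss, the Bayes rule versus a Beta(√n/2, √n/2) prior is δ(x) = (x + √n/2)/(n + √n), an equalizer rule with constant risk 1/(4(√n + 1)²), hence minimax. The check below is numerical.

```python
# Equalizer/minimax example for X ~ Binomial(n, p), squared error loss.
from math import comb, sqrt, isclose

n = 16

def risk(delta, p):
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (delta(x) - p) ** 2
               for x in range(n + 1))

def minimax_rule(x):
    return (x + sqrt(n) / 2) / (n + sqrt(n))

def sample_mean(x):
    return x / n

target = 1 / (4 * (sqrt(n) + 1) ** 2)         # the constant risk value
grid = [i / 20 for i in range(1, 20)]

# Constant (equalizer) risk profile across p:
assert all(isclose(risk(minimax_rule, p), target) for p in grid)
# The sample mean beats it near p = 1/2 but has a larger supremum risk:
assert max(risk(sample_mean, p) for p in grid) > target
```

This also illustrates why minimaxity is a worst-case notion: the minimax rule trades away performance at extreme p for a flat profile.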

(Corollary 35 already shows the φ in Theorem 36 to be minimax.)

Point Estimation (Unbiasedness, Exponential Families, Information Inequalities, Invariance and Equivariance, and Maximum Likelihood)

Unbiasedness

One possible limited class of sensible decision rules in estimation problems is the class of unbiased estimators. (Here, we are tacitly assuming that 𝒜 is ℝ or some subset of it.)

Definition 31. The bias of an estimator δ(x) for γ(θ) ∈ ℝ is
E_θ(δ(X) − γ(θ)) = ∫ (δ(x) − γ(θ)) dP_θ(x).

Definition 32. An estimator δ(x) is unbiased for γ(θ) ∈ ℝ provided E_θ(δ(X) − γ(θ)) = 0 ∀θ.

Before jumping into the business of unbiased estimation with too much enthusiasm, an early caution is in order. We supposedly believe that Bayes rules are typically admissible and that admissibility is sort of a minimal goodness requirement for a decision rule, but there are the following theorem and corollary.

Theorem 37 (See Ferguson pages 48 and 52) If L(θ, a) = w(θ)(γ(θ) − a)² for w(θ) > 0, and δ is both Bayes wrt G with finite risk function and unbiased for γ(θ), then (according to the joint distribution of (X, θ)) δ(X) = γ(θ) a.s.

Corollary 38. If Θ has at least two elements and the family 𝒫 = {P_θ} is mutually absolutely continuous, then with L(θ, a) = (θ − a)², no Bayes estimator of θ is unbiased.

Definition 33. δ is best unbiased for γ(θ) ∈ ℝ if it is unbiased and at least as good as any other unbiased estimator.

There need not be any unbiased estimators in a given estimation problem, let alone a best one. But for many common estimation loss functions, if there is a best unbiased estimator it is unique.

Theorem 39. Suppose that 𝒜 ⊂ ℝ is convex, L(θ, a) is convex in a ∀θ, and δ₁ and δ₂ are two best unbiased estimators of γ(θ). Then for any θ for which L(θ, a) is strictly convex in a and R(θ, δ₁) < ∞,
P_θ[δ₁(X) = δ₂(X)] = 1.

The special case of L(θ, a) = (γ(θ) − a)² is, of course, the one of most interest. For this case of unweighted squared error loss, for unbiased estimators with second moments, R(θ, δ) = Var_θ δ(X).

Definition 34. δ is called a UMVUE (uniformly minimum variance unbiased estimator) of γ(θ) ∈ ℝ if it is unbiased and Var_θ δ(X) ≤ Var_θ δ′(X) ∀θ for any other unbiased estimator δ′ of γ(θ).

A much weaker notion is next.

Definition 35. δ is called locally minimum variance unbiased at θ₀ if it is unbiased and Var_θ₀ δ(X) ≤ Var_θ₀ δ′(X) for any other unbiased estimator δ′ of γ(θ).

Notation: Let 𝒰 = {δ | E_θ δ(X) = 0 and E_θ δ²(X) < ∞ ∀θ}. (𝒰 is the set of unbiased estimators of 0 with finite variance.)

Theorem 40. Suppose that δ is an unbiased estimator of γ(θ) ∈ ℝ and E_θ δ²(X) < ∞ ∀θ. Then δ is a UMVUE of γ(θ) iff Cov_θ(δ(X), u(X)) = 0 ∀θ and ∀u ∈ 𝒰.

Theorem 41 (Lehmann-Scheffé) Suppose that 𝒜 is a convex subset of ℝ, T is a sufficient statistic for 𝒫 = {P_θ} and δ is an unbiased estimator of γ(θ). Let δ′ = E[δ | T] = φ∘T. Then δ′ is an unbiased estimator of γ(θ). If, in addition, T is complete for 𝒫,
i) φ*∘T unbiased for γ(θ) implies that φ*∘T = φ∘T a.s. P_θ ∀θ, and
ii) L(θ, a) convex in a ∀θ implies that
 a) δ′ is best unbiased, and
 b) if δ* is any other best unbiased estimator of γ(θ), δ* = δ′ a.s. P_θ for any θ at which R(θ, δ′) < ∞ and L(θ, a) is strictly convex in a.
(Note that ii a) says in particular that δ′ is UMVUE, and that ii b) says that δ′ is the unique UMVUE.)
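A complete Lehmann-Scheffé computation in the Bernoulli model: T = ΣXᵢ is complete sufficient, δ = X₁X₂ is unbiased for p², and conditioning gives E[X₁X₂ | T = t] = t(t−1)/(n(n−1)), which by Theorem 41 is the unique UMVUE of p². The sketch below verifies both the conditioning step and unbiasedness exactly (helper names are mine).

```python
# Lehmann-Scheffe UMVUE of p^2 for X_1,...,X_n iid Bernoulli(p).
from fractions import Fraction
from itertools import product

n = 5

def expectation(g, p):
    return sum(p ** sum(x) * (1 - p) ** (n - sum(x)) * g(x)
               for x in product((0, 1), repeat=n))

def umvue(x):
    t = sum(x)
    return Fraction(t * (t - 1), n * (n - 1))

# Conditioning check: E[X_1 X_2 | T = t] = t(t-1)/(n(n-1)), since given T = t
# the data are uniform over arrangements of t ones.
for t in range(n + 1):
    xs = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    cond_mean = Fraction(sum(x[0] * x[1] for x in xs), len(xs))
    assert cond_mean == Fraction(t * (t - 1), n * (n - 1))

# Exact unbiasedness for p^2 at several p:
for p in (Fraction(1, 3), Fraction(1, 2), Fraction(4, 5)):
    assert expectation(umvue, p) == p ** 2
```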

Basu's Theorem (Theorem 9) is on occasion helpful in doing the conditioning required to get the δ′ of Theorem 41. The most famous example of this is in the calculation of the UMVUE of Φ((u − μ)/σ) based on n iid N(μ, σ²) observations.

Exponential Families

The most common method of establishing completeness of a statistic T (at least for the standard examples) for use in the construction of Theorem 41 is an appeal to the properties of exponential families of distributions. Hence the placement of the next batch of class material.

A List of Facts About Exponential Families:

Definition 36. Suppose that 𝒫 = {P_θ} is dominated by a σ-finite measure μ. 𝒫 is called an exponential family if for some h(x) ≥ 0,
f_θ(x) ≡ dP_θ/dμ(x) = exp{ a(θ) + Σᵢ₌₁ᵏ ηᵢ(θ)Tᵢ(x) } h(x) ∀θ.

Definition 37. A family of probability measures 𝒫 = {P_θ} is said to be identifiable if θ₁ ≠ θ₂ implies that P_θ₁ ≠ P_θ₂.

Fact(s) 42. Let η = (η₁, η₂, ..., η_k), Γ = {η ∈ ℝᵏ | ∫ h(x) exp{Σᵢ ηᵢTᵢ(x)} dμ(x) < ∞}, and consider also the family of distributions (on 𝒳) with R-N derivatives wrt μ of the form
f_η(x) = K(η) exp{ Σᵢ₌₁ᵏ ηᵢTᵢ(x) } h(x), η ∈ Γ.
Call this family 𝒫*. This set of distributions on 𝒳 is at least as large as 𝒫 (and this second parameterization is mathematically nicer than the first). Γ is called the natural parameter space for 𝒫*. It is a convex subset of ℝᵏ (TSH, page 57). If Γ lies in a subspace of dimension less than k, then f_η(x) (and therefore f_θ(x)) can be written in a form involving fewer than k statistics Tᵢ. We will henceforth assume Γ to be fully k-dimensional. Note that depending upon the nature of the functions ηᵢ(θ) and the parameter space Θ, 𝒫 may be a proper subset of 𝒫*. That is, defining
Γ₀ = {(η₁(θ), η₂(θ), ..., η_k(θ)) : θ ∈ Θ} ⊂ ℝᵏ,
Γ₀ can be a proper subset of Γ.

Fact 43. The support of P_η, defined as {x | f_η(x) > 0}, is clearly {x | h(x) > 0}, which is independent of η. The distributions in 𝒫* are thus mutually absolutely continuous.

Fact 44. From the Factorization Theorem, the statistic T = (T₁, T₂, ..., T_k) is sufficient.

Fact 45. T has induced measures {P_η^T} which also form an exponential family.

Fact 46. If Γ₀ contains an open rectangle in ℝᵏ, then T is complete for 𝒫 (see TSH).

Fact 47. If Γ₀ contains an open rectangle in ℝᵏ (and actually under the much weaker assumptions given on page 44 of TPE), T is minimal sufficient for 𝒫.

Fact 48. If g is any measurable real valued function such that E_η|g(X)| < ∞, then
E_η g(X) = ∫ g(x) f_η(x) dμ(x)
is continuous on Γ and has continuous partial derivatives of all orders on the interior of Γ. These can be calculated as
(∂^{m₁+⋯+m_k} / ∂η₁^{m₁}∂η₂^{m₂}⋯∂η_k^{m_k}) E_η g(X) = ∫ g(x) (∂^{m₁+⋯+m_k} / ∂η₁^{m₁}∂η₂^{m₂}⋯∂η_k^{m_k}) f_η(x) dμ(x).
(See page 59 of TSH.)

Fact 49. If for u = (u₁, u₂, ..., u_k) both η and η + u belong to Γ, then
E_η exp{ u₁T₁(X) + u₂T₂(X) + ⋯ + u_kT_k(X) } = K(η) / K(η + u).
Further, if η₀ is in the interior of Γ, then
E_η₀ (T₁(X)T₂(X)⋯T_k(X)) = K(η₀) · (∂ᵏ/∂η₁∂η₂⋯∂η_k)(1/K(η)) |_{η = η₀}.
In particular,
E_η₀ T_j(X) = −(∂/∂η_j) log K(η) |_{η = η₀},
Var_η₀ T_j(X) = −(∂²/∂η_j²) log K(η) |_{η = η₀}, and
Cov_η₀ (T_j(X), T_l(X)) = −(∂²/∂η_j∂η_l) log K(η) |_{η = η₀}.

Fact 50. If X = (X₁, X₂, ..., Xₙ) has iid components, with each Xᵢ ~ P_η, then X generates a k-dimensional exponential family on (𝒳ⁿ, ℬⁿ). The k-dimensional statistic Σᵢ₌₁ⁿ T(Xᵢ) is sufficient for this family. Under the condition that Γ₀ contains an open rectangle, this statistic is also complete and minimal sufficient.

Information Inequalities

These are inequalities about variances of estimators, some of which involve information measures. (See Schervish Sections 2.3.1, 5.1.2 and TPE Sections 2.6, 2.7.) The fundamental tool here is the Cauchy-Schwarz inequality.
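Fact 49 can be checked numerically in the simplest case, the Bernoulli family written in natural form: f_η(x) = K(η)e^{ηx} on {0, 1} with K(η) = 1/(1 + e^η). Then −log K(η) = log(1 + e^η), and its first and second derivatives should equal E_η X = p and Var_η X = p(1 − p). The sketch below (my own check, via central differences) confirms this.

```python
# Fact 49 for natural-form Bernoulli: derivatives of -log K(eta) give the
# mean and variance of the natural sufficient statistic T(X) = X.
from math import exp, log, isclose

def neg_log_K(eta):
    return log(1 + exp(eta))              # -log K(eta), the log-normalizer

def p_of(eta):
    return exp(eta) / (1 + exp(eta))      # E_eta T(X)

h = 1e-5
for eta in (-1.0, 0.0, 1.5):
    d1 = (neg_log_K(eta + h) - neg_log_K(eta - h)) / (2 * h)
    d2 = (neg_log_K(eta + h) - 2 * neg_log_K(eta) + neg_log_K(eta - h)) / h ** 2
    p = p_of(eta)
    assert isclose(d1, p, abs_tol=1e-8)           # first derivative = mean
    assert isclose(d2, p * (1 - p), abs_tol=1e-4) # second derivative = variance
```

The looser tolerance on the second difference reflects its floating-point roundoff, not the mathematics.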

Lemma 51 (Cauchy-Schwarz). If U and V are random variables with E U² < ∞ and E V² < ∞, then E U² · E V² ≥ (E UV)².

Applying this to U = X − EX and V = Y − EY, we get the following.

Corollary 52. If X and Y have finite second moments, (Cov(X, Y))² ≤ Var X · Var Y.

Corollary 53. Suppose that δ(Y) takes values in ℝ and E(δ(Y))² < ∞. If g is a measurable function such that E g(Y) = 0 and 0 < E(g(Y))² < ∞, then defining C ≡ E δ(Y)g(Y),

  Var δ(Y) ≥ C²/E(g(Y))².

A restatement of Corollary 53 in terms that look more statistical is next.

Corollary 53′. Suppose that δ(X) takes values in ℝ and E_θ(δ(X))² < ∞. If g_θ is a measurable function such that E_θ g_θ(X) = 0 and 0 < E_θ(g_θ(X))² < ∞, then defining C(θ) ≡ E_θ δ(X)g_θ(X),

  Var_θ δ(X) ≥ (C(θ))²/E_θ(g_θ(X))².

Definition 38. We will say that the model 𝒫, with dominating σ-finite measure μ, is FI regular at θ ∈ Θ ⊂ ℝ provided there exists an open neighborhood of θ, say O, such that
 i) f_{θ′}(x) > 0 ∀x and ∀θ′ ∈ O,
 ii) ∀x, f_{θ′}(x) is differentiable at θ, and
 iii) 1 = ∫ f_{θ′}(x) dμ(x) can be differentiated wrt θ′ under the integral at θ, i.e. 0 = ∫ (∂/∂θ′)f_{θ′}(x)|_{θ′=θ} dμ(x).

(A definition like that on page 111 of Schervish, that requires i) and ii) only on a set of P_{θ′} probability 1 ∀θ′ in O, is really no more general than this. WOLOG, just throw away the exceptional set.)

Jargon: The (random) function of θ, (∂/∂θ) log f_θ(X), is called the score function. iii) of Definition 38 says that the mean of the score function is 0 at θ.
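The "mean of the score function is 0" consequence of Definition 38 iii) can be verified directly for a concrete family. A minimal sketch (not from the notes; λ = 3.0 is an illustrative choice) for the Poisson(λ) family, where the score is x/λ − 1, summing over the (truncated) support:

```python
import math

# For X ~ Poisson(lam), log f_lam(x) = -lam + x log lam - log(x!),
# so the score is d/d lam log f_lam(x) = x/lam - 1. Its mean should be 0.
lam = 3.0
mean_score = sum(
    math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)) * (x / lam - 1.0)
    for x in range(0, 200)   # truncation; the Poisson(3) tail beyond 200 is negligible
)
print(mean_score)  # ~ 0
```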

Definition 39. If the model 𝒫 is FI regular at θ, then

  I(θ) = E_θ((∂/∂θ′) log f_{θ′}(X)|_{θ′=θ})²

is called the Fisher Information contained in X at θ. (If a model is FI regular at θ, I(θ) is the variance of the score function at θ′ = θ.)

Theorem 54 (Cramér-Rao). Suppose that the model 𝒫 is FI regular at θ and that 0 < I(θ) < ∞. If E_{θ′} δ(X) = ∫ δ(x)f_{θ′}(x) dμ(x) can be differentiated under the integral at θ, i.e. if

  (d/dθ′) E_{θ′} δ(X)|_{θ′=θ} = ∫ δ(x)(∂/∂θ′)f_{θ′}(x)|_{θ′=θ} dμ(x),

then

  Var_θ δ(X) ≥ ((d/dθ′) E_{θ′} δ(X)|_{θ′=θ})² / I(θ).

For the special case of δ(X) viewed as an estimator of θ, we sometimes write E_θ δ(X) = θ + b(θ), and thus Theorem 54 says Var_θ δ(X) ≥ (1 + b′(θ))²/I(θ).

On the face of it, showing that the C-R lower bound is achieved looks like a general method of showing that an estimator is UMVUE. But the fact is that the circumstances in which the C-R lower bound is achieved are very well defined. The C-R inequality is essentially the Cauchy-Schwarz inequality, and one gets equality in Lemma 51 iff V = aU a.s. This translates in the proof of Theorem 54 to equality iff

  (∂/∂θ′) log f_{θ′}(X)|_{θ′=θ} = a(θ)δ(X) + b(θ)  a.s. P_θ.

And, as it turns out, this can happen for all θ in a parameter space iff one is dealing with a one-parameter exponential family of distributions and the statistic δ(X) is the natural sufficient statistic. So the method works only for UMVUEs of the means of the natural sufficient statistics in one-parameter exponential families.

Result 55. Suppose that for η real, {P_η} is dominated by a σ-finite measure μ, and, with h(x) ≥ 0, f_η(x) = exp(a(η) + ηT(x)) h(x) a.e. μ, ∀η. Then for any η in the interior of Γ where Var_η T(X) < ∞, T(X) achieves the C-R lower bound for its variance. (A converse of Result 55 is true, but is also substantially harder to prove.)
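Result 55 can be seen in numbers for the Bernoulli case. The sketch below (not from the notes; n = 10 and p = 0.3 are illustrative) checks that for n iid Bernoulli(p) observations the sample mean, which is the natural sufficient statistic of this one-parameter exponential family divided by n, attains the C-R bound exactly:

```python
# Cramer-Rao check for X_1,...,X_n iid Bernoulli(p), estimating p with X_bar.
n, p = 10, 0.3

# Fisher information in one observation, computed as E[(d/dp log f_p(X))^2]:
# score is 1/p with prob p and -1/(1-p) with prob 1-p, so I_1(p) = 1/(p(1-p)).
score_sq = p * (1.0 / p) ** 2 + (1 - p) * (-1.0 / (1 - p)) ** 2
I_n = n * score_sq                 # Fact 58: information adds over iid observations

var_xbar = p * (1 - p) / n         # exact variance of the sample mean
cr_bound = 1.0 / I_n               # unbiased case (Fact 60): Var >= 1/I_n

print(var_xbar, cr_bound)          # equal: the bound is attained
```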

The C-R inequality is Corollary 53′ with g_θ(x) = (∂/∂θ′) log f_{θ′}(x)|_{θ′=θ}. There are other interesting choices of g_θ. For example (assuming it is permissible to move enough derivatives through integrals), g_θ(x) = ((∂²/∂θ′²)f_{θ′}(x)|_{θ′=θ})/f_θ(x) produces an interesting inequality. And the choice g_θ(x) = (f_{θ′}(x) − f_θ(x))/f_θ(x) leads to the next result.

Theorem 56 (the Chapman-Robbins Inequality). If P_{θ′} ≪ P_θ, then

  Var_θ δ(X) ≥ (E_{θ′} δ(X) − E_θ δ(X))² / E_θ(f_{θ′}(X)/f_θ(X) − 1)².

So

  Var_θ δ(X) ≥ sup_{θ′ s.t. P_{θ′} ≪ P_θ} (E_{θ′} δ(X) − E_θ δ(X))² / E_θ(f_{θ′}(X)/f_θ(X) − 1)².

The Fisher Information comes up not only in the present context of information inequalities, but also in the theory of the asymptotics of maximum likelihood estimation. It is thus useful to have some more results about it.

Theorem 57. Suppose that the model 𝒫 is FI regular at θ. If, ∀x, f_{θ′}(x) is in fact twice differentiable at θ and 1 = ∫ f_{θ′}(x) dμ(x) can be differentiated twice wrt θ′ under the integral at θ, i.e. both 0 = ∫ (∂/∂θ′)f_{θ′}(x)|_{θ′=θ} dμ(x) and 0 = ∫ (∂²/∂θ′²)f_{θ′}(x)|_{θ′=θ} dμ(x), then

  I(θ) = −E_θ (∂²/∂θ′²) log f_{θ′}(X)|_{θ′=θ}.

Fact 58. If X₁, X₂, ..., X_n are independent, X_i ~ P_θ^i, then X = (X₁, X₂, ..., X_n) (that has distribution P_θ¹ × P_θ² × ⋯ × P_θⁿ) carries Fisher Information I(θ) = Σ_{i=1}^n I_i(θ), where, in the obvious way, I_i(θ) is the Fisher information in X_i.

Fact 59. I(θ) doesn't depend upon the dominating measure μ.

Fact 60. For the case of E_θ δ(X) = θ, the C-R inequality says that Var_θ δ(X) ≥ 1/I(θ). And when X = (X₁, X₂, ..., X_n) has iid components, this combined with Fact 58 says that Var_θ δ(X) ≥ 1/(nI₁(θ)).

There are multiparameter versions of the information inequalities (and, for that matter, multiple-δ versions involving the covariance matrix of the δ's). The fundamental result
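Theorem 57 gives a second route to I(θ), via the expected curvature of the log likelihood. A numerical sketch (not from the notes; λ = 2.5 is an illustrative choice) checking both expressions against the known value I(λ) = 1/λ for the Poisson family:

```python
import math

# Two expressions for Fisher information, Definition 39 (squared score) versus
# Theorem 57 (negative expected second derivative), for the Poisson(lam) family.
lam = 2.5

def pmf(x, lam):
    # exp-of-logs form avoids overflow from large factorials
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

# d/d lam log f_lam(x) = x/lam - 1 ;  d^2/d lam^2 log f_lam(x) = -x/lam^2
xs = range(0, 200)  # truncation; the Poisson(2.5) tail beyond 200 is negligible
I_score = sum(pmf(x, lam) * (x / lam - 1.0) ** 2 for x in xs)
I_curv = -sum(pmf(x, lam) * (-x / lam ** 2) for x in xs)

print(I_score, I_curv, 1.0 / lam)  # all three agree
```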

needed to produce multiparameter versions of the information inequalities is the simple generalization of Corollary 53.

Lemma 61. Suppose that δ(Y) takes values in ℝ and E(δ(Y))² < ∞. Suppose further that g₁, g₂, ..., g_k are measurable functions such that E g_i(Y) = 0 and E(g_i(Y))² < ∞ ∀i, and the g_i are linearly independent as square integrable functions on 𝒳 according to the probability measure P. Let C_i ≡ E δ(Y)g_i(Y) and define a k×k matrix

  H ≡ (E g_i(Y)g_j(Y)).

Then for C = (C₁, C₂, ..., C_k)′,

  Var δ(Y) ≥ C′H⁻¹C.

The statistical-looking version of Lemma 61 is next.

Lemma 61′. Suppose that δ(X) takes values in ℝ and E_θ(δ(X))² < ∞. Suppose further that g_{1θ}, g_{2θ}, ..., g_{kθ} are measurable functions such that E_θ g_{iθ}(X) = 0 and E_θ(g_{iθ}(X))² < ∞ ∀i, and the g_{iθ} are linearly independent as square integrable functions on 𝒳 according to the probability measure P_θ. Let C_i(θ) ≡ E_θ δ(X)g_{iθ}(X) and define a k×k matrix

  H(θ) ≡ (E_θ g_{iθ}(X)g_{jθ}(X)).

Then for C(θ) = (C₁(θ), C₂(θ), ..., C_k(θ))′,

  Var_θ δ(X) ≥ C(θ)′H(θ)⁻¹C(θ).

k-dimensional versions of Definitions 38 and 39 are next.

Definition 40. We will say that the model 𝒫, with dominating σ-finite measure μ, is FI regular at θ ∈ Θ ⊂ ℝ^k provided there exists an open neighborhood of θ, say O, such that
 i) f_{θ′}(x) > 0 ∀x and ∀θ′ ∈ O,
 ii) ∀x, f_{θ′}(x) has first order partial derivatives at θ, and
 iii) 1 = ∫ f_{θ′}(x) dμ(x) can be differentiated wrt each θ_i′ under the integral at θ, i.e. 0 = ∫ (∂/∂θ_i′)f_{θ′}(x)|_{θ′=θ} dμ(x).

Definition 41. If the model 𝒫 is FI regular at θ and E_θ((∂/∂θ_i′) log f_{θ′}(X)|_{θ′=θ})² < ∞ ∀i, then the k×k matrix

  I(θ) = (E_θ (∂/∂θ_i′) log f_{θ′}(X)|_{θ′=θ} · (∂/∂θ_j′) log f_{θ′}(X)|_{θ′=θ})

is called the Fisher Information contained in X at θ.

The multiparameter version of Theorem 57 is next.

Theorem 62. Suppose that the model 𝒫 is FI regular at θ. If, ∀x, f_{θ′}(x) in fact has continuous second order partials in the neighborhood O, and 1 = ∫ f_{θ′}(x) dμ(x) can be differentiated twice under the integral at θ, i.e. both 0 = ∫ (∂/∂θ_i′)f_{θ′}(x)|_{θ′=θ} dμ(x) and 0 = ∫ (∂²/∂θ_i′∂θ_j′)f_{θ′}(x)|_{θ′=θ} dμ(x) ∀i and j, then

  I(θ) = −(E_θ (∂²/∂θ_i′∂θ_j′) log f_{θ′}(X)|_{θ′=θ}).

And there is the multiparameter version of the C-R inequality.

Theorem 63 (Cramér-Rao). Suppose that the model 𝒫 is FI regular at θ ∈ Θ ⊂ ℝ^k and that I(θ) exists and is nonsingular. If E_{θ′} δ(X) = ∫ δ(x)f_{θ′}(x) dμ(x) can be differentiated wrt each θ_i′ under the integral at θ, i.e. if

  (∂/∂θ_i′) E_{θ′} δ(X)|_{θ′=θ} = ∫ δ(x)(∂/∂θ_i′)f_{θ′}(x)|_{θ′=θ} dμ(x),

then with

  C(θ) ≡ ((∂/∂θ₁′) E_{θ′} δ(X)|_{θ′=θ}, ..., (∂/∂θ_k′) E_{θ′} δ(X)|_{θ′=θ})′,

  Var_θ δ(X) ≥ C(θ)′I(θ)⁻¹C(θ).

There are multiple-δ versions of this information inequality stuff. See Rao, pages 326+.

Invariance and Equivariance

Unbiasedness (of estimators) is one means of restricting a class of decision rules down to a subclass small enough that there is hope of finding a best rule in the subclass. Invariance/equivariance considerations are another. These notions have to do with symmetries in a decision problem. Invariance means that (some) things don't change. Equivariance means that (some) things change similarly or equally. We'll do the case of Location Parameter Estimation first, and then turn to generalities about invariance/equivariance that apply more broadly (both to other estimation problems and to other types of decision problems entirely).

Definition 42. When 𝒳 = ℝ^k and Θ = ℝ^k, to say that θ is a location parameter means that X − θ has the same distribution ∀θ.

Definition 43. When 𝒳 = ℝ^k and Θ = ℝ¹, to say that θ is a 1-dimensional location parameter means that X − θ1 (for 1 a k-vector of 1's) has the same distribution ∀θ.
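A small worked instance of Theorem 63 (a sketch, not from the notes; n = 8 and σ² = 2.0 are illustrative): for n iid N(μ, σ²) observations with θ = (μ, σ²), the per-observation information matrix is diag(1/σ², 1/(2σ⁴)), and for δ = X̄ (unbiased for μ) the gradient is C(θ) = (1, 0)′, so the bound C′I_n⁻¹C equals Var X̄ exactly:

```python
# Multiparameter C-R bound for X_bar in the N(mu, sigma^2) model.
n, sigma2 = 8, 2.0

# Information in n observations: n * diag(1/sigma^2, 1/(2 sigma^4))
I_n = [[n / sigma2, 0.0], [0.0, n / (2.0 * sigma2 ** 2)]]
C = [1.0, 0.0]   # gradient of E_theta X_bar = mu wrt (mu, sigma^2)

# C' I_n^{-1} C for the diagonal I_n
bound = C[0] ** 2 / I_n[0][0] + C[1] ** 2 / I_n[1][1]
var_xbar = sigma2 / n

print(bound, var_xbar)  # equal: X_bar attains the multiparameter bound
```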

Until further notice suppose Θ = 𝒜 = ℝ^k.

Definition 44. L(θ, a) is called location invariant if L(θ, a) = ρ(θ − a) for some function ρ. If ρ(z) is increasing in z for z > 0 and decreasing in z for z < 0, the problem is called a location estimation problem.

Note that for a k-dimensional location parameter, X ~ P_θ means that (X + c) ~ P_{θ+c}. It therefore seems that a reasonable restriction on estimators δ is that they should satisfy δ(x + c) = δ(x) + c. How to formalize this?

Definition 45. A decision rule δ is called location equivariant if δ(x + c) = δ(x) + c ∀x and c.

Theorem 64. If θ is a k-dimensional location parameter, L(θ, a) is location invariant and δ is location equivariant, then R(δ, θ) is constant in θ. (That is, δ is an equalizer rule.)

This theorem says that if I'm working on a location estimation problem with a location invariant loss function and restrict attention to location equivariant estimators, I can compare estimators by comparing numbers. In fact, as it turns out, there is a fairly explicit representation for the best location equivariant estimator in such cases.

Definition 46. A function u is called location invariant if u(x + c) = u(x) ∀x and c.

Lemma 65. Suppose that δ₀ is location equivariant. Then δ is location equivariant iff ∃ a location invariant function u such that δ = δ₀ + u.

Lemma 66. A function u is location invariant iff u depends on x only through y(x) = (x₁ − x_k, x₂ − x_k, ..., x_{k−1} − x_k).

Theorem 67. Suppose that L(θ, a) = ρ(θ − a) is location invariant and ∃ a location equivariant estimator δ₀ for θ with finite risk. Let ρ̃(z) = ρ(−z). Suppose that for each y, ∃ a number v(y) that minimizes E₀[ρ̃(δ₀(X) − v) | Y = y] over choices of v. Then δ*(x) = δ₀(x) − v(y(x)) is a best location equivariant estimator of θ.

In the case of L(θ, a) = (θ − a)², the estimator of Theorem 67 is Pitman's estimator, which can for some situations be written out in a more explicit form.
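Definition 45 is easy to check for familiar rules. A trivial sketch (not from the notes; the data and shift are illustrative) confirming that both the sample mean and the sample median satisfy δ(x + c) = δ(x) + c:

```python
# Checking location equivariance (Definition 45) for two familiar rules.
def mean(x):
    return sum(x) / len(x)

def median(x):
    s = sorted(x)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

x = [3.1, -0.2, 5.0, 1.7]
c = 2.25
shifted = [xi + c for xi in x]

for delta in (mean, median):
    # delta(x + c) = delta(x) + c, up to floating point error
    assert abs(delta(shifted) - (delta(x) + c)) < 1e-12
print("both rules are location equivariant on this example")
```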

Theorem 68. Suppose that L(θ, a) = (θ − a)² and the distribution of X = (X₁, X₂, ..., X_k) has R-N derivative wrt Lebesgue measure on ℝ^k

  f_θ(x) = f(x₁ − θ, x₂ − θ, ..., x_k − θ),

where f has second moments. Then the best location equivariant estimator of θ is

  δ(x) = ∫_{−∞}^{∞} θ f(x₁ − θ, ..., x_k − θ) dθ / ∫_{−∞}^{∞} f(x₁ − θ, ..., x_k − θ) dθ,

if this is well-defined and has finite risk. (Notice that this is the formal generalized Bayes estimator vs Lebesgue measure.)

Lemma 69. For the case of L(θ, a) = (θ − a)², a best location equivariant estimator of θ must be unbiased for θ.

Now cancel the assumption Θ = 𝒜 = ℝ^k. The following is a more general story about invariance/equivariance. Suppose that 𝒫 = {P_θ} is identifiable and (to avoid degeneracies) that L(θ, a) = L(θ, a′) ∀θ ⟹ a = a′. Invariance/equivariance theory is phrased in terms of groups of transformations i) 𝒳 → 𝒳, ii) Θ → Θ, and iii) 𝒜 → 𝒜, and how L(θ, a) and decision rules behave under these groups of transformations. The natural starting place is with transformations on 𝒳.

Definition 47. For 1-1 transformations of 𝒳 onto 𝒳, g₁ and g₂, the composition of g₂ and g₁ is another 1-1 onto transformation of 𝒳 defined by g₂ ∘ g₁(x) = g₂(g₁(x)) ∀x ∈ 𝒳.

Definition 48. For g a 1-1 transformation of 𝒳 onto 𝒳, if h is another 1-1 onto transformation such that h ∘ g(x) = x ∀x ∈ 𝒳, h is called the inverse of g and denoted as h = g⁻¹. It is then an easy fact that not only is g⁻¹ ∘ g(x) = x ∀x, but g ∘ g⁻¹(x) = x ∀x as well.

Definition 49. The identity transformation on 𝒳 is defined by e(x) = x ∀x.

Another easy fact is then that e ∘ g = g ∘ e = g for any 1-1 transformation g of 𝒳 onto 𝒳.

(Obvious) Result 70. If 𝒢 is a class of 1-1 transformations of 𝒳 onto 𝒳 that is closed under composition (g₁, g₂ ∈ 𝒢 ⟹ g₂ ∘ g₁ ∈ 𝒢) and inversion (g ∈ 𝒢 ⟹ g⁻¹ ∈ 𝒢), then with the operation ∘ (composition) 𝒢 is a group in the usual mathematical sense.
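The ratio-of-integrals formula in Theorem 68 can be evaluated numerically. A sketch (not from the notes; the data values are illustrative): for f a product of standard normal densities, the Pitman estimator is the flat-prior posterior mean in the N(θ, 1) model and so should reduce to the sample mean, which the crude quadrature below confirms:

```python
import math

# Pitman estimator, Theorem 68, for iid N(theta, 1) data: it should equal x_bar.
x = [1.2, -0.4, 2.1, 0.6]
xbar = sum(x) / len(x)

def joint(theta):
    # product of standard normal kernels at x_i - theta (constants cancel in the ratio)
    return math.exp(-0.5 * sum((xi - theta) ** 2 for xi in x))

# crude Riemann integration over a wide grid around the data
lo, hi, m = min(x) - 10.0, max(x) + 10.0, 20000
h = (hi - lo) / m
ts = [lo + j * h for j in range(m + 1)]
num = h * sum(t * joint(t) for t in ts)   # integral of theta * f(x - theta)
den = h * sum(joint(t) for t in ts)       # integral of f(x - theta)

pitman = num / den
print(pitman, xbar)  # agree to many decimals
```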

A group such as that described in Result 70 will be called a transformation group. Such a group may or may not be Abelian/commutative (one is if g₁ ∘ g₂ = g₂ ∘ g₁ ∀g₁, g₂ ∈ 𝒢).

Jargon/notation: Any class of transformations 𝒞 can be thought of as generating a group. That is, 𝒢(𝒞), consisting of all compositions of finite numbers of elements of 𝒞 and their inverses, is a group under the operation ∘.

To get anywhere with this theory, we then need transformations on Θ that fit together nicely with those on 𝒳. At a minimum, one will need that the transformations on 𝒳 don't get one outside of 𝒫.

Definition 50. If g is a 1-1 transformation of 𝒳 onto 𝒳 such that {P_θ^{g(X)} | θ ∈ Θ} = 𝒫, then the transformation g is said to leave the model invariant. If each element of a group of transformations 𝒢 leaves 𝒫 invariant, we'll say that the group leaves the model invariant.

One circumstance in which 𝒢 leaves 𝒫 invariant is where the model 𝒫 can be thought of as generated by 𝒢 in the following sense. Suppose that Y has some fixed probability distribution P and consider

  𝒫 = {P^{g(Y)} | g ∈ 𝒢}.

If a group of transformations on 𝒳 (say 𝒢) leaves 𝒫 invariant, there is a natural corresponding group of transformations on Θ. That is, for g ∈ 𝒢, X ~ P_θ ⟹ g(X) ~ P_{θ′} for some θ′ ∈ Θ. In fact, because of the identifiability assumption, there is only one such θ′. So, as a matter of notation, adopt the following.

Definition 51. If 𝒢 leaves the model 𝒫 invariant, for each g ∈ 𝒢 define ḡ: Θ → Θ by ḡ(θ) = the element θ′ of Θ such that X ~ P_θ ⟹ g(X) ~ P_{θ′}. (According to the notation established in Definition 51, for 𝒢 leaving 𝒫 invariant, X ~ P_θ ⟹ g(X) ~ P_{ḡ(θ)}.)

(Obvious) Result 71. For 𝒢 a group of 1-1 transformations of 𝒳 onto 𝒳 leaving 𝒫 invariant, 𝒢̄ = {ḡ | g ∈ 𝒢} with the operation ∘ (composition) is a group of 1-1 transformations of Θ. (I believe that the group 𝒢̄ is the homomorphic image of 𝒢 under the transformation ¯ so that, for example, (g⁻¹)¯ = (ḡ)⁻¹ and (g₂ ∘ g₁)¯ = ḡ₂ ∘ ḡ₁.)

We next need transformations on 𝒜 that fit nicely together with those on 𝒳 and Θ. If that's going to be possible, it would seem that the loss function will have to fit nicely with the transformations already considered.

Definition 52. A loss function L(θ, a) is said to be invariant under the group of transformations 𝒢 provided that for every g ∈ 𝒢 and a ∈ 𝒜, ∃ a unique a′ ∈ 𝒜 such that L(θ, a) = L(ḡ(θ), a′) ∀θ.

Now then, as another matter of jargon/notation, adopt the following.

Definition 53. If 𝒢 leaves the model invariant and the loss L(θ, a) is invariant, define g̃: 𝒜 → 𝒜 by g̃(a) = the element a′ of 𝒜 such that L(θ, a) = L(ḡ(θ), a′) ∀θ. (According to the notation established in Definition 53, for 𝒢 leaving 𝒫 invariant and invariant loss L(θ, a), L(θ, a) = L(ḡ(θ), g̃(a)) ∀θ.)

(Obvious) Result 72. For 𝒢 a group of 1-1 transformations of 𝒳 onto 𝒳 leaving 𝒫 invariant and invariant loss L(θ, a), 𝒢̃ = {g̃ | g ∈ 𝒢} with the operation ∘ (composition) is a group of 1-1 transformations of 𝒜. (I believe that the group 𝒢̃ is the homomorphic image of 𝒢 under the transformation ˜ so that, for example, (g⁻¹)˜ = (g̃)⁻¹ and (g₂ ∘ g₁)˜ = g̃₂ ∘ g̃₁.)

So finally we come to the notion of equivariance for decision functions. The whole business is slightly subtle for behavioral rules (see Ferguson), so for this elementary introduction we'll confine attention to nonrandomized rules.

Definition 54. In an invariant decision problem (one where 𝒢 leaves 𝒫 invariant and L(θ, a) is invariant), a nonrandomized decision rule δ: 𝒳 → 𝒜 is said to be equivariant provided δ(g(x)) = g̃(δ(x)) ∀x ∈ 𝒳 and g ∈ 𝒢.

(One could make a definition here involving a single g, or perhaps a 3-element group 𝒢 = {g, g⁻¹, e}, and talk about equivariance under g instead of under a group. I'll not bother.)

The basic elementary theorem about invariance/equivariance is the generalization of Theorem 64 (that cuts down the size of the comparison problem that one faces when comparing risk functions).

Theorem 73. If 𝒢 leaves 𝒫 invariant, L(θ, a) is invariant and δ is an equivariant nonrandomized decision rule, then
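A concrete non-location instance of Definitions 50-54 (a sketch, not from the notes; the data and constants are illustrative): take the scale group g_c(x) = c·x, c > 0, on ℝⁿ. If the X_i are iid Exponential with mean θ, then g_c(X) has mean cθ, so ḡ_c(θ) = cθ; the loss L(θ, a) = (a/θ − 1)² is invariant with g̃_c(a) = c·a; and the sample mean is an equivariant rule:

```python
# Scale group on R^n: g_c(x) = c * x, with g_bar_c(theta) = c * theta and
# g_tilde_c(a) = c * a for the invariant loss L(theta, a) = (a/theta - 1)^2.
def loss(theta, a):
    return (a / theta - 1.0) ** 2

def delta(x):  # sample mean, a scale equivariant rule
    return sum(x) / len(x)

x = [0.8, 2.4, 1.1]
theta, a, c = 1.5, 1.3, 3.0

# loss invariance (Definition 52): L(theta, a) = L(g_bar(theta), g_tilde(a))
assert abs(loss(theta, a) - loss(c * theta, c * a)) < 1e-12
# rule equivariance (Definition 54): delta(g(x)) = g_tilde(delta(x))
assert abs(delta([c * xi for xi in x]) - c * delta(x)) < 1e-12
print("scale-invariant loss and scale equivariant rule check out")
```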

  R(δ, θ) = R(δ, ḡ(θ))  ∀θ ∈ Θ and g ∈ 𝒢.

Definition 55. In an invariant decision problem, {θ′ ∈ Θ | θ′ = ḡ(θ) for some g ∈ 𝒢} is called the orbit containing θ. (The distinct orbits are equivalence classes that partition Θ.)

Theorem 73 says that an equivariant nonrandomized decision rule in an invariant decision problem has a risk function that is constant on orbits. When looking for good equivariant rules, the problem of comparing risk functions is reduced somewhat, to comparing risk functions on orbits. Of course, that comparison is especially easy when there is only one orbit.

Definition 56. A group of 1-1 transformations of a space onto itself is said to be transitive if for any two points in the space there is a transformation in the group taking the first point into the second.

Corollary 74. If 𝒢 leaves 𝒫 invariant, L(θ, a) is invariant and 𝒢̄ is transitive, then every equivariant nonrandomized decision rule has constant risk.

Definition 57. In an invariant decision problem, a decision rule δ is said to be a best (nonrandomized) equivariant (or MRE, minimum risk equivariant) decision rule if it is equivariant and for any other nonrandomized equivariant rule δ′,

  R(δ, θ) ≤ R(δ′, θ)  ∀θ.

The import of Theorem 73 is that one needs only check this risk inequality for one θ from each orbit.

Maximum Likelihood (Primarily the Simple Asymptotics Thereof)

Maximum likelihood is a non-decision-theoretic method of point estimation dating at least to Fisher and probably beyond. As usual, for an identifiable family of distributions 𝒫 = {P_θ} dominated by a σ-finite measure μ, we let f_θ = dP_θ/dμ.

Definition 58. A statistic T: 𝒳 → Θ is called a maximum likelihood estimator (MLE) of θ if f_{T(x)}(x) = sup_θ f_θ(x) ∀x ∈ 𝒳.

This definition is not without practical problems in terms of existence and computation of MLEs. f_θ(x) can fail to be bounded in θ, and even when it is, can fail to have a maximum value as one searches over θ. One way to get rid of the problem of potential lack of boundedness (favored and extensively practiced by Prof. Meeker, for example) is

to admit that one can measure only to the nearest something, so that real 𝒳 are discrete (finite or countable), so that μ can be counting measure and f_θ(x) = P_θ({x}) (so that one is just maximizing the probability of observing the data in hand). Of course, the possibility that the sup is not achieved still remains even when this device is used (as, for example, when X ~ Binomial(n, p) for p ∈ (0, 1) and X = 0 or n is observed).

Note that in the common situation where Θ ⊂ ℝ^k and the f_θ are nicely differentiable in θ, a maximum likelihood estimator must satisfy the likelihood equations

  (∂/∂θ_i) log f_θ(x)|_{θ=T(x)} = 0  for i = 1, 2, ..., k.

Much of the theory connected with maximum likelihood estimation has to do with roots of the likelihood equations (as opposed to honest maximizers of f_θ(x)). Further, much of the attractiveness of the method derives from its asymptotic properties. A simple version of the theory (primarily for the k = 1 case) follows.

Adopt the following notation for the discussion of the asymptotics of maximum likelihood:

  X_n  all information on θ available through the nth stage of observation
  𝒳_n  the observation space for X_n (through the nth stage)
  ℬ_n  a σ-algebra on 𝒳_n
  P_θ^{X_n}  the distribution of X_n (dominated by a σ-finite measure μ_n)
  Θ  a (fixed) parameter space
  f_θ^n  the R-N derivative dP_θ^{X_n}/dμ_n

For this material one really does need a basic probability space (Ω, 𝒜, P_θ) in the background, as it is necessary to have a fixed space upon which to prove convergence results. Often (i.e. in the iid case) we may take (𝒳, ℬ, P_θ^X) to be a 1-dimensional model where the P_θ^X are absolutely continuous with respect to a σ-finite measure μ, and

  Ω = 𝒳^∞, X_n = (X₁, X₂, ..., X_n) for X_i(ω) = the ith coordinate of ω,
  𝒳_n = 𝒳^n, ℬ_n = ℬ^n, P_θ = (P_θ^X)^∞, P_θ^{X_n} = (P_θ^X)^n, μ_n = μ^n, and
  f_θ^n(x_n) = Π_{i=1}^n f_θ(x_i).

For such a set-up, the usual drill is, for appropriate sequences of estimators {θ̂_n(x_n)}, to try to prove consistency and asymptotic normality results.
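The likelihood equations above are usually solved numerically. A sketch (not from the notes; the data are illustrative) for iid Exponential observations with rate θ, where the score n/θ − Σx_i has the closed-form root θ̂ = n/Σx_i, so a few Newton steps can be checked against the exact answer:

```python
# Solving the likelihood equation by Newton's method for X_1,...,X_n iid
# Exponential(rate theta), density theta * exp(-theta x):
# d/d theta log likelihood = n/theta - sum(x), root theta_hat = n / sum(x).
x = [0.9, 0.3, 1.7, 0.5, 1.1]
n, s = len(x), sum(x)

def score(theta):
    return n / theta - s           # derivative of the log likelihood

def score_prime(theta):
    return -n / theta ** 2         # second derivative (always negative here)

theta = 1.0                        # starting value
for _ in range(25):
    theta -= score(theta) / score_prime(theta)

print(theta, n / s)                # Newton root vs closed-form MLE
```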


More information

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 26: Probability and Random Processes Problem Set Spring 209 Self-Graded Scores Due:.59 PM, Monday, February 4, 209 Submit your

More information

Notes on the Unique Extension Theorem. Recall that we were interested in defining a general measure of a size of a set on Ò!ß "Ó.

Notes on the Unique Extension Theorem. Recall that we were interested in defining a general measure of a size of a set on Ò!ß Ó. 1. More on measures: Notes on the Unique Extension Theorem Recall that we were interested in defining a general measure of a size of a set on Ò!ß "Ó. Defined this measure T. Defined TÐ ( +ß, ) Ñ œ, +Þ

More information

STAT 2122 Homework # 3 Solutions Fall 2018 Instr. Sonin

STAT 2122 Homework # 3 Solutions Fall 2018 Instr. Sonin STAT 2122 Homeork Solutions Fall 2018 Instr. Sonin Due Friday, October 5 NAME (25 + 5 points) Sho all ork on problems! (4) 1. In the... Lotto, you may pick six different numbers from the set Ö"ß ß $ß ÞÞÞß

More information

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ). Connectedness 1 Motivation Connectedness is the sort of topological property that students love. Its definition is intuitive and easy to understand, and it is a powerful tool in proofs of well-known results.

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Definition Suppose M is a collection (set) of sets. M is called inductive if

Definition Suppose M is a collection (set) of sets. M is called inductive if Definition Suppose M is a collection (set) of sets. M is called inductive if a) g M, and b) if B Mß then B MÞ Then we ask: are there any inductive sets? Informally, it certainly looks like there are. For

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum

CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 19 CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION 3.0. Introduction

More information

Infinite Dimensional Vector Spaces. 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces

Infinite Dimensional Vector Spaces. 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces MA 751 Part 3 Infinite Dimensional Vector Spaces 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces Microarray experiment: Question: Gene expression - when is the DNA in

More information

Handout #5 Computation of estimates, ANOVA Table and Orthogonal Designs

Handout #5 Computation of estimates, ANOVA Table and Orthogonal Designs Hout 5 Computation of estimates, ANOVA Table Orthogonal Designs In this hout e ill derive, for an arbitrary design, estimates of treatment effects in each stratum. Then the results are applied to completely

More information

1. (Regular) Exponential Family

1. (Regular) Exponential Family 1. (Regular) Exponential Family The density function of a regular exponential family is: [ ] Example. Poisson(θ) [ ] Example. Normal. (both unknown). ) [ ] [ ] [ ] [ ] 2. Theorem (Exponential family &

More information

Exercises Measure Theoretic Probability

Exercises Measure Theoretic Probability Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary

More information

SVM example: cancer classification Support Vector Machines

SVM example: cancer classification Support Vector Machines SVM example: cancer classification Support Vector Machines 1. Cancer genomics: TCGA The cancer genome atlas (TCGA) will provide high-quality cancer data for large scale analysis by many groups: SVM example:

More information

Metric spaces and metrizability

Metric spaces and metrizability 1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively

More information

Dr. Nestler - Math 2 - Ch 3: Polynomial and Rational Functions

Dr. Nestler - Math 2 - Ch 3: Polynomial and Rational Functions Dr. Nestler - Math 2 - Ch 3: Polynomial and Rational Functions 3.1 - Polynomial Functions We have studied linear functions and quadratic functions Defn. A monomial or power function is a function of the

More information

Evaluating the Performance of Estimators (Section 7.3)

Evaluating the Performance of Estimators (Section 7.3) Evaluating the Performance of Estimators (Section 7.3) Example: Suppose we observe X 1,..., X n iid N(θ, σ 2 0 ), with σ2 0 known, and wish to estimate θ. Two possible estimators are: ˆθ = X sample mean

More information

Modèles stochastiques II

Modèles stochastiques II Modèles stochastiques II INFO 154 Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 1 http://ulbacbe/di Modéles stochastiques II p1/50 The basics of statistics Statistics starts ith

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form. Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities

More information

Introduction to Dynamical Systems

Introduction to Dynamical Systems Introduction to Dynamical Systems France-Kosovo Undergraduate Research School of Mathematics March 2017 This introduction to dynamical systems was a course given at the march 2017 edition of the France

More information

ACTEX. SOA Exam P Study Manual. With StudyPlus Edition Samuel A. Broverman, Ph.D., ASA. ACTEX Learning Learn Today. Lead Tomorrow.

ACTEX. SOA Exam P Study Manual. With StudyPlus Edition Samuel A. Broverman, Ph.D., ASA. ACTEX Learning Learn Today. Lead Tomorrow. ACTEX SOA Exam P Study Manual With StudyPlus + StudyPlus + gives you digital access* to: Flashcards & Formula Sheet Actuarial Exam & Career Strategy Guides Technical Skill elearning Tools Samples of Supplemental

More information

From Handout #1, the randomization model for a design with a simple block structure can be written as

From Handout #1, the randomization model for a design with a simple block structure can be written as Hout 4 Strata Null ANOVA From Hout 1 the romization model for a design with a simple block structure can be written as C œ.1 \ X α % (4.1) w where α œðα á α > Ñ E ÐÑœ % 0 Z œcov ÐÑ % with cov( % 3 % 4

More information

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

y(x) = x w + ε(x), (1)

y(x) = x w + ε(x), (1) Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued

More information

ECE 275B Homework # 1 Solutions Version Winter 2015

ECE 275B Homework # 1 Solutions Version Winter 2015 ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2

More information

Consider this problem. A person s utility function depends on consumption and leisure. Of his market time (the complement to leisure), h t

Consider this problem. A person s utility function depends on consumption and leisure. Of his market time (the complement to leisure), h t VI. INEQUALITY CONSTRAINED OPTIMIZATION Application of the Kuhn-Tucker conditions to inequality constrained optimization problems is another very, very important skill to your career as an economist. If

More information

Relations. Relations occur all the time in mathematics. For example, there are many relations between # and % À

Relations. Relations occur all the time in mathematics. For example, there are many relations between # and % À Relations Relations occur all the time in mathematics. For example, there are many relations between and % À Ÿ% % Á% l% Ÿ,, Á ß and l are examples of relations which might or might not hold between two

More information

ECE 275B Homework # 1 Solutions Winter 2018

ECE 275B Homework # 1 Solutions Winter 2018 ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,

More information

Proofs Involving Quantifiers. Proof Let B be an arbitrary member Proof Somehow show that there is a value

Proofs Involving Quantifiers. Proof Let B be an arbitrary member Proof Somehow show that there is a value Proofs Involving Quantifiers For a given universe Y : Theorem ÐaBÑ T ÐBÑ Theorem ÐbBÑ T ÐBÑ Proof Let B be an arbitrary member Proof Somehow show that there is a value of Y. Call it B œ +, say Þ ÐYou can

More information

Measures. 1 Introduction. These preliminary lecture notes are partly based on textbooks by Athreya and Lahiri, Capinski and Kopp, and Folland.

Measures. 1 Introduction. These preliminary lecture notes are partly based on textbooks by Athreya and Lahiri, Capinski and Kopp, and Folland. Measures These preliminary lecture notes are partly based on textbooks by Athreya and Lahiri, Capinski and Kopp, and Folland. 1 Introduction Our motivation for studying measure theory is to lay a foundation

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Section 1.3 Functions and Their Graphs 19

Section 1.3 Functions and Their Graphs 19 23. 0 1 2 24. 0 1 2 y 0 1 0 y 1 0 0 Section 1.3 Functions and Their Graphs 19 3, Ÿ 1, 0 25. y œ 26. y œ œ 2, 1 œ, 0 Ÿ " 27. (a) Line through a!ß! band a"ß " b: y œ Line through a"ß " band aß! b: y œ 2,

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Math 249B. Geometric Bruhat decomposition

Math 249B. Geometric Bruhat decomposition Math 249B. Geometric Bruhat decomposition 1. Introduction Let (G, T ) be a split connected reductive group over a field k, and Φ = Φ(G, T ). Fix a positive system of roots Φ Φ, and let B be the unique

More information

Bloom Filters and Locality-Sensitive Hashing

Bloom Filters and Locality-Sensitive Hashing Randomized Algorithms, Summer 2016 Bloom Filters and Locality-Sensitive Hashing Instructor: Thomas Kesselheim and Kurt Mehlhorn 1 Notation Lecture 4 (6 pages) When e talk about the probability of an event,

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

0.2 Vector spaces. J.A.Beachy 1

0.2 Vector spaces. J.A.Beachy 1 J.A.Beachy 1 0.2 Vector spaces I m going to begin this section at a rather basic level, giving the definitions of a field and of a vector space in much that same detail as you would have met them in a

More information

Construction of a general measure structure

Construction of a general measure structure Chapter 4 Construction of a general measure structure We turn to the development of general measure theory. The ingredients are a set describing the universe of points, a class of measurable subsets along

More information

Gene expression experiments. Infinite Dimensional Vector Spaces. 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces

Gene expression experiments. Infinite Dimensional Vector Spaces. 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces MA 751 Part 3 Gene expression experiments Infinite Dimensional Vector Spaces 1. Motivation: Statistical machine learning and reproducing kernel Hilbert Spaces Gene expression experiments Question: Gene

More information

Logic Effort Revisited

Logic Effort Revisited Logic Effort Revisited Mark This note ill take another look at logical effort, first revieing the asic idea ehind logical effort, and then looking at some of the more sutle issues in sizing transistors..0

More information

Inner Product Spaces. Another notation sometimes used is X.?

Inner Product Spaces. Another notation sometimes used is X.? Inner Product Spaces X In 8 we have an inner product? @? @?@ ÞÞÞ? 8@ 8Þ Another notation sometimes used is X? ß@? @? @? @ ÞÞÞ? @ 8 8 The inner product in?@ ß in 8 has several essential properties ( see

More information

Outline of Elementary Probability Theory Math 495, Spring 2013

Outline of Elementary Probability Theory Math 495, Spring 2013 Outline of Elementary Probability Theory Math 495, Spring 2013 1 Þ Probability spaces 1.1 Definition. A probability space is a triple ( H, T,P) here ( 3ÑH is a non-empty setà Ð33ÑT is a collection of subsets

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Suggestions - Problem Set (a) Show the discriminant condition (1) takes the form. ln ln, # # R R

Suggestions - Problem Set (a) Show the discriminant condition (1) takes the form. ln ln, # # R R Suggetion - Problem Set 3 4.2 (a) Show the dicriminant condition (1) take the form x D Ð.. Ñ. D.. D. ln ln, a deired. We then replace the quantitie. 3ß D3 by their etimate to get the proper form for thi

More information

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Ð"Ñ + Ð"Ñ, Ð"Ñ +, +, + +, +,,

ÐÑ + ÐÑ, ÐÑ +, +, + +, +,, Handout #11 Confounding: Complete factorial experiments in incomplete blocks Blocking is one of the important principles in experimental design. In this handout we address the issue of designing complete

More information

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler STATC141 Spring 2005 The materials are from Pairise Sequence Alignment by Robert Giegerich and David Wheeler Lecture 6, 02/08/05 The analysis of multiple DNA or protein sequences (I) Sequence similarity

More information

2. The Concept of Convergence: Ultrafilters and Nets

2. The Concept of Convergence: Ultrafilters and Nets 2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two

More information

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Unbiased estimation Unbiased or asymptotically unbiased estimation plays an important role in

More information

Fourth Week: Lectures 10-12

Fourth Week: Lectures 10-12 Fourth Week: Lectures 10-12 Lecture 10 The fact that a power series p of positive radius of convergence defines a function inside its disc of convergence via substitution is something that we cannot ignore

More information

Chapter Eight The Formal Structure of Quantum Mechanical Function Spaces

Chapter Eight The Formal Structure of Quantum Mechanical Function Spaces Chapter Eight The Formal Structure of Quantum Mechanical Function Spaces Introduction In this chapter, e introduce the formal structure of quantum theory. This formal structure is based primarily on the

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

On Least Squares Linear Regression Without Second Moment

On Least Squares Linear Regression Without Second Moment On Least Squares Linear Regression Without Second Moment BY RAJESHWARI MAJUMDAR University of Connecticut If \ and ] are real valued random variables such that the first moments of \, ], and \] exist and

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information