STAT 5380 Advanced Mathematical Statistics I (Lecture Notes, Spring 2018). Alex Trindade, Department of Mathematics & Statistics, Texas Tech University


STAT 5380 Advanced Mathematical Statistics I (Lecture Notes, Spring 2018). Alex Trindade, Department of Mathematics & Statistics, Texas Tech University. Based primarily on TPE: Theory of Point Estimation, 2nd edition, by E.L. Lehmann and George Casella, Springer (1998).


Contents

Chapter 1. Preliminaries
  1.1. Conditional Expectation
  1.2. Sufficiency
  1.3. Exponential Families
  1.4. Convex Loss Function

Chapter 2. Unbiasedness
  2.1. UMVU estimators
  2.2. Non-parametric families
  2.3. The Information Inequality
  2.4. Multiparameter Case

Chapter 3. Equivariance
  3.1. Equivariance for a Location Family
  3.2. The General Equivariant Framework
  3.3. Location-Scale Families

Chapter 4. Average-Risk Optimality
  4.1. Bayes Estimation
  4.2. Minimax Estimation
  4.3. Minimaxity and Admissibility in Exponential Families
  4.4. Shrinkage Estimators and Big Data

Chapter 5. Large Sample Theory
  5.1. Convergence in Probability and Order in Probability
  5.2. Convergence in Distribution
  5.3. Asymptotic Comparisons (Pitman Efficiency)
  5.4. Comparison of Sample Mean, Median and Trimmed Mean (M-estimation)

Chapter 6. Maximum Likelihood Estimation
  6.1. Consistency
  6.2. Asymptotic Normality of the MLE
  6.3. Asymptotic Optimality of the MLE


CHAPTER 1

Preliminaries

1.1. Conditional Expectation

Definition. Let (X, A, P) be a probability space. If X ∈ L^1(A, P) and G is a sub-σ-field of A, then E(X | G) is a random variable such that
(i) E(X | G) ∈ G (i.e. it is G-measurable);
(ii) E(I_G X) = E(I_G E(X | G)) for all G ∈ G.

Construction. For X ≥ 0, µ(G) = E(I_G X) is a measure on G and P(G) = 0 ⇒ µ(G) = 0, so by the Radon-Nikodym theorem there exists a G-measurable function E(X | G) such that µ(G) = ∫_G E(X | G) dP, i.e. (ii) is satisfied. This shows the existence of E(X^+ | G) and E(X^- | G), and for general X we define E(X | G) = E(X^+ | G) − E(X^- | G).

Remark 1.1.1. (ii) generalizes to E(Y X) = E(Y E(X | G)) for all Y ∈ G such that E|Y X| < ∞. The conditional probability of A given G is defined for all A ∈ A as P(A | G) = E(I_A | G).

Remark 1.1.2. If X ∈ L^2(A, P), then E(X | G) is the orthogonal projection in L^2(A, P) of X onto the closed linear subspace L^2(G, P) of L^2(A, P), since (i) E(X | G) ∈ L^2(G, P) and (ii) E(Y (X − E(X | G))) = 0 for all Y ∈ L^2(G, P).

Conditioning on a Statistic. Let X be a r.v. defined on (X, A, P) with E|X| < ∞ and let T be a measurable function (not necessarily real-valued) from (X, A) into (T, F):

(X, A, P) --T--> (T, F, P^T).

Such a T is called a statistic. The σ-field of subsets of X induced by T is

σ(T) = {T^{-1}S : S ∈ F} = T^{-1}F.

Definition 1.1.3. E(X | T) ≡ E(X | σ(T)).

Recall that a real-valued function f on X is σ(T)-measurable ⇔ f = g ∘ T for some F-measurable g on T, i.e. f(x) = g(T(x)):

X --T--> T --g--> R.

This implies that E(X | T) is expressible as E(X | T) = h(T) for some F-measurable function h on T which is unique a.e. P^T.

Definition 1.1.4. E(X | t) ≡ h(t).

Example 1.1.5. Suppose (X, T) has probability density p(x, t) w.r.t. Lebesgue measure on R^2 and E|X| < ∞. Then E(X | σ(T)) = h(T), where

h(t) = E(X | T = t) = ( ∫ x p(x, t) dx / ∫ p(x, t) dx ) I_{p_T(t) > 0}(t),   a.s. P^T.

Proof. (i) The right-hand side is Borel measurable in t (by Fubini). (ii) G ∈ σ(T) ⇒ G = T^{-1}F for some F ∈ F ⇒ I_G = I_F(T), and

E(I_G X) = ∫ I_G X dP = ∫∫ x I_F(t) p(x, t) dx dt = ∫ I_F(t) h(t) p_T(t) dt = E[I_F(T) h(T)] = E[I_G h(T)],

which is the defining property E(I_G E(X | σ(T))) = E(I_G X).

Properties of Conditional Expectation. If T is a statistic, X is the identity function on X and f_n, f, g are integrable, then
(i) E[a f(X) + b g(X) | T] = a E[f(X) | T] + b E[g(X) | T] a.s.
(ii) a ≤ f(x) ≤ b a.s. ⇒ a ≤ E[f(X) | T] ≤ b a.s.
(iii) |f_n| ≤ g, f_n(x) → f(x) a.s. ⇒ E[f_n(X) | T] → E[f(X) | T] a.s.
(iv) E[E(f(X) | T)] = E f(X).
(v) If E|h(T) f(X)| < ∞, then E[h(T) f(X) | T] = h(T) E[f(X) | T] a.s.
(vi) If G_1 and G_2 are sub-σ-fields of A with G_1 ⊃ G_2, then E[E(X | G_1) | G_2] = E(X | G_2).

1.2. Sufficiency

Set-up:
X: random observable quantity (the identity function on (X, A, P));
X: the sample space, i.e. the set of possible values of X;
A: σ-algebra of subsets of X;
P = {P_θ, θ ∈ Ω}: a family of probability measures on A (the distributions of X);
T: X → T an A/F-measurable function; T(X) is called a statistic:

(X, A, P) --X--> (X, A, P) --T--> (T, F, P^T).

We adopt this notation because sometimes we wish to talk about T(X(·)), the random variable, and sometimes about T(X(x)) = T(x), a particular element of T. We shall also use the notation P(A | T(x)) for P(A | T = T(x)) and P(A | T) for the random variable P(A | T(·)) on X.
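Before moving on to sufficiency, here is a quick numerical illustration of Example 1.1.5 and property (iv) above. This is an added sketch (not part of the notes); the standard bivariate normal with correlation rho is simply an assumed example for which the exact answer E(X | T = t) = rho·t is known.

```python
# Added sketch: check Example 1.1.5 and property (iv) of conditional expectation
# for an assumed example, the standard bivariate normal with correlation rho,
# where the exact conditional mean is E(X | T = t) = rho * t.
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
grid = np.linspace(-8.0, 8.0, 4001)
dx = grid[1] - grid[0]

def h(t):
    """E(X | T = t) as the ratio of integrals in Example 1.1.5."""
    dens = joint.pdf(np.column_stack([grid, np.full_like(grid, t)]))
    return np.sum(grid * dens) * dx / (np.sum(dens) * dx)

for t in (-1.0, 0.0, 2.0):
    print(f"t = {t:+.1f}   h(t) = {h(t):+.4f}   rho*t = {rho * t:+.4f}")

# Property (iv): E[E(X | T)] = E(X); both averages below should be close to 0.
rng = np.random.default_rng(0)
xt = joint.rvs(size=2000, random_state=rng)
print(np.mean([h(t) for t in xt[:, 1]]), np.mean(xt[:, 0]))
```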

Definition 1.2.1. The statistic T is sufficient for θ (or for P) iff the conditional distribution of X given T = t is independent of θ for all t, i.e. there exists an F-measurable P(A | T = ·) such that P(A | T = t) = P_θ(A | T = t) a.s. P_θ^T for all A ∈ A and all θ ∈ Ω.

Example 1.2.2. X = (X_1, ..., X_n) iid with pdf f_θ(x) w.r.t. dx, so that

P_θ(dx_1, ..., dx_n) = f_θ(x_1) ··· f_θ(x_n) dx_1 ··· dx_n,

and let T(X) = (X_(1), ..., X_(n)), where X_(i) is the i-th order statistic. The probability mass function of X given T = t is

p_θ(x | t) = (1/n!) δ_{t_1}(x_(1)) ··· δ_{t_n}(x_(n)),

i.e. it assigns point mass 1/n! to each x such that x_(1) = t_1, ..., x_(n) = t_n. This is independent of θ, indicating that T contains all the information about θ contained in the sample.

The Factorization Criterion

Definition 1.2.3. A family of probability measures P = {P_θ : θ ∈ Ω} is equivalent to a p.m. λ if λ(A) = 0 ⇔ P_θ(A) = 0 ∀ θ ∈ Ω. We also say that P is dominated by a σ-finite measure µ on (X, A) if P_θ ≪ µ for all θ ∈ Ω. It is clear that equivalence to λ implies domination by λ.

Theorem 1.2.4. Let P be dominated by a p.m. λ, where λ = Σ_{i=0}^∞ c_i P_{θ_i} (c_i ≥ 0, Σ c_i = 1). Then the statistic T (with range (T, F)) is sufficient for P ⇔ there exists an F-measurable function g_θ(·) such that dP_θ(x) = g_θ(T(x)) dλ(x) ∀ θ ∈ Ω.

Proof. (⇒) Suppose T is sufficient for P. Then P_θ(A | T(x)) = P(A | T(x)) ∀ θ. Throughout this part of the proof X will denote the indicator function of a subset of X. The preceding equality then implies that

E_θ(X | T) = E(X | T)   ∀ X ∈ A, ∀ θ.

8 8. PRELIMINARIES Hence for all θ Ω, X A, G σ(t ), we have E θ (I G E(X T )) = E θ (E θ (I G X T )) = E θ (I G X). Set θ = θ i, multiply by c i and sum over i = 0,, 2,..., to get E λ (I G E(X T )) = E λ (I G X) X A, G σ(t ). This implies that E(X T ) = E λ (X T ) X A, and hence E θ (X T ) = E(X T ) = E λ (X T ) X A, θ. Now define g θ (T ( )) to be the Radon-Nikodym derivative of P θ with respect to λ, with both regarded as measures on σ(t ). We know this exists since λ dominates every P θ. We also know it is σ(t ) measurable, so it can be written in the form g θ (T ( )), and we know that E θ (X) = E λ (g θ (T )X) for all X σ(t ). We need to establish however that this last relation holds for all X A. We do this as follows. X A E θ (X) = E θ [E(X T )] = E λ [g θ (T )E(X T )] = E λ [E(g θ (T )X T )] = E λ [E λ (g θ (T )X T )] = E λ [g θ (T )X]. This shows that g θ (T (x)) = dp θ dλ (x) when P θ and λ are regarded as measures on A. ( ) Suppose that for each θ, dp θ (x) = g dλ θ(t (x)) for some g θ. We shall then show that the conditional probability P λ (A t) is a version of P θ (A t) θ. A A, G σ(t ) I A dp θ = P θ (A T ) dp θ G G = P θ (A T )g θ (T ) dλ and G I A dp θ = = = G G G G I A g θ (T ) dλ E λ [I A g θ (T ) T ] dλ E λ [I A T ]g θ (T ) dλ P θ (A T )g θ (T ) = E λ (I A T )g θ (T ) a.s. λ and hence a.s. P θ θ. Also g θ (T ) 0, a.s. P θ, since dp θ = g θ (T ) dλ. Hence P θ (A T ) = E λ (I A T ) = P λ (A T ) a.s. P θ and the R.S. is independent of θ.

9 .2. SUFFICIENCY 9 Theorem.2.5. (Theorem A.4.2 in appendix of TSH ) If P = {P θ, θ Ω} is dominated by a σ-finite measure µ, then it is equivalent to λ = i=0 c ip θi for some countable subcollection P θi P, i = 0,, 2,..., with c i 0 and c i =. Proof. µ is σ finite, A n A with A, A 2,... disjoint, and A i = X such that 0 < µ(a i ) <, i =, 2,.... Set µ (A) = i= µ(a A i ) 2 i µ(a i ) Then, µ is a probability measure equivalent to µ. Hence we can assume without loss of generality that the dominating measure µ is a probability measure Let and set Then f θ = dp θ dµ S θ = {x: f θ (x) > 0} (.2.) P θ (A) = P θ (A S θ ) = 0 iff µ(a S θ ) = 0. (Since P θ µ and since µ(a S θ ) > 0, f θ > 0 on A S θ P θ (A S θ ) > 0.) A set A A is a kernel if A S θ for some θ; a finite or countable union of kernels is called a chain. Set α = sup µ(c) chains C Then α = µ(c) for some chain C = n=a n, A n S θn. (since {C n } such that µ(c n ) α and for this sequence µ( C n ) = α.) It follows from the following Lemma that P is dominated by λ( ) = n= P 2 n θn ( ). Since it is obvious that λ(a) = 0 P θn (A) = 0 n P θ (A) = 0 θ (by the Lemma), P θ (A) = 0 θ λ(a) = 0 Hence P is equivalent to λ( ) = n= P 2 n θn ( ). Lemma.2.6. If {θ n } is the sequence used in the construction of C, then {P θ, θ Ω} is dominated by {P θn, n =, 2,...}, i.e. P θn (A) = 0 n P θ (A) = 0 θ TSH stands for Testing Statistical Hypotheses, Lehmann & Romano, 3rd ed., Springer, 2005.

10 0. PRELIMINARIES Proof. P θn (A) = 0 n µ(a S θn ) = 0 n (by.2.) (C S θn ) µ(a C) = 0 (Pθ µ) P θ (A C) = 0 θ If P θ (A) > 0 for some θ then, since P θ (A) = P θ (A C) + P θ (A C c ), P θ (A C c ) = P θ (A C c S θ ) > 0 A C c S θ is a kernel disjoint from C C (A C c S θ ) is a chain with µ > α, (P θ (A) > 0 µ(a) > 0) contradicting the definition of α. Hence, P θ (A) = 0 θ. Theorem.2.7. The Factorization Theorem Let µ be a σ-finite measure which dominates P = {P θ : θ Ω} and let p θ = dp θ dµ. Then the statistic T is sufficient for P if and only if there exists a non negative F- measurable function g θ : T R and an A-measurable function h : X R such that (.2.2) p θ (x) = g θ (T (x)) h (x) a.e. µ. Proof. By theorem.2.5, P is equivalent to λ = i c i P θi, where c i 0, i c i =. If T is sufficient for P, p θ (x) = dp θ (x) dµ (x) = dp θ (x) dλ (x) dλ (x) dµ (x) = g θ (T (x)) h (x) by theorem.2.4. On the other hand, if equation (.2.2) holds, (.2.3) dλ (x) = c i dp θi (x) = c i p θi (x) dµ(x) = c i g θi (T (x)) h (x) dµ (x) i= = K (T (x)) h (x) dµ (x).

Thus,

dP_θ(x) = p_θ(x) dµ(x)                                 (by the definition of p_θ)
        = g_θ(T(x)) h(x) dµ(x)                          (by (1.2.2))
        = [ g_θ(T(x)) h(x) / (K(T(x)) h(x)) ] dλ(x)     (by (1.2.3))
        = g̃_θ(T(x)) dλ(x),

where g̃_θ(T(x)) := g_θ(T(x))/K(T(x)), defined to be 0 if K(T(x)) = 0. Hence T is sufficient for P by Theorem 1.2.4.

Remark 1.2.8. If f_θ(x) is the density of X with respect to Lebesgue measure, then T is sufficient for P iff f_θ(x) = g_θ(T(x)) h(x), where h is independent of θ.

Example 1.2.9. Let X_1, ..., X_n be iid N(µ, σ^2), µ ∈ R, σ > 0, and write X = (X_1, ..., X_n). A σ-finite dominating measure on B^n is Lebesgue measure, with

p_{µ,σ^2}(x) = (σ√(2π))^{-n} exp{ −(1/(2σ^2)) Σ x_i^2 + (µ/σ^2) Σ x_i − nµ^2/(2σ^2) } = g_{µ,σ^2}( Σ x_i, Σ x_i^2 ).

Therefore T(X) = (Σ X_i, Σ X_i^2) is sufficient for P = {P_{µ,σ^2}}.

Remark 1.2.10. T̃(X) = (X̄, S^2) is also sufficient for P = {P_{µ,σ^2}}, since g_{µ,σ^2}(Σ x_i, Σ x_i^2) = g̃_{µ,σ^2}(x̄, S^2). T and T̃ are equivalent in the following sense.

Definition 1.2.11. Two statistics T and S are equivalent if they induce the same σ-algebra up to P-null sets, i.e. if there exist a P-null set N and functions f and g such that T(x) = f(S(x)) and S(x) = g(T(x)) for all x ∈ N^c.

Example 1.2.12. Let X_1, ..., X_n be iid U(0, θ), θ > 0, and X = (X_1, ..., X_n). Then

p_θ(x) = θ^{-n} Π_{i=1}^n I_[0,∞)(x_i) I_(−∞,θ](x_i) = θ^{-n} I_[0,∞)(x_(1)) I_(−∞,θ](x_(n)) = g_θ(x_(n)) h(x),

so T(X) = X_(n) is sufficient for θ.
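Example 1.2.9 above says that the N(µ, σ^2) likelihood depends on the data only through T = (Σ x_i, Σ x_i^2). A small numerical check (an added sketch, not from the notes): two different samples constructed to share the same value of T have identical log-likelihoods at every (µ, σ).

```python
# Added sketch for Example 1.2.9: two different samples with the same
# sufficient statistic T = (sum x_i, sum x_i^2) have identical normal
# log-likelihoods at every (mu, sigma).
import numpy as np
from scipy.stats import norm

x = np.array([0.0, 1.0, 2.0])
# Build a second, different sample with the same sum and sum of squares:
a = 0.5
s, ss = x.sum() - a, (x ** 2).sum() - a ** 2      # required sum / sum of squares of (b, c)
b, c = np.roots([1.0, -s, (s ** 2 - ss) / 2.0]).real  # real roots of t^2 - s*t + bc
y = np.array([a, b, c])

print("T(x) =", (x.sum(), (x ** 2).sum()), "  T(y) =", (y.sum(), (y ** 2).sum()))

def loglik(sample, mu, sigma):
    return norm.logpdf(sample, loc=mu, scale=sigma).sum()

for mu, sigma in [(0.0, 1.0), (1.0, 2.0), (-0.5, 0.7)]:
    print(mu, sigma, loglik(x, mu, sigma), loglik(y, mu, sigma))  # equal pairs
```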

Example 1.2.13. X_1, ..., X_n iid N(0, σ^2), Ω = {σ^2 : σ^2 > 0}. Define

T_1(X) = (X_1, ..., X_n),
T_2(X) = (X_1^2, ..., X_n^2),
T_3(X) = (X_1^2 + ··· + X_m^2, X_{m+1}^2 + ··· + X_n^2),
T_4(X) = X_1^2 + ··· + X_n^2,

and note that

p_θ(x) = (σ√(2π))^{-n} exp( −(1/(2σ^2)) Σ_{i=1}^n x_i^2 ).

Each T_i(X) is sufficient. However σ(T_4) ⊂ σ(T_3) ⊂ σ(T_2) ⊂ σ(T_1) (since functions of T_4 are functions of T_3, functions of T_3 are functions of T_2, and functions of T_2 are functions of T_1).

Remark 1.2.14. If T is sufficient for θ and T = H(S), where S is some statistic, then S is also sufficient, since

p_θ(x) = g_θ(T(x)) h(x) = g_θ(H(S(x))) h(x).

Since σ(T) = S^{-1} H^{-1} B_T ⊂ S^{-1} B_S   ( (X, A) --S--> (S, B_S) --H--> (T, B_T) ), T provides a greater reduction of the data than S, strictly greater unless H is one-to-one, in which case S and T are equivalent.

Definition 1.2.15. T is a minimal sufficient statistic if for any sufficient statistic S there exists a measurable function H such that T = H(S) a.s. P.

Theorem 1.2.16. If P is dominated by a σ-finite measure µ, then the statistic U is sufficient iff for every fixed θ and θ_0 the ratio of the densities p_θ and p_{θ_0} with respect to µ, defined to be 1 when both densities are zero, satisfies

p_θ(x) / p_{θ_0}(x) = f_{θ,θ_0}(U(x))   a.s. P

for some measurable f_{θ,θ_0}.

Proof. HW problem (TPE Ch. 1, Problem 6.6).

Theorem 1.2.17. Let P be a finite family with densities {p_0, p_1, ..., p_k}, all having the same support (i.e. S = {x : p_i(x) > 0} is independent of i). Then

T(x) = ( p_1(x)/p_0(x), p_2(x)/p_0(x), ..., p_k(x)/p_0(x) )

is minimal sufficient. (This is also true for a countable collection of densities, with no change in the proof.)

Proof. First, T is sufficient by Theorem 1.2.16, since p_i(x)/p_j(x) is a function of T(x) for all i and j (the common support is needed here). If U is a sufficient statistic then, by Theorem 1.2.16, p_i(x)/p_0(x) is a function of U for each i ⇒ T is a function of U ⇒ T is minimal sufficient.

Remark 1.2.18. Theorem 1.2.17 extends to uncountable collections under further conditions.

Theorem 1.2.19. Let P be a family with common support and suppose P_0 ⊂ P. If T is minimal sufficient for P_0 and sufficient for P, then T is minimal sufficient for P.

Proof. U sufficient for P ⇒ U sufficient for P_0, by Definition 1.2.1. T minimal sufficient for P_0 ⇒ T(x) = H(U(x)) a.s. P_0. But since P has common support, T(x) = H(U(x)) a.s. P.

Remark 1.2.20. (1) Minimal sufficient statistics for uncountable families P can often be obtained by combining the above theorems. (2) Minimal sufficient statistics exist under weak assumptions (but not always). In particular they exist if (X, A) = (R^n, B^n) and P is dominated by a σ-finite measure.

Example 1.2.21. P_0: (X_1, ..., X_n) iid N(θ, 1), θ ∈ {θ_0, θ_1}. P: (X_1, ..., X_n) iid N(θ, 1), θ ∈ R.

p_{θ_1}(x) / p_{θ_0}(x) = exp{ −(1/2) [ Σ (x_i − θ_1)^2 − Σ (x_i − θ_0)^2 ] } = exp{ −(1/2) [ 2 Σ x_i (θ_0 − θ_1) + n θ_1^2 − n θ_0^2 ] }.

This is a (one-to-one) function of x̄, hence X̄ is minimal sufficient for P_0 by Theorem 1.2.17. Since X̄ is sufficient for P (by the factorization theorem), X̄ is minimal sufficient for P by Theorem 1.2.19.

Example 1.2.22. P: (X_1, ..., X_n) iid U(0, θ), θ > 0. Show that X_(n) is minimal sufficient. (This is part of Problem 1.6.6, for which you will need to use Problem 1.6.1.)

14 4. PRELIMINARIES Example Logistic P : (X,..., X n ) iid L(θ, ), θ R. P 0 : (X,..., X n ) iid L(θ, ), θ {0, θ,..., θ n }. p θ (x) = exp [ (x i θ)] n i= { + exp [ (x i θ)]} 2, where so T = (T (X),..., T n (X)) is minimal sufficient, T i (x) = p θ i (x) p 0 (x) = enθ i n j= ( + e x j ) 2 ( + e (x j θ i) ) 2. We will show that T (X) is equivalent to (X (),..., X (n) ), by showing that T (x) = T (y) x () = y (),, x (n) = y (n). Proof. ( ) Obvious from the expression for T i (x). ( ) Suppose that T i (x) = T i (y) for i =, 2,..., n, i.e. i.e. n j= n j= ( + e x j ) 2 ( + e (x j θ i) ) = n ( + e y j ) 2 2 ( + e (y j θ i), i =,..., n, ) 2 + u j ω + u j = n j= j= + v j ω + v j, ω = ω,..., ω n, where u j = e x j, v j = e y j and ω i = e θ i. Here we have two polynomials in ω of degree n which are equal for n + distinct values,, ω,..., ω n, of ω and hence for all ω. ω = 0 n ( + u j ) = j= n ( + u j ω) = j= n ( + v j ) j= n ( + v j ω) ω j= the zero sets of both these polynomials are the same x and y have the same order statistics. By theorem.2.7, the order statistics are therefore minimal sufficient for P 0. They are also sufficient for P, so by theorem.2.9, the order statistics are minimal sufficient for P. There is not much reduction possible here! This is fairly typical of location families, the normal, uniform and exponential distributions providing happy exceptions.

Ancillarity

Definition 1.2.24. A statistic V is said to be ancillary for P if the distribution P_θ^V of V does not depend on θ. It is called first-order ancillary if E_θ V is independent of θ.

Example 1.2.25. In Example 1.2.23, X_(2) − X_(1) is ancillary, since Y_1 = X_1 − θ, ..., Y_n = X_n − θ are iid P_0, and X_(2) − X_(1) = Y_(2) − Y_(1).

Example 1.2.26. P: (X_1, ..., X_n) iid N(θ, 1), θ ∈ R. S^2 = Σ (X_i − X̄)^2 is ancillary, since S^2 = Σ (Y_i − Ȳ)^2, where Y_i = X_i − θ, i = 1, 2, ..., are iid N(0, 1).

Remark 1.2.27. Ancillary statistics by themselves contain no information about θ; however, minimal sufficient statistics may contain ancillary components. For example, in 1.2.23, T = (X_(1), ..., X_(n)) is equivalent to T̃ = (X_(1), X_(2) − X_(1), ..., X_(n) − X_(1)), whose last (n − 1) components are ancillary. You can't drop them, as X_(1) is not even sufficient.

Complete Statistics

A sufficient statistic should bring about the best reduction of the data if it contains as little ancillary material as possible. This suggests requiring that no non-constant function of T be ancillary, or not even first-order ancillary, i.e. that

E_θ f(T) = c for all θ ∈ Ω ⇒ f(T) = c a.s. P,

or equivalently that

E_θ f(T) = 0 for all θ ∈ Ω ⇒ f(T) = 0 a.s. P.

Definition 1.2.28. A statistic T is complete if

(1.2.4)   E_θ f(T) = 0 for all θ ∈ Ω ⇒ f(T) = 0 a.s. P.

T is said to be boundedly complete if (1.2.4) holds for all bounded measurable functions f.

Since complete sufficient statistics are intended to give a good reduction of the data, it is not unreasonable to expect them to be minimal. We shall prove a slightly weaker result.

Theorem 1.2.29. Let U be a complete sufficient statistic. If there exists a minimal sufficient statistic, then U is minimal sufficient.

Proof. Let T be a minimal sufficient statistic and let ψ be a bounded measurable function. We will show that ψ(U) ∈ σ(T), i.e. E(ψ(U) | T) = ψ(U) a.s.

Now E(ψ(U) | T) = g(U) for some measurable g, since T is minimal and U is sufficient. Let h(U) = E(ψ(U) | T) − ψ(U); then E_θ h(U) = 0 ∀ θ, so h(U) = 0 a.s. P, since U is complete. Hence ψ(U) = E(ψ(U) | T) ∈ σ(T). Hence U-measurable bounded functions are T-measurable, i.e. σ(U) ⊂ σ(T), i.e. U is minimal sufficient.

Remark 1.2.30. (1) If P is dominated by a σ-finite measure and (X, A) = (R^n, B^n), the existence of a minimal sufficient statistic does not need to be assumed. (2) A minimal sufficient statistic is not necessarily complete. See the next example.

Example 1.2.31. P = {N(θ, θ^2), θ > 0},

p_θ(x) = (1/(θ√(2π))) e^{−(x−θ)^2/(2θ^2)} = (1/(θ√(2π))) e^{−(1/2)(x/θ − 1)^2}.

The single observation X is minimal sufficient but not complete, since E_θ[I_(0,∞)(X) − Φ(1)] = P_θ(X > 0) − Φ(1) = 0, however P_θ(I_(0,∞)(X) − Φ(1) = 0) = 0 ∀ θ.

Theorem 1.2.32 (Basu's theorem). If T is complete and sufficient for P, then any ancillary statistic is independent of T.

Proof. If S is ancillary, then P_θ(S ∈ B) = p_B, independent of θ. Sufficiency of T ⇒ P_θ(S ∈ B | T) = h(T), independent of θ. E_θ(h(T) − p_B) = 0 ∀ θ ⇒ h(T) = p_B a.s. P by completeness ⇒ S is independent of T.

1.3. Exponential Families

Definition 1.3.1. A family of probability measures {P_θ : θ ∈ Ω} is said to be an s-parameter exponential family if there exists a σ-finite measure µ such that

p_θ(x) = dP_θ/dµ (x) = exp( Σ_{i=1}^s η_i(θ) T_i(x) − B(θ) ) h(x),

where η_i, T_i and B are real-valued.

Remark 1.3.2. (1) The P_θ, θ ∈ Ω, are equivalent (since {x : p_θ(x) > 0} is independent of θ).

17 .3. EXPONENTIAL FAMILIES. 7 (2) The factorization theorem implies that T = (T,, T s ) is sufficient. (3) If we observe X,..., X n, iid with marginal distributions P θ then n j= T (X j) is sufficient for θ. Theorem.3.3. If {, η,..., η s } is LI, then T = (T,..., T s ) is minimal sufficient. (Linear independence of {, η,..., η s } means c η (θ) + + c s η s (θ) + d = 0 θ c = = c s = d = 0. Equivalently we can say that {η i } is affinely independent or AI since the set of points {(η (θ),..., η s (θ)), θ Ω} then lie in a proper affine subspace of R s.) (.3.) Proof. Fix θ 0 Ω and consider { s } dp θ (x) = p θ(x) dp θ0 p θ0 (x) = exp {B(θ 0) B(θ)} exp (η i (θ) η i (θ 0 )) T i (x). If {, η,..., η s } is LI then so is {, η η (θ 0 ),..., η s η s (θ 0 )}. Set S = {(η (θ) η (θ 0 ),..., η s (θ) η s (θ 0 )), θ Ω} R s. subspace of R s. Then span(s) is a linear If dim(span(s)) < s, then there exists a non-zero vector v = (v,..., v s ) s.t. v (η (θ) η (θ 0 )) + + v s (η s (θ) η s (θ 0 )) = 0 θ contradicting the linear independence of {, η i η i (θ 0 )}. Hence (.3.2) dim(span(s)) = s i.e. θ,..., θ s Ω s.t. {(η (θ i ) η (θ 0 ),, η s (θ i ) η s (θ 0 )), i =,, s} is LI. From.3., s j= (η j (θ i ) η j (θ 0 ))T j (x) = ln p θ i (x) p θ0 (x) + (B(θ i) B(θ 0 ))i =,..., s. Since the matrix [η j (θ i ) η j (θ 0 )] s i,j= is non-singular, T j (x) can be expressed uniquely in terms of ln p θ i (x) p θ0, i =,..., s. (x) But p θ i (x), i =,..., s is minimal sufficient for P p θ0 (x) 0 = {P θj, j = 0,,, s} by theorem.2.7. Hence T is minimal sufficient by theorem.2.9.
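As a concrete instance of Definition 1.3.1 (the Poisson family is my own assumed example here, not one worked in the notes): Poisson(λ) on the non-negative integers is a one-parameter exponential family with η = log λ, T(x) = x, B(θ) = λ and h(x) = 1/x!. A short numerical check:

```python
# Added sketch: the Poisson(lam) pmf written in exponential-family form,
#   p(x) = exp(eta * T(x) - A(eta)) * h(x),
# with eta = log(lam), T(x) = x, A(eta) = exp(eta) = lam, h(x) = 1/x!.
from math import exp, factorial, log
from scipy.stats import poisson

lam = 3.7
eta = log(lam)
A = exp(eta)          # A(eta) = e^eta = lam

for x in range(10):
    expfam = exp(eta * x - A) / factorial(x)
    print(x, expfam, poisson.pmf(x, lam))   # the two columns agree
```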

18 8. PRELIMINARIES Example.3.4. p θ (x) = θ 2π exp{ 2 θx2 + θx θ 2 }. η (θ) = 2 θ, η 2(θ) = θ, T (x) = (x 2, x) is sufficient but not minimal θ since rewriting the model as p θ (x) = 2π exp{ 2 θ(x )2 }, we see that T (x) = (x ) 2 is minimal sufficient. Remark.3.5. The exponential family can always be rewritten in such a way that the functions {T i } and {η i } are AI. If there exist constants c,..., c s, d, not all zero, such that c T (x) + + c s T s (x) = d a.s. P then one of the T i s can be expressed in terms of the others (or is constant). After reducing the number of functions T i as far as possible, the same can be done with their coefficients until the new functions {T i } and {η i } are AI. Definition.3.6. (Order of the exponential family.) If the functions {T i, i =,..., s} on X and {η i, i =,..., s} on Ω are both AI, then s is the order of the exponential family ( s ) p θ (x) = dp θ (x) = exp η i (θ) T i (x) B (θ) h (x). dµ Proposition.3.7. The order is well-defined. Proof. We shall show that s + = dim(v ) where V is the set of functions on X defined by V = span{, ln dp θ dp θ0 ( ), θ Ω} (independent of the dominating measure µ and the choice of {η i }, {T i }). ln dp θ dp θ0 (x) = s (η i (θ) η i (θ 0 ))T i (x) + B(θ 0 ) B(θ) i= so that V span{, T i ( ), i =,..., s} dim(v ) s + On the other hand, since {, η i, i =,..., s} is LI, each T j (x) can be expressed as a linear combination of, ln dp θ i dp θ0 (x), i =,..., s, as in the proof of the previous theorem, span{, T i ( ), i =,..., s} V s + dim(v )

19 .3. EXPONENTIAL FAMILIES. 9 Definition.3.8. (Canonical Form) For any s-parameter exponential family (not necessarily of order s) we can view the vector η(θ) = (η (θ),..., η s (θ)) as the parameter rather than θ. Then the density with respect to µ can be rewritten as s p(x, η) = exp[ η i T i (x) A(η)]h(x), η η(ω). Since p(, η) is a probability density with respect to µ, (.3.3) e A(η) = e s η it i (x) h(x)dµ(x). i= Definition.3.9. (The Natural Parameter Set) This is a possibly larger set than {η(θ), θ Ω}. It is the set of all s-vectors for which, by suitable choice of A(η), p(, η) can be a probability density, i.e. N = {η = (η,, η s ) R s : e s η it i (x) h(x)dµ(x) < } Theorem.3.0. N is convex. Proof. Suppose α = (α,..., α s ) and β = (β,..., β s ) N. Then, e p s α it i (x)+( p) s β it i (x) h(x) dµ(x) [ e p [ α i T i (x) h(x) dµ(x)] + e β i T i (x) h(x) dµ(x)] p (Holder s Inequality) < Theorem.3.. T = (T,, T s ) has density p η (t) = exp (η t A (η)) relative to ν = µ T where d µ (x) = h (x) dµ (x). Proof. If f:t R is a bounded measurable function, Ef(T ) = f(t (x))e η T (x) e A(η) d µ(x) = f(t)e η t e A(η) d µ T (t) Definition.3.2. The family of densities p η (t) = exp (η t A (η)), η η(ω), n is called an s-dimensional or s-parameter standard exponential family. (Defined on R s, not X.)

20 20. PRELIMINARIES Theorem.3.3. Let {p η (x)} be the s-parameter exponential family, ( s ) p η (x) = exp η i (θ) T i (x) B (θ) h (x)), η η(ω), and suppose (.3.4) i= φ (x) e s η jt j (x) dµ (x) exists and is finite for some φ and all η j = a j + ib j such that a N (=natural parameter space). Then (i) φ (x) e s η jt j (x) dµ (x) is an analytic function of each η i on {η : R (η) int (N )} and (ii) the derivative of all orders with respect to the η i s of φ (x) e s η jt j (x) dµ (x) can be computed by differentiating under the integral sign. Proof. Let a 0 = (a 0,..., a 0 s) be in int(n ) and let η 0 = a 0 + ib 0. Then φ(x)e s 2 η jt j (x) = h (x) h 2 (x) + i(h 3 (x) h 4 (x)) where h and h 2 are the positive and negative parts of the real part and h 3 and h 4 are the positive and negative parts of the imaginary part. Then φ (x) e s η jt j (x) dµ (x) can be expressed as e η T (x) dµ (x) e η T (x) dµ 2 (x) + i e η T (x) dµ 3 (x) i e η T (x) dµ 4 (x), where dµ i (x) = h i (x) dµ(x), i =,..., 4. Hence it suffices to prove (i) and (ii) for ψ(η ) = e η T (x) dµ(x). Since a 0 int(n ), there exists δ > 0 s.t. ψ(η ) exists and is finite for all η with a a 0 < δ. Now consider the difference quotient ψ(η ) ψ(η 0 ( ) ) = e η0 η η 0 T (x) e(η η 0)T (x) µ(dx) with η η η 0 η 0 < δ/2. Observe that e zt = (zt) j j! zt j = e zt j! zt e zt ezt t e zt z

21 .3. EXPONENTIAL FAMILIES. 2 The integrand in (*) is therefore bounded in absolute value by T (x) e (a0 + δ 2 ) T(x), where a 0 = Re(η) 0 and T (x) e (a0 + δ 2 ) T(x) µ(dx) < since T e δ 4 T }{{} e(a0 + 3δ 4 )T }{{} if T > 0 T e (a0 + δ 2 ) T = bounded integrable {}}{{}}{ T e δ 4 T e (a0 + δ 4 )T if T < 0 (independent of η ). Letting η η 0 in (*) and using the dominated convergence theorem therefore gives (.3.5) φ (η) 0 = T (x)e η0 T(x) µ(dx), where the integral exists and is finite η 0 which is the first component of some η 0 for which Re(η 0 ) N. Applying the same argument to (.3.5) which we applied to (.3.4) existence of all derivatives (i) and (ii). Theorem.3.4. For an exponential family of order s in canonical form and η int (N ), where N is the natural parameter space, ( ) T (i) E η (T ) = A = A η η,, A η s, and (ii) Cov η (T ) = 2 A = η η T [ 2 A η i η j ] s i,j=. so Proof. From theorem.3. e A(η) = e η t ν(dt) = (i) A η i e A(η) = T i (x)e η T (x) h(x)µ(dx) whence E η T i = A (ii) η i. e η T (x) h(x)µ(dx) 2 A η i η j e A(η) + A A η i η j e A(η) = T i (x)t j (x)e η T (x) h(x)µ(dx) i.e. 2 A η i η j = E η (T i T j ) E η (T i )E η (T j ) = Cov η (T i, T j ) Higher order moments of T,, T s are frequently required, e.g. α r r s = E(T r Ts rs ) µ r r s = E[(T E(T )) r (T s E(T s )) rs ]

22 22. PRELIMINARIES etc. These can often be obtained readily from the MGF: M T (u,, u s ) := E(e u T + +u st s ) If M T exists in some neighborhood of 0 ( u 2 i < δ), then all the moments α r,,r s exist and are the coefficients in the power series expansion r u u rs s M T (u,, u s ) = α r,,r s r r,...,r! r s! s The cumulant generating function, CGF, is sometimes more convenient for calculations, especially in connection with sums of independent random vectors. The CGF is defined as K T (u,, u s ) := log M T (u,, u s ). If M T exists in a neighborhood of 0, then so does K T and u r u rs s K T (u,, u s ) = K r r s r! r s!, r,,r s=0 where the coefficients K r r s are called the cumulants of T. The moments and cumulants can be found from each other by formal comparison of the two series. Theorem.3.5. If X has the density s p η (x) = exp [ η i T i (x) A(η)]h(x) i= w.r.t some σ-finite measure µ, then for any η int(n ) the MGF and CGF of T exist in a neighborhood of 0 and Proof. HW problem. K T (u) = A(η + u) A(η) M T (u) = e A(η+u) A(η) Summary on Exponential Families. The family of probability measures {P θ } with densities relative to some σ-finite measure µ, (.3.6) p θ (x) = dp s θ dµ (x) = exp{ η i (θ)t i (x) B(θ)}h(x), θ Ω, is an s-parameter exponential family By redefining the functions T i ( ) and η i ( ) if necessary, we can always arrange for both sets of functions to be affinely independent. The number of summands in the exponent is then the order of the exponential family.

23 .3. EXPONENTIAL FAMILIES. 23 If {, η,..., η s } and {, T,..., T s } are both L.I., then the family is said to be minimal and s = dim(span{, log dp θ ( ), θ Ω}) dp θ0 = order of the exponential family Remark.3.6. Since (.3.6) is by definition a probability density w.r.t. µ for each θ Ω, we have { } exp ηi (θ)t i (x) B(θ) h(x)µ(dx) = { } exp B(θ) = exp ηi (θ)t i (x) h(x)µ(dx) which shows that the dependence of B on θ is through η(θ) = (η (θ),..., η s (θ)) only, i.e. B(θ) = A(η(θ)). Remark.3.7. The previous note implies that each member of the family (.3.6) is a member of the family. s (.3.7) π ξ (x) = exp{ ξ i T i (x) A(ξ)}h(x), ξ = (ξ,..., ξ s ) η(ω) (in fact p θ (x) = π η(θ) (x)). The family of densities {π ξ, ξ η(ω)} defined by (.3.7) is the canonical family associated with (.3.6). It is the same family parameterized by the natural parameter, ξ =vector of coefficients of T i (x), i =,..., s. Remark.3.8. Instead of restricting ξ to the set η(ω), it is natural to extend the family (.3.7) to allow all ξ R s for which we can choose a value of A(ξ) to make (.3.7) a probability density, i.e. for which (.3.8) exp{ ξ i T i (x)}h(x)µ(dx) < N = {ξ R s : (.3.8) holds} is the natural parameter space of the family (.3.7). Remark.3.9. N η(ω) since (.3.7) is by definition a family of probability densities. Definition (Full rank family) As with the original parameterization, we can always redefine ξ to ensure that {T,..., T s } is A.I. If η(ω) contains an s-dimensional rectangle and {T ( ),..., T s ( )} is A.I., then T is minimal sufficient and we say the family (.3.7) is of full rank. (A full rank family is clearly minimal.)

24 24. PRELIMINARIES Remark.3.2. Since N η(ω), full rank int(n ) φ and this is important in view of the consequence of theorem.3.3 that s e A(ξ) = exp( ξ i T i (x))h(x)µ(dx) i= is analytic in each ξ i on the set of s-dimensional complex vectors, ξ : Re(ξ) int(n ). (So derivatives of e A(ξ) w.r.t. ξ i, i =,..., s of all orders can be obtained by differentiation under the integral, yielding explicit expressions for the moments of T for all values of the canonical parameter vector ξ int(n ).) Example Multinomial X M(θ 0,..., θ s ; n) = (X 0,..., X s ), where X i = number of outcomes of type i in n independent trials where θ i, i = 0,..., s, is the probability of an outcome of type i on any one trial. Ω = {θ : θ 0 0,, θ s 0, θ θ s = } () Probability density with respect to counting measure on Z s+ + n! s p θ (x) = x 0! x s! θx 0 0 θs xs I [0, n] (x i )I {n} ( x i ) = exp{ i=0 s x i log θ i }h(x), θ Ω. i=0 This is an (s + )-parameter exponential family with T i (x) = x i, η i (θ) = log θ i. The vectors η(θ), θ Ω, are not confined to a proper affine subspace of R s, so T is minimal sufficient. (2) {T 0,..., T s } is not A.I. since T T s = n. Setting T 0 (x) = x 0 = n x x n gives p θ (x) = h(x) exp{n log θ 0 + s i= x i log θ i θ 0 } Redefining η(θ) = (log θ θ 0,, log θs θ 0 ), we now have an s-parameter representation in which {T,..., T s } is A.I., since the vectors (x,, x s ), x X, are subject only to the constraints x i 0 and s i= x i n. (3) Furthermore the new parameter vectors, η(θ) = (log θ θ 0,, log θs θ 0 ), θ Ω, are not confined to any proper affine subspace of R s, since for any x R s θ 0,..., θ s such that η(θ) = x and so η(ω) = R s. Hence T (x) = (x,..., x s ) is minimal sufficient for P and the order of the family is s. (4) The canonical representation of the family (2) is π ξ (x) = exp{ s ξ i x i A(ξ)}h(x), ξ η(ω) = {(log θ θ 0,, log θ s θ 0 ): θ Ω}

25 .3. EXPONENTIAL FAMILIES. 25 We know from remark.3.6 before that B(θ) = A(η(θ)) for some function A( ). Although it is not necessary, we can verify this directly in this example, since from the representation (2) we have and B(θ) = n log θ 0 θ 0 = θ θ s θ 0 = + θ θ θ s θ 0 = + e η (θ) + + e ηs(θ) A(ξ) = n log( + e ξ + + e ξs ) A(ξ) is of course also determined by e A(ξ) = B(θ) = n log( + e η (θ) + + e ηs(θ) ) exp{ s ξ i x i }h(x)dµ(x). (5) The natural parameter space in this case is N = R s, since we know that N η(ω) and η(ω) = R s by (3) above. Clearly N contains an s-dimensional rectangle and {T,..., T s } is A.I., hence {π ξ (x), ξ N } is of full rank. (6) Moments of T (X) = (X,..., X s ) Theorem.3.4 E ξ T i = A ξ R s ξ i ne ξ i = + e ξ + + e ξ s nθ i /θ 0 = + θ θ θs θ 0 = nθ i and Cov(T i, T j ) = 2 A ξ i ξ j { = ne ξ i (+ +e ξs ) (Moments exist ξ int(n ) = R s ) ne ξ ie ξ j = nθ (+e ξ + +e ξs ) 2 i θ j i j ne2ξ i = nθ (+ +e ξs ) 2 i ( θ i ) i = j
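The multinomial moments in (6) were obtained by differentiating A(ξ). The same computation can be done symbolically; the sketch below (my own added example, using the binomial(m, p) family, i.e. the single-cell case) verifies E_η(T) = A′(η) and Var_η(T) = A″(η) with η = log(p/(1 − p)) and A(η) = m log(1 + e^η).

```python
# Added sketch: symbolic check of E_eta(T) = A'(eta), Var_eta(T) = A''(eta)
# for the binomial(m, p) family in canonical form, eta = log(p/(1-p)),
# T(x) = x, A(eta) = m*log(1 + exp(eta)).
import sympy as sp

eta, m = sp.symbols("eta m", positive=True)
p = sp.exp(eta) / (1 + sp.exp(eta))          # p as a function of eta
A = m * sp.log(1 + sp.exp(eta))

mean = sp.diff(A, eta)                       # should equal m*p
var = sp.diff(A, eta, 2)                     # should equal m*p*(1-p)

print(sp.simplify(mean - m * p))             # prints 0
print(sp.simplify(var - m * p * (1 - p)))    # prints 0
```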

26 26. PRELIMINARIES Theorem (Sufficient condition for completeness of T ) If ( s ) π ξ (x) = exp ξ i T i (x) A (ξ) h (x), ξ η (Ω) i= is a minimal canonical representation of the exponential family P = {p θ : θ Ω} and η (Ω) contains an open subset of R s, then T = (T,..., T s )is complete for P. Proof. Suppose E ξ (f(t )) = 0 ξ η(ω). Then, (.3.9) E ξ f + (T ) = E ξ f (T ) ξ η(ω). Choose ξ 0 int(η(ω)) and r > 0 such that N(ξ 0, r) := {ξ : ξ ξ 0 < r} η(ω). Now define the probability measures, λ + (A) = f + e ξ0 t ν(dt) A f J + e ξ 0 t ν(dt), ν = µ T, d µ(x) = h(x)µ(dx), λ (A) = f e ξ0 t ν(dt) A f J e ξ 0 t ν(dt), where we have assumed that ν({t: f(t) 0}) > 0, since otherwise f = 0 a.s. P T and we are done. Observe now that (.3.0) e δ t λ + (dt) = e δ t λ (dt) δ R s with δ < r since by (.3.9) L.S. = = J J f + (t)e (ξ0+δ) t ν(dt)/ f (t)e (ξ0+δ) t ν(dt)/ J J f + (t)e ξ 0 t ν(dt) f (t)e ξ 0 t ν(dt) Now consider each side of (.3.0) as a function of the complex argument δ = δ 0 + iθ, θ R s. Then L(δ) = R(δ) δ = δ 0 + i θ with δ 0 < r, since (by Theorem.3.3 (i)) both sides are analytic in each component of δ on the set where Re(ξ 0 + δ) N and they are equal when δ is real. In particular, L(iθ) = e iθ t λ + (dt) = R(iθ) = e iθ t λ (dt)

Lemma 1.4.1. Let φ be a convex function on (−∞, ∞) which is bounded below and suppose that φ is not monotone. Then φ takes on its minimum value c, the set φ^{-1}({c}) is a closed interval, and it is a singleton when φ is strictly convex.

Proof. Since φ is convex and not monotone, lim_{x→±∞} φ(x) = ∞. Since φ is continuous, φ attains its minimum value c. φ^{-1}({c}) is closed by continuity and is an interval by convexity. The interval must have zero length if φ is strictly convex.

Theorem 1.4.2. Let ρ be a convex function defined on (−∞, ∞) and X a random variable such that φ(a) = E(ρ(X − a)) is finite for some a. If ρ is not monotone, then φ takes on its minimum value, the set on which it does so is a closed interval, and it is a singleton when ρ is strictly convex.

Proof. By the lemma, we only need to show that φ is convex and not monotone. Since ρ is convex and not monotone, lim_{t→±∞} ρ(t) = ∞, and since X − a → ∓∞ as a → ±∞, it follows that lim_{a→±∞} φ(a) = ∞, so φ is not monotone. The convexity comes from

φ(p a + (1 − p) b) = E ρ( p(X − a) + (1 − p)(X − b) ) ≤ E( p ρ(X − a) + (1 − p) ρ(X − b) ) = p φ(a) + (1 − p) φ(b).
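Theorem 1.4.2 is what guarantees that expected-loss problems such as min_a E|X − a| and min_a E(X − a)^2 have solutions (a median and the mean of X, respectively). A quick numerical illustration (an added sketch, not part of the notes), minimizing the empirical versions of φ(a) for a skewed sample:

```python
# Added sketch for Theorem 1.4.2: phi(a) = E rho(X - a) attains its minimum.
# For rho(t) = |t| a minimizer is a median of X; for rho(t) = t^2 it is E(X).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)     # a skewed sample

phi_abs = lambda a: np.mean(np.abs(x - a))            # convex, not monotone
phi_sq = lambda a: np.mean((x - a) ** 2)              # strictly convex

a_abs = minimize_scalar(phi_abs, bounds=(0.0, 30.0), method="bounded").x
a_sq = minimize_scalar(phi_sq, bounds=(0.0, 30.0), method="bounded").x

print("argmin E|X - a|  :", a_abs, "   sample median:", np.median(x))
print("argmin E(X - a)^2:", a_sq, "   sample mean  :", np.mean(x))
```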


CHAPTER 2

Unbiasedness

2.1. UMVU estimators

Notation. P = {P_θ, θ ∈ Ω} is a family of probability measures on A (the distributions of X). T : X → R is an A/B-measurable function and T (or T(X)) is called a statistic. g : Ω → R is a function on Ω whose value at θ is to be estimated:

(X, A, P_θ) --X--> (X, A, P_θ) --T--> (R, B, P_θ^T).

Definition 2.1.1. A statistic T (or T(X)) is called an unbiased estimator of g(θ) if E_θ(T(X)) = g(θ) for all θ ∈ Ω.

Objectives of point estimation. In order to specify what we mean by a good estimator of g(θ), we need to specify what we mean when we say that T(X) is close to g(θ). A fairly general way of defining this is to specify a loss function:

L(θ, d) = cost of concluding that g(θ) = d when the parameter value is θ,

with L(θ, d) ≥ 0 and L(θ, g(θ)) = 0. Since T(X) is a random variable, we measure the performance of T(X) for estimating g(θ) in terms of its expected (or long-term average) loss, known as the risk function,

R(θ, T) = E_θ L(θ, T(X)).

The choice of a loss function will depend on the problem and the purpose of the estimation. For many estimation problems, the conclusion is not particularly sensitive to the choice of loss function within a reasonable range of alternatives. Because of this, and especially because of its mathematical convenience, we often choose (and will do so in this chapter) the squared-error loss function

L(θ, d) = (g(θ) − d)^2,

with corresponding risk function

(2.1.1)   R(θ, T) = E_θ (T(X) − g(θ))^2.
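As a small illustration of the risk function just defined (an added sketch, not part of the notes): for X_1, ..., X_n iid Bernoulli(p), T(X) = X̄ and squared-error loss, R(p, X̄) = Var_p(X̄) = p(1 − p)/n, which a Monte Carlo estimate of E_p(X̄ − p)^2 reproduces.

```python
# Added sketch: the risk R(p, Xbar) = E_p (Xbar - p)^2 under squared-error loss
# in the Bernoulli(p) model, by Monte Carlo and by the exact formula p(1-p)/n.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 25, 200_000

for p in (0.1, 0.3, 0.5, 0.8):
    xbar = rng.binomial(n, p, size=reps) / n
    mc_risk = np.mean((xbar - p) ** 2)
    print(f"p = {p:.1f}   MC risk = {mc_risk:.6f}   p(1-p)/n = {p * (1 - p) / n:.6f}")
```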

Ideally we would like to choose T to minimize (2.1.1) uniformly in θ. Unfortunately this is impossible, since the estimator T defined by

(2.1.2)   T(x) = g(θ_0)   ∀ x ∈ X

(where θ_0 is some fixed parameter value in Ω) has the risk function

R(θ, T) = 0 if θ = θ_0,   and   R(θ, T) = (g(θ) − g(θ_0))^2 if θ ≠ θ_0.

An estimator which simultaneously minimized R(θ, T) for all θ ∈ Ω would necessarily have R(θ, T) = 0 ∀ θ ∈ Ω, and this is impossible except in trivial cases.

Why consider the class of unbiased estimators? There is nothing intrinsically good about unbiased estimators. The only criterion for goodness is that R(θ, T) should be small. The hope is that by restricting attention to a class of estimators which excludes (2.1.2), we may be able to minimize R(θ, T) uniformly in θ, and that the resulting estimator will give small values of R(θ, T). This programme is frequently successful if we attempt to minimize R(θ, T) with T restricted to the class of unbiased estimators of g(θ).

Definition 2.1.2. g(θ) is U-estimable if there exists an unbiased estimator of g(θ).

Example 2.1.3. X_1, ..., X_n iid Bernoulli(p), p ∈ (0, 1). g(p) = p is U-estimable, since E_p X̄ = p ∀ p ∈ (0, 1), while h(p) = 1/p is not U-estimable, since if

Σ_x T(x) p^{Σ x_i} (1 − p)^{n − Σ x_i} = 1/p   ∀ p ∈ (0, 1),

then as p → 0 the right-hand side → ∞ while the left-hand side → T(0). So T(0) = ∞, but this is not possible, since then E_p T(X) = ∞ ∀ p ∈ (0, 1).

Remark 2.1.4. (1/n) Σ_{i=1}^n X_i → p a.s. and n / Σ_{i=1}^n X_i → 1/p a.s. for all p ∈ (0, 1). Hence n / Σ X_i is a reasonable estimate of 1/p even though it is not unbiased.

Theorem 2.1.5. If T_0 is an unbiased estimator of g(θ), then the totality of unbiased estimators of g(θ) is given by {T_0 − U : E_θ U = 0 for all θ ∈ Ω}.

Proof. If T is unbiased for g(θ), then T = T_0 − (T_0 − T), where E_θ(T_0 − T) = 0 ∀ θ ∈ Ω. Conversely, if T = T_0 − U where E_θ U = 0 ∀ θ ∈ Ω, then E_θ T = E_θ T_0 = g(θ) ∀ θ ∈ Ω.

Remark 2.1.6. For squared-error loss, L(θ, d) = (d − g(θ))^2, the risk of an unbiased estimator T = T_0 − U is

R(θ, T) = E_θ[(T(X) − g(θ))^2] = Var_θ(T(X)) = Var_θ(T_0(X) − U) = E_θ[(T_0(X) − U)^2] − g(θ)^2,

31 2.. UMVU ESTIMATORS. 3 and hence the risk is minimized by minimizing E θ [(T 0 (X) U) 2 ] with respect to U, i.e. by taking any fixed unbiased estimator of g(θ) and finding the unbiased estimator of zero which minimizes E θ [(T 0 (X) U) 2 ]. Then if U does not depend on θ we shall have found a uniformly minimum risk estimator of g(θ), while if U depends on θ, there is no uniformly minimum risk estimator. Note that for unbiased estimators and squared error loss, the risk is the same as the variance of the estimator, so uniformly minimum risk unbiased is the same as uniformly minimum variance unbiased in this case. Example P (X = ) = p, P (X = k) = q 2 p k, k = 0,,..., where q = p. U is unbiased for 0 0 = T 0 (X) = I { } (X) is unbiased for p, 0 < p < T (X) = I {0} (X) is unbiased for q 2, U(k)P (X = k) = pu( ) + k= = U(0) + U(k)q 2 p k k=0 (U(k) 2U(k ) + U(k 2))p k k= U(k) = ku( ) = ka for some a (comparing coefficients of p k, k = 0,, 2,...) So an unbiased estimator of p with minimum risk (i.e. variance) is T 0 (X) a 0X where a 0 is the value of a which minimizes E p (T 0 (X) ax) 2 = P p (X = k)[t 0 (k) ak] 2 Similarly an unbiased estimator of q 2 with minimum risk (i.e. variance) is T (X) a X where a is the value of a which minimizes E p (T (X) ax) 2 = P p (X = k)[t (k) ak] 2 Some straightforward calculations give a p 0 = p + q 2 k2 p k and a = 0 Since a is independent of p, the estimator T (X) of q 2 is minimum variance unbiased for all p, i.e. UMVU. However a 0 does depend on p and so the estimator T0 (X) = T 0 (X) a 0X is only locally minimum variance unbiased at p. (We are using estimator in a generalized sense here since T0 (X) depends on p. We shall continue to use this terminology.) An UMVU estimator of p does not exist in this case. Definition Let V (θ) = inf T V ar θ (T ) where the inf is over all unbiased estimators of g(θ). If an unbiased estimator T of g(θ) satisfies V ar θ (T ) = V (θ) θ Ω it is called UMVU p

32 32 2. UNBIASEDNESS If V ar θ0 T = V (θ 0 ) for some θ 0 Ω T is called LMVU at θ 0 Remark Let H be the Hilbert space of functions on X which are square integrable with respect to P (i.e. with respect to every P θ P), and let U be the set of all unbiased estimators of 0. If T 0 is an unbiased estimator of g(θ) in H, then a LMVU estimator in H at θ 0 is T 0 P U (T 0 ), where P U denotes orthogonal projection on U in the inner product space L 2 (P θ0 ), i.e. P U (T 0 ) is the unique element of U such that T 0 P U (T 0 ) U (in L 2 (P θ0 )). T 0 P U (T 0 ) is LMVU since P U (T 0 ) = arg min U U E θ0 (T 0 U) 2. Notation We denote the set of all estimators T with E θ T 2 < for all θ Ω by and the set of all unbiased estimators of 0 in by U. Theorem 2... An unbiased estimator T of g (θ) is UMVU iff E θ (T U) = 0 for all U U and for all θ Ω. (i.e. Cov θ (T, U) = 0 since E θ U = 0 for all θ and E θ T = g (θ) for all θ Ω.) Proof. ( ) Suppose T is UMVU. For U U, let T = T + λu with λ real. Then T is unbiased and, by definition of T, V ar θ (T ) = V ar θ (T ) + λ 2 V ar θ (U) + 2λCov θ (T, U) V ar θ (T ) therefore, λ 2 V ar θ (U) + 2λCov θ (T, U) 0. Setting λ = Cov θ(t,u) V ar θ (U) to this inequality unless Cov θ (T, U) = 0. Hence Cov θ (T, U) = 0. gives a contradiction ( ) If E θ (T U) = 0 U U and θ Ω, let T be any other unbiased estimator. If V ar θ (T ) =, then V ar θ (T ) < V ar θ (T ), so suppose V ar θ (T ) <. Then T = T U, for some U which is unbiased for 0 (by Theorem 2..5). Hence U = T T E θ U 2 = E θ (T T ) 2 2E θ T 2 + 2E θ T 2 < U U V ar θ (T ) = V ar θ (T U) = V ar θ (T ) + V ar θ (U) 2Cov θ (T, U) V ar θ (T ) since Cov θ (T, U) = 0, T is UMVU.

33 2.. UMVU ESTIMATORS. 33 Unbiasedness and sufficiency. Suppose now that T is unbiased for g(θ) and S is sufficient for P = {P θ, θ Ω}. Consider Then (a) (b) T = E θ (T S) = E(T S) independent of θ E θ T = E θ E(T S) = E θ (T ) = g(θ) θ. V ar θ (T ) = E θ (T E(T S) + E(T S) g(θ)) 2 = E θ ((T E(T S)) 2 ) + V ar θ (T ) + 2E θ [(T E(T S))(E(T S) g(θ))] V ar θ (T ). On the second line we used the fact that T E(T S) is orthogonal to σ(s). The inequality on the third line is strict for all θ T = E(T S) a.s. P. Theorem If S is a complete sufficient statistic for P, then every U-estimable function g (θ) has one and only one unbiased estimator which is a function of S. Proof. T unbiased E(T S) is unbiased and a function of S T (S), T 2 (S) unbiased E θ (T (S) T 2 (S)) = 0 θ T (S) = T 2 (S) a.s. P (completeness) Theorem (Rao-Blackwell) Suppose S is a complete sufficient statistic for P. Then (i) If g (θ) is U-estimable, there exists an unbiased estimator which uniformly minimizes the risk for any loss function L (θ, d) which is convex in d. (ii) The UMV U in (i) is the unique unbiased estimator which is a function of S; it is the unique unbiased estimator with minimum risk provided the risk is finite and L is strictly convex in d. Proof. (i) L(θ, d) convex in d means L(θ, pd + ( p)d 2 ) pl(θ, d ) + ( p)l(θ, d 2 ), 0 < p <. Let T be any unbiased estimator of g(θ) and let T = E(T S), another unbiased estimator of g(θ). Then R(θ, T ) = E θ [L(θ, E(T S))] E θ [E θ (L(θ, T ) S)], by Jensen s inequality for conditional expectation, = E θ L(θ, T ) = R(θ, T ) θ.

34 34 2. UNBIASEDNESS If T 2 is any other unbiased estimator then T 2 = E(T 2 S) = T a.s. P by Theorem Hence starting from any unbiased estimator and conditioning on the CSS S gives a uniquely defined unbiased estimator which is UMVU and is the unique function of S which is unbiased for g(θ). (ii) The first statement was established at the end of the proof of (i). If T is UMVU then so is T = E(T S) as shown in (i); We will show that T is necessarily the uniquely determined unbiased function of S, by showing that T is a function of S a.s. P. The proof is by contradiction. Suppose that "T is a function of S a.s. P" is false. Then there exists θ and a set of positive P θ measure where But this implies that R(θ, T ) = E θ (L(θ, E(T S))) < E θ (E θ (L(θ, T ) S)) T := E(T S) T (Jensen s inequality is strict unless E(T S) = T a.s. P θ ) = R(θ, T ) contradicting the UMVU property of T. Theorem If P is an exponential family of full rank (i.e. {η,..., η s } and {T,..., T s } are A.I. and η (Ω) contains an open subset of R s ) then the Rao-Blackwell theorem applies to any U-estimable g (θ) with S = T. Proof. T is complete sufficient for P. [Some obvious U-estimable g(θ) s are E θ T i (X) = A ξ i ξ=η(θ), {θ : η(θ) int(n )}, where π ξ (x) = e ξ i T i (x) A(ξ) h(x) is the canonical representation of p θ (x).] Two methods for finding UMVU s Method. Search for a function δ(t ), where T is a CSS, such that E θ δ(t ) = g(θ), θ Ω.

Example. X_1, ..., X_n iid N(µ, σ^2), µ ∈ R, σ^2 > 0. T = (X̄, S^2) is a CSS. E_{µ,σ^2} X̄ = µ, so X̄ is UMVU for µ.

Method 2. Search for an unbiased δ(X) and a CSS T. Then S = E(δ(X) | T) is UMVU.

Example. X_1, ..., X_n iid U(0, θ), θ > 0; g(θ) = θ/2. δ(X) = X_1 is unbiased and X_(n) is a CSS, so S = E(X_1 | X_(n)) is UMVU. To compute S we note that, given X_(n) = x,

X_1 = x with probability 1/n,   and   X_1 ~ U(0, x) with probability (n − 1)/n,

so

S(x) = x (1/n) + ((n − 1)/n)(x/2) = ((n + 1)/(2n)) x.

Hence S(X_(n)) = ((n + 1)/(2n)) X_(n) is UMVU for θ/2, and ((n + 1)/n) X_(n) is UMVU for θ.

Remark. (a) Convexity of L(θ, ·) is crucial to the Rao-Blackwell theorem. (b) Large-sample theory tends to support the use of convex L(θ, ·). Heuristically, if X_1, ..., X_n are iid, then as n → ∞ the error in estimating g(θ) → 0 for any reasonable estimates (in some probabilistic sense). Thus only the behavior of L(θ, d) for d close to g(θ) is relevant for large samples. A Taylor expansion around d = g(θ) gives

L(θ, d) = a(θ) + b(θ)(d − g(θ)) + c(θ)(d − g(θ))^2 + remainder.

But L(θ, g(θ)) = 0 ⇒ a(θ) = 0, and L(θ, d) ≥ 0 ⇒ b(θ) = 0. Hence locally L(θ, d) ≈ c(θ)(d − g(θ))^2, a convex weighted squared-error loss function.
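A Monte Carlo check of the U(0, θ) example above (an added sketch, not part of the notes): both 2X̄ and ((n + 1)/n) X_(n) are unbiased for θ, but the estimator based on the complete sufficient statistic X_(n) has much smaller variance, as the theory predicts.

```python
# Added sketch for the U(0, theta) example: compare two unbiased estimators of
# theta; the UMVU estimator (n+1)/n * X_(n) has the smaller variance.
import numpy as np

rng = np.random.default_rng(42)
theta, n, reps = 5.0, 10, 200_000

x = rng.uniform(0.0, theta, size=(reps, n))
est_mean = 2.0 * x.mean(axis=1)               # unbiased, not a function of X_(n)
est_max = (n + 1) / n * x.max(axis=1)         # UMVU for theta

for name, est in [("2*Xbar", est_mean), ("(n+1)/n*X_(n)", est_max)]:
    print(f"{name:>14}: mean = {est.mean():.4f}   variance = {est.var():.5f}")

# Exact variances: theta^2/(3n) for 2*Xbar and theta^2/(n(n+2)) for the UMVU.
print(theta ** 2 / (3 * n), theta ** 2 / (n * (n + 2)))
```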

36 36 2. UNBIASEDNESS Example Observe X,..., X m, iid N(ξ, σ 2 ), and Y,..., Y n, iid N(η, τ 2 ), independent of X,..., X m. (i) For the 4-parameter family P = {P ξ,η,σ 2,τ 2}, ( X, Ȳ, S2 X, S2 Y ) is a CSS since the exponential family is of full rank. Hence X and SX 2 are UMVU for ξ and σ2 respectively and Ȳ and S2 Y are UMVU for η and τ 2. (ii) For the 3-parameter family P = {P ξ,η,σ 2,σ2}, ( X, Ȳ, SS) is a CSS, where SS := (m )SX 2 + (n )S2 Y. Hence X, Ȳ and SS are UMVU for ξ, η and σ2 m+n 2 respectively. (iii) For the 3-parameter family with ξ = η, σ 2 τ 2 (which arises when estimating a mean from 2 sets of readings with different accuracies), ( X, Ȳ, S2 X, S2 Y ) is minimal sufficient but not complete, since X Ȳ 0 a.s. P, but E θ( X Ȳ ) = 0 θ. To deal with Case (iii) we shall first show the following: If σ2 = r for some fixed τ 2 r, i.e. P = {P ξ,ξ,rτ 2,τ 2} then T = ( X i + r Y j, X 2 i + r Y 2 j ) is CSS Proof. p ξ,τ 2(x, y) = exp (2π) m+n 2 (rτ 2 ) m 2 (τ 2 ) n 2 { x 2 2rτ 2 i + rτ = exp { A(ξ, τ 2 ) } exp 2 mξ x mξ2 2rτ 2 { 2rτ 2 ( x 2 i + r y 2 i ) + y 2 2τ 2 i + } nξ2 nξȳ τ 2 2τ 2 ξ rτ ( x 2 i + r } y i ) Since T is a CSS for P and since T = X i +r Y i m+rn for ξ in P. is unbiased for ξ, it is UMVU T is also unbiased for ξ in P = {P ξ,ξ,σ 2,τ 2} V (ξ 0, σ 2 0, τ 2 0 ) V ar ξ0,σ 2 0,τ 2 0 (T ) = σ0τ 2 0 2, where mτ0 2 + nσ0 2 σ0 2 τ0 2 = r. (V is the smallest variance of all unbiased estimators of ξ for P evaluated at ξ 0, σ 2 0, τ 2 0.)

37 2.2. NON-PARAMETRIC FAMILIES 37 On the other hand, every T which is unbiased for ξ in P is also unbiased in P. Hence if T is unbiased for ξ in P, then V ar ξ0,σ0 2,τ 0 2(T ) V ar ξ 0,σ0 2,τ 0 2( Xi + r Y i ), where r = σ2 0, m + rn τ0 2 and the inequality continues to hold with the left-hand side replaced by V (ξ 0, σ0, 2 τ0 2 ). So V (ξ 0, σ0, 2 τ0 2 ) = σ2 0 τ 0 2 mτ0 2+nσ2 0 and the LMVU estimator at (ξ 0, σ0, 2 τ0 2 ) is Xi + σ2 0 τ 2 0 Yi m + σ2 0 τ 2 0 n Since this estimate depends on the ratio r = σ2 0, an UMVU for ξ does not exist τ0 2 in P. A natural estimate for ξ is Xi + S2 X SY ˆξ = 2 Yi. m + S2 X n SY 2 (See Graybill and Deal, Biometrics, 959, pp for its properties.) 2.2. Non-parametric families Consider X = (X,..., X n ), where X,..., X n are iid F, where F F, a family of distribution functions, and P is the corresponding product measure on (R n, B n ). For example, F 0 = df s with density relative to Lebesgue measure, F = df s with x F (dx) <, F 2 = df s with x 2 F (dx) <, etc.. The estimand is g : F R. For example, g(f ) = g(f ) = xf (dx) = µ F x 2 F (dx) g(f ) = F (a) g(f ) = F (p) Proposition If F 0 is defined as above, then (X (),..., X (n) ) is complete sufficient for F 0 (i.e. for the corresponding family of probability measures P).

Proof. We know that T(X) = (X_(1), ..., X_(n)) is sufficient for P. It remains to show (by problem 1.6.32, p. 72) that T is complete and sufficient for a family P_0 ⊂ P such that each member of P_0 has positive density on R^n. Choose P_0 to be the set of probability measures on B^n with densities relative to Lebesgue measure of the form

C(θ_1, ..., θ_n) exp{ θ_1 Σ x_i + θ_2 Σ_{i<j} x_i x_j + ··· + θ_n x_1 ··· x_n − Σ x_i^{2n} }.

This is an exponential family whose natural parameter set N contains an open set (N = R^n). So S(x) = (Σ x_i, Σ_{i<j} x_i x_j, ..., x_1 ··· x_n) is complete. But S is equivalent to T (consider the n-th degree polynomial whose zeroes are x_(1), ..., x_(n)), so T is complete for F_0.

Measurable functions of the order statistics. If T(x) := (x_(1), ..., x_(n)), then δ(X_1, ..., X_n) ∈ σ(T) ⇔ δ(X_1, ..., X_n) = δ(X_{π_1}, ..., X_{π_n}) for every permutation (π_1, ..., π_n) of (1, ..., n). Since T is a CSS for F_0, this enables us to identify UMVU estimators of estimands g for which they exist.

Example 2.2.2. g(F) = F(a). An obvious unbiased estimator of F(a) is

T_1(X) := (1/n) Σ_{i=1}^n I_{(−∞, a]}(X_i),

and T_1 ∈ σ(T), so T_1 is UMVU for F(a).

Example 2.2.3. g(F) = ∫ x dF, F ∈ F_0 ∩ F_2. Let T_2(x) = (1/n) Σ_{i=1}^n X_i. Then T_2 ∈ σ(T) and, since T is also complete for F_0 ∩ F_2, it is therefore UMVU for µ_F.

Example 2.2.4. g(F) = σ_F^2, F ∈ F_0 ∩ F_4. Let

T_3(x) = S(x)^2 = Σ (x_i − x̄)^2 / (n − 1) = Σ (x_(i) − x̄)^2 / (n − 1).

T_3 ∈ σ(T) and is unbiased for σ_F^2. Since T is complete for F_0 ∩ F_4, T_3 is UMVU for σ_F^2.

Remark 2.2.5. T complete for F does not in general imply that T is complete for a subfamily F_1 ⊂ F. In fact the reverse is true: completeness for F_1 implies completeness for F. However, the same argument used in the proof of Proposition 2.2.1 shows that T is complete for F_0 ∩ F_2 (used in Example 2.2.3) and that T is complete for F_0 ∩ F_4 (used in Example 2.2.4).
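A final simulation check of Examples 2.2.2–2.2.4 (an added sketch, not part of the notes; the exponential distribution below is just an assumed member of F_0): the empirical CDF at a point, the sample mean and S^2 with divisor n − 1 are unbiased for F(a), µ_F and σ_F^2.

```python
# Added sketch: Monte Carlo check that the estimators in Examples 2.2.2-2.2.4
# are unbiased for F(a), mu_F and sigma_F^2 under one particular F.
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(7)
n, reps, a = 15, 100_000, 3.0
dist = expon(scale=2.0)                       # an assumed F in F_0

x = dist.rvs(size=(reps, n), random_state=rng)
ecdf_a = (x <= a).mean(axis=1)                # T_1 from Example 2.2.2
xbar = x.mean(axis=1)                         # T_2 from Example 2.2.3
s2 = x.var(axis=1, ddof=1)                    # T_3 from Example 2.2.4

print("F(a):     ", ecdf_a.mean(), dist.cdf(a))
print("mean:     ", xbar.mean(), dist.mean())
print("variance: ", s2.mean(), dist.var())
```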


More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Solutions to Tutorial 11 (Week 12)

Solutions to Tutorial 11 (Week 12) THE UIVERSITY OF SYDEY SCHOOL OF MATHEMATICS AD STATISTICS Solutions to Tutorial 11 (Week 12) MATH3969: Measure Theory and Fourier Analysis (Advanced) Semester 2, 2017 Web Page: http://sydney.edu.au/science/maths/u/ug/sm/math3969/

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Lecture 12 November 3

Lecture 12 November 3 STATS 300A: Theory of Statistics Fall 2015 Lecture 12 November 3 Lecturer: Lester Mackey Scribe: Jae Hyuck Park, Christian Fong Warning: These notes may contain factual and/or typographic errors. 12.1

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

Minimax lower bounds I

Minimax lower bounds I Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

A D VA N C E D P R O B A B I L - I T Y

A D VA N C E D P R O B A B I L - I T Y A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2

More information

Applications of Ito s Formula

Applications of Ito s Formula CHAPTER 4 Applications of Ito s Formula In this chapter, we discuss several basic theorems in stochastic analysis. Their proofs are good examples of applications of Itô s formula. 1. Lévy s martingale

More information

STAT215: Solutions for Homework 2

STAT215: Solutions for Homework 2 STAT25: Solutions for Homework 2 Due: Wednesday, Feb 4. (0 pt) Suppose we take one observation, X, from the discrete distribution, x 2 0 2 Pr(X x θ) ( θ)/4 θ/2 /2 (3 θ)/2 θ/4, 0 θ Find an unbiased estimator

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Miscellaneous Errors in the Chapter 6 Solutions

Miscellaneous Errors in the Chapter 6 Solutions Miscellaneous Errors in the Chapter 6 Solutions 3.30(b In this problem, early printings of the second edition use the beta(a, b distribution, but later versions use the Poisson(λ distribution. If your

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

Classical Estimation Topics

Classical Estimation Topics Classical Estimation Topics Namrata Vaswani, Iowa State University February 25, 2014 This note fills in the gaps in the notes already provided (l0.pdf, l1.pdf, l2.pdf, l3.pdf, LeastSquares.pdf). 1 Min

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

1 Probability Model. 1.1 Types of models to be discussed in the course

1 Probability Model. 1.1 Types of models to be discussed in the course Sufficiency January 18, 016 Debdeep Pati 1 Probability Model Model: A family of distributions P θ : θ Θ}. P θ (B) is the probability of the event B when the parameter takes the value θ. P θ is described

More information

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s.

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s. 20 6. CONDITIONAL EXPECTATION Having discussed at length the limit theory for sums of independent random variables we will now move on to deal with dependent random variables. An important tool in this

More information

Chapter 1. Statistical Spaces

Chapter 1. Statistical Spaces Chapter 1 Statistical Spaces Mathematical statistics is a science that studies the statistical regularity of random phenomena, essentially by some observation values of random variable (r.v.) X. Sometimes

More information

Translation Invariant Experiments with Independent Increments

Translation Invariant Experiments with Independent Increments Translation Invariant Statistical Experiments with Independent Increments (joint work with Nino Kordzakhia and Alex Novikov Steklov Mathematical Institute St.Petersburg, June 10, 2013 Outline 1 Introduction

More information

Chapter 7. Basic Probability Theory

Chapter 7. Basic Probability Theory Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Lecture 17: Likelihood ratio and asymptotic tests

Lecture 17: Likelihood ratio and asymptotic tests Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes Neyman-Pearson paradigm. Suppose that a researcher is interested in whether the new drug works. The process of determining whether the outcome of the experiment points to yes or no is called hypothesis

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

I. ANALYSIS; PROBABILITY

I. ANALYSIS; PROBABILITY ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so

More information

Random Process Lecture 1. Fundamentals of Probability

Random Process Lecture 1. Fundamentals of Probability Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing. 5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint

More information

MIT Spring 2016

MIT Spring 2016 MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a) :

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Econ 508B: Lecture 5

Econ 508B: Lecture 5 Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Real Analysis Chapter 3 Solutions Jonathan Conder. ν(f n ) = lim

Real Analysis Chapter 3 Solutions Jonathan Conder. ν(f n ) = lim . Suppose ( n ) n is an increasing sequence in M. For each n N define F n : n \ n (with 0 : ). Clearly ν( n n ) ν( nf n ) ν(f n ) lim n If ( n ) n is a decreasing sequence in M and ν( )

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Lecture 3 September 29

Lecture 3 September 29 STATS 300A: Theory of Statistics Fall 015 Lecture 3 September 9 Lecturer: Lester Mackey Scribe: Konstantin Lopyrev, Karthik Rajkumar Warning: These notes may contain factual and/or typographic errors.

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional

here, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional 15. Perturbations by compact operators In this chapter, we study the stability (or lack thereof) of various spectral properties under small perturbations. Here s the type of situation we have in mind:

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

MIT Spring 2016

MIT Spring 2016 Dr. Kempthorne Spring 2016 1 Outline Building 1 Building 2 Definition Building Let X be a random variable/vector with sample space X R q and probability model P θ. The class of probability models P = {P

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Chapter 1: Probability Theory Lecture 1: Measure space, measurable function, and integration

Chapter 1: Probability Theory Lecture 1: Measure space, measurable function, and integration Chapter 1: Probability Theory Lecture 1: Measure space, measurable function, and integration Random experiment: uncertainty in outcomes Ω: sample space: a set containing all possible outcomes Definition

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

MIT Spring 2016

MIT Spring 2016 Exponential Families II MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Exponential Families II 1 Exponential Families II 2 : Expectation and Variance U (k 1) and V (l 1) are random vectors If A (m k),

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

discrete random variable: probability mass function continuous random variable: probability density function

discrete random variable: probability mass function continuous random variable: probability density function CHAPTER 1 DISTRIBUTION THEORY 1 Basic Concepts Random Variables discrete random variable: probability mass function continuous random variable: probability density function CHAPTER 1 DISTRIBUTION THEORY

More information

Notes on Measure, Probability and Stochastic Processes. João Lopes Dias

Notes on Measure, Probability and Stochastic Processes. João Lopes Dias Notes on Measure, Probability and Stochastic Processes João Lopes Dias Departamento de Matemática, ISEG, Universidade de Lisboa, Rua do Quelhas 6, 1200-781 Lisboa, Portugal E-mail address: jldias@iseg.ulisboa.pt

More information

Bayes spaces: use of improper priors and distances between densities

Bayes spaces: use of improper priors and distances between densities Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de

More information

SOLUTION FOR HOMEWORK 7, STAT p(x σ) = (1/[2πσ 2 ] 1/2 )e (x µ)2 /2σ 2.

SOLUTION FOR HOMEWORK 7, STAT p(x σ) = (1/[2πσ 2 ] 1/2 )e (x µ)2 /2σ 2. SOLUTION FOR HOMEWORK 7, STAT 6332 1. We have (for a general case) Denote p (x) p(x σ)/ σ. Then p(x σ) (1/[2πσ 2 ] 1/2 )e (x µ)2 /2σ 2. p (x σ) p(x σ) 1 (x µ)2 +. σ σ 3 Then E{ p (x σ) p(x σ) } σ 2 2σ

More information