STAT 5380 Advanced Mathematical Statistics I (Lecture Notes, Spring 2018). Alex Trindade, Department of Mathematics & Statistics, Texas Tech University.
Based primarily on TPE: Theory of Point Estimation, 2nd edition, by E.L. Lehmann and George Casella, Springer (1998).
Contents

Chapter 1. Preliminaries
  1.1. Conditional Expectation
  1.2. Sufficiency
  1.3. Exponential Families
  1.4. Convex Loss Function
Chapter 2. Unbiasedness
  2.1. UMVU Estimators
  2.2. Non-parametric Families
  2.3. The Information Inequality
  2.4. Multiparameter Case
Chapter 3. Equivariance
  3.1. Equivariance for Location Family
  3.2. The General Equivariant Framework
  3.3. Location-Scale Families
Chapter 4. Average-Risk Optimality
  4.1. Bayes Estimation
  4.2. Minimax Estimation
  4.3. Minimaxity and Admissibility in Exponential Families
  4.4. Shrinkage Estimators and Big Data
Chapter 5. Large Sample Theory
  5.1. Convergence in Probability and Order in Probability
  5.2. Convergence in Distribution
  5.3. Asymptotic Comparisons (Pitman Efficiency)
  5.4. Comparison of Sample Mean, Median and Trimmed Mean (M-estimation)
Chapter 6. Maximum Likelihood Estimation
  6.1. Consistency
  6.2. Asymptotic Normality of the MLE
  6.3. Asymptotic Optimality of the MLE
CHAPTER 1. Preliminaries

1.1. Conditional Expectation

Definition 1.1.1. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space. If $X \in L^1(\mathcal{A}, P)$ and $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{A}$, then $E(X\mid\mathcal{G})$ is a random variable such that
(i) $E(X\mid\mathcal{G}) \in \mathcal{G}$ (i.e. is $\mathcal{G}$-measurable),
(ii) $E(I_G X) = E(I_G\, E(X\mid\mathcal{G}))$ for all $G \in \mathcal{G}$.

Construction. For $X \ge 0$, $\mu(G) = E(I_G X)$ is a measure on $\mathcal{G}$ and $P(G) = 0 \Rightarrow \mu(G) = 0$, so by the Radon-Nikodym theorem there exists a $\mathcal{G}$-measurable function $E(X\mid\mathcal{G})$ such that $\mu(G) = \int_G E(X\mid\mathcal{G})\,dP$, i.e. (ii) is satisfied. This shows the existence of $E(X^+\mid\mathcal{G})$ and $E(X^-\mid\mathcal{G})$. We then define $E(X\mid\mathcal{G}) = E(X^+\mid\mathcal{G}) - E(X^-\mid\mathcal{G})$.

Remark 1.1.1. (ii) generalizes to $E(YX) = E(Y\,E(X\mid\mathcal{G}))$ for all $Y \in \mathcal{G}$ such that $E|YX| < \infty$. The conditional probability of $A$ given $\mathcal{G}$ is defined for all $A \in \mathcal{A}$ as $P(A\mid\mathcal{G}) = E(I_A\mid\mathcal{G})$.

Remark 1.1.2. If $X \in L^2(\mathcal{A}, P)$, then $E(X\mid\mathcal{G})$ is the orthogonal projection in $L^2(\mathcal{A}, P)$ of $X$ onto the closed linear subspace $L^2(\mathcal{G}, P)$ of $L^2(\mathcal{A}, P)$, since (i) $E(X\mid\mathcal{G}) \in L^2(\mathcal{G}, P)$ and (ii) $E(Y(X - E(X\mid\mathcal{G}))) = 0$ for all $Y \in L^2(\mathcal{G}, P)$.

Conditioning on a Statistic. Let $X$ be a r.v. defined on $(\mathcal{X}, \mathcal{A}, P)$ with $E|X| < \infty$ and let $T$ be a measurable function (not necessarily real-valued) from $(\mathcal{X}, \mathcal{A})$ into $(\mathcal{T}, \mathcal{F})$:
$$(\mathcal{X}, \mathcal{A}, P) \xrightarrow{\;T\;} (\mathcal{T}, \mathcal{F}, P^T).$$
Such a $T$ is called a statistic. The $\sigma$-field of subsets of $\mathcal{X}$ induced by $T$ is $\sigma(T) = \{T^{-1}S,\; S \in \mathcal{F}\} = T^{-1}\mathcal{F}$.

Definition 1.1.3. $E(X\mid T) \equiv E(X\mid\sigma(T))$.

Recall that a real-valued function $f$ on $\mathcal{X}$ is $\sigma(T)$-measurable iff $f = g \circ T$ for some $\mathcal{F}$-measurable $g$ on $\mathcal{T}$, i.e. $f(x) = g(T(x))$.
This implies that $E(X\mid T)$ is expressible as $E(X\mid T) = h(T)$ for some function $h \in \mathcal{F}$ which is unique a.e. $P^T$.

Definition 1.1.4. $E(X\mid t) \equiv h(t)$.

Example 1.1.5. Suppose $(X, T)$ has probability density $p(x, t)$ w.r.t. Lebesgue measure on $\mathbb{R}^2$ and $E|X| < \infty$. Then $E(X\mid\sigma(T)) = h(T)$ where
$$h(t) = E(X\mid T = t) = \frac{\int x\,p(x,t)\,dx}{\int p(x,t)\,dx}\, I_{\{p_T(t) > 0\}}(t), \quad \text{a.s. } P^T.$$

PROOF. (i) The right-hand side is Borel measurable in $t$ (by Fubini). (ii) $G \in \sigma(T) \Rightarrow G = T^{-1}F$ for some $F \in \mathcal{F}$, so $I_G = I_F(T)$ and
$$E(I_G X) = \int I_G X\,dP = \iint x\, I_F(t)\, p(x,t)\,dx\,dt = \int I_F(t)\, h(t)\, p_T(t)\,dt = E[I_F(T)h(T)] = E[I_G h(T)].$$

Properties of Conditional Expectation. If $T$ is a statistic, $X$ is the identity function on $\mathcal{X}$ and $f_n, f, g$ are integrable, then
(i) $E[af(X) + bg(X)\mid T] = aE[f(X)\mid T] + bE[g(X)\mid T]$ a.s.
(ii) $a \le f(x) \le b$ a.s. $\Rightarrow a \le E[f(X)\mid T] \le b$ a.s.
(iii) $|f_n| \le g$, $f_n(x) \to f(x)$ a.s. $\Rightarrow E[f_n(X)\mid T] \to E[f(X)\mid T]$ a.s.
(iv) $E[E(f(X)\mid T)] = Ef(X)$.
(v) If $E|h(T)f(X)| < \infty$, then $E[h(T)f(X)\mid T] = h(T)E[f(X)\mid T]$ a.s.
(vi) If $\mathcal{G}_1$ and $\mathcal{G}_2$ are sub-$\sigma$-fields of $\mathcal{A}$ with $\mathcal{G}_1 \subset \mathcal{G}_2$, then $E[E(X\mid\mathcal{G}_2)\mid\mathcal{G}_1] = E(X\mid\mathcal{G}_1)$.

1.2. Sufficiency

Set-up:
$X$: random observable quantity (the identity function on $(\mathcal{X}, \mathcal{A}, \mathcal{P})$);
$\mathcal{X}$: sample space, the set of possible values of $X$;
$\mathcal{A}$: $\sigma$-algebra of subsets of $\mathcal{X}$;
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$: a family of probability measures on $\mathcal{A}$ (distributions of $X$);
$T$: an $\mathcal{A}/\mathcal{F}$ measurable function from $\mathcal{X}$ into $\mathcal{T}$; $T(X)$ is called a statistic.

We adopt this notation because sometimes we wish to talk about $T(X(\cdot))$ the random variable and sometimes about $T(X(x)) = T(x)$, a particular element of $\mathcal{T}$. We shall also use the notation $P(A\mid T(x))$ for $P(A\mid T = T(x))$ and $P(A\mid T)$ for the random variable $P(A\mid T(\cdot))$ on $\mathcal{X}$.
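The defining properties above, in particular the tower property (iv) and the representation $E(X\mid T) = h(T)$, can be checked numerically in a small discrete setting. The following is a minimal sketch, not from the notes, with a made-up joint pmf:

```python
import numpy as np

# Discrete sketch of E(X | T) = h(T): joint pmf of (X, T) on a small grid.
# All numerical values here are illustrative, not from the notes.
x_vals = np.array([0.0, 1.0, 2.0])
pmf = np.array([[0.10, 0.20],   # P(X=0, T=0), P(X=0, T=1)
                [0.25, 0.15],   # P(X=1, T=0), P(X=1, T=1)
                [0.05, 0.25]])  # P(X=2, T=0), P(X=2, T=1)

p_T = pmf.sum(axis=0)            # marginal pmf of T
h = (x_vals @ pmf) / p_T         # h(t) = E(X | T = t), discrete analogue of Example 1.1.5

# Tower property: E[E(X|T)] = E(X)
lhs = float(h @ p_T)
rhs = float(x_vals @ pmf.sum(axis=1))
print(h, lhs, rhs)
```

Here `h` plays the role of the function $h$ in Definition 1.1.4, and `lhs == rhs` is property (iv).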
Definition 1.2.1. The statistic $T$ is sufficient for $\theta$ (or $\mathcal{P}$) iff the conditional distribution of $X$ given $T = t$ is independent of $\theta$ for all $t$, i.e. there exists an $\mathcal{F}$-measurable $P(A\mid T = \cdot)$ such that $P(A\mid T = t) = P_\theta(A\mid T = t)$ a.s. $P_\theta^T$ for all $A \in \mathcal{A}$ and all $\theta \in \Omega$.

Example 1.2.2. $X = (X_1, \ldots, X_n)$ iid with pdf $f_\theta(x)$ w.r.t. $dx$,
$$P_\theta(dx_1, \ldots, dx_n) = f_\theta(x_1) \cdots f_\theta(x_n)\,dx_1 \cdots dx_n, \qquad T(X) = (X_{(1)}, \ldots, X_{(n)}),$$
where $X_{(i)}$ is the $i$-th order statistic. The probability mass function of $X$ given $T = t$ is
$$p(x\mid t) = \frac{1}{n!}\,\delta_{t_1}(x_{(1)}) \cdots \delta_{t_n}(x_{(n)}),$$
i.e. it assigns point mass $1/n!$ to each $x$ such that $x_{(1)} = t_1, \ldots, x_{(n)} = t_n$. This is independent of $\theta$, indicating that $T$ contains all the information about $\theta$ contained in the sample.

The Factorization Criterion

Definition 1.2.3. A family of probability measures $\mathcal{P} = \{P_\theta : \theta \in \Omega\}$ is equivalent to a p.m. $\lambda$ if $\lambda(A) = 0 \Leftrightarrow P_\theta(A) = 0$ for all $\theta \in \Omega$. We also say that $\mathcal{P}$ is dominated by a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$ if $P_\theta \ll \mu$ for all $\theta \in \Omega$. It is clear that equivalence to $\lambda$ implies domination by $\lambda$.

Theorem 1.2.4. Let $\mathcal{P}$ be dominated by a p.m. $\lambda$ where $\lambda = \sum_{i=0}^\infty c_i P_{\theta_i}$ ($c_i \ge 0$, $\sum c_i = 1$). Then the statistic $T$ (with range $(\mathcal{T}, \mathcal{F})$) is sufficient for $\mathcal{P}$ iff there exists an $\mathcal{F}$-measurable function $g_\theta(\cdot)$ such that
$$\frac{dP_\theta}{d\lambda}(x) = g_\theta(T(x)) \qquad \forall\theta \in \Omega.$$

Proof. ($\Rightarrow$) Suppose $T$ is sufficient for $\mathcal{P}$. Then $P_\theta(A\mid T(x)) = P(A\mid T(x))$ for all $\theta$. Throughout this part of the proof $X$ will denote the indicator function of a subset of $\mathcal{X}$. The preceding equality then implies that $E_\theta(X\mid T) = E(X\mid T)$ for all $X \in \mathcal{A}$ and all $\theta$.
Hence for all $\theta \in \Omega$, $X \in \mathcal{A}$, $G \in \sigma(T)$, we have
$$E_\theta(I_G\, E(X\mid T)) = E_\theta(E_\theta(I_G X\mid T)) = E_\theta(I_G X).$$
Set $\theta = \theta_i$, multiply by $c_i$ and sum over $i = 0, 1, 2, \ldots$, to get
$$E_\lambda(I_G\, E(X\mid T)) = E_\lambda(I_G X) \qquad \forall X \in \mathcal{A},\ G \in \sigma(T).$$
This implies that $E(X\mid T) = E_\lambda(X\mid T)$ for all $X \in \mathcal{A}$, and hence
$$E_\theta(X\mid T) = E(X\mid T) = E_\lambda(X\mid T) \qquad \forall X \in \mathcal{A},\ \theta.$$
Now define $g_\theta(T(\cdot))$ to be the Radon-Nikodym derivative of $P_\theta$ with respect to $\lambda$, with both regarded as measures on $\sigma(T)$. We know this exists since $\lambda$ dominates every $P_\theta$. We also know it is $\sigma(T)$-measurable, so it can be written in the form $g_\theta(T(\cdot))$, and we know that $E_\theta(X) = E_\lambda(g_\theta(T)X)$ for all $X \in \sigma(T)$. We need to establish however that this last relation holds for all $X \in \mathcal{A}$. We do this as follows. For $X \in \mathcal{A}$,
$$E_\theta(X) = E_\theta[E(X\mid T)] = E_\lambda[g_\theta(T)E(X\mid T)] = E_\lambda[E(g_\theta(T)X\mid T)] = E_\lambda[E_\lambda(g_\theta(T)X\mid T)] = E_\lambda[g_\theta(T)X].$$
This shows that $g_\theta(T(x)) = \frac{dP_\theta}{d\lambda}(x)$ when $P_\theta$ and $\lambda$ are regarded as measures on $\mathcal{A}$.

($\Leftarrow$) Suppose that for each $\theta$, $\frac{dP_\theta}{d\lambda}(x) = g_\theta(T(x))$ for some $g_\theta$. We shall then show that the conditional probability $P_\lambda(A\mid t)$ is a version of $P_\theta(A\mid t)$ for every $\theta$. For $A \in \mathcal{A}$, $G \in \sigma(T)$,
$$\int_G I_A\,dP_\theta = \int_G P_\theta(A\mid T)\,dP_\theta = \int_G P_\theta(A\mid T)\, g_\theta(T)\,d\lambda$$
and
$$\int_G I_A\,dP_\theta = \int_G I_A\, g_\theta(T)\,d\lambda = \int_G E_\lambda[I_A g_\theta(T)\mid T]\,d\lambda = \int_G E_\lambda[I_A\mid T]\, g_\theta(T)\,d\lambda.$$
Hence $P_\theta(A\mid T)\, g_\theta(T) = E_\lambda(I_A\mid T)\, g_\theta(T)$ a.s. $\lambda$, and hence a.s. $P_\theta$ for every $\theta$. Also $g_\theta(T) > 0$ a.s. $P_\theta$, since $dP_\theta = g_\theta(T)\,d\lambda$. Hence
$$P_\theta(A\mid T) = E_\lambda(I_A\mid T) = P_\lambda(A\mid T) \quad \text{a.s. } P_\theta,$$
and the right-hand side is independent of $\theta$.
Theorem 1.2.5. (Theorem A.4.2 in the appendix of TSH¹) If $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ is dominated by a $\sigma$-finite measure $\mu$, then it is equivalent to $\lambda = \sum_{i=0}^\infty c_i P_{\theta_i}$ for some countable subcollection $P_{\theta_i} \in \mathcal{P}$, $i = 0, 1, 2, \ldots$, with $c_i \ge 0$ and $\sum c_i = 1$.

Proof. $\mu$ is $\sigma$-finite, so there exist disjoint $A_1, A_2, \ldots \in \mathcal{A}$ with $\bigcup A_i = \mathcal{X}$ and $0 < \mu(A_i) < \infty$, $i = 1, 2, \ldots$. Set
$$\mu^*(A) = \sum_{i=1}^\infty \frac{\mu(A \cap A_i)}{2^i\,\mu(A_i)}.$$
Then $\mu^*$ is a probability measure equivalent to $\mu$. Hence we can assume without loss of generality that the dominating measure $\mu$ is a probability measure. Let
$$f_\theta = \frac{dP_\theta}{d\mu} \qquad \text{and set} \qquad S_\theta = \{x \colon f_\theta(x) > 0\}.$$
Then
(1.2.1) $P_\theta(A) = P_\theta(A \cap S_\theta) = 0$ iff $\mu(A \cap S_\theta) = 0$
(since $P_\theta \ll \mu$, and since $\mu(A \cap S_\theta) > 0$ together with $f_\theta > 0$ on $A \cap S_\theta$ implies $P_\theta(A \cap S_\theta) > 0$).

A set $A \in \mathcal{A}$ is a kernel if $A \subset S_\theta$ for some $\theta$; a finite or countable union of kernels is called a chain. Set
$$\alpha = \sup_{\text{chains } C} \mu(C).$$
Then $\alpha = \mu(C)$ for some chain $C = \bigcup_{n=1}^\infty A_n$, $A_n \subset S_{\theta_n}$ (since there exist chains $\{C_n\}$ such that $\mu(C_n) \to \alpha$, and for this sequence $\mu(\bigcup C_n) = \alpha$). It follows from the following Lemma that $\mathcal{P}$ is dominated by $\lambda(\cdot) = \sum_{n=1}^\infty 2^{-n} P_{\theta_n}(\cdot)$. Since it is obvious that
$$\lambda(A) = 0 \;\Rightarrow\; P_{\theta_n}(A) = 0\ \forall n \;\Rightarrow\; P_\theta(A) = 0\ \forall\theta \quad \text{(by the Lemma)},$$
and $P_\theta(A) = 0\ \forall\theta \Rightarrow \lambda(A) = 0$, $\mathcal{P}$ is equivalent to $\lambda(\cdot) = \sum_{n=1}^\infty 2^{-n} P_{\theta_n}(\cdot)$.

Lemma 1.2.6. If $\{\theta_n\}$ is the sequence used in the construction of $C$, then $\{P_\theta, \theta \in \Omega\}$ is dominated by $\{P_{\theta_n}, n = 1, 2, \ldots\}$, i.e.
$$P_{\theta_n}(A) = 0\ \forall n \;\Rightarrow\; P_\theta(A) = 0\ \forall\theta.$$

¹ TSH stands for Testing Statistical Hypotheses, Lehmann & Romano, 3rd ed., Springer, 2005.
Proof. $P_{\theta_n}(A) = 0\ \forall n \Rightarrow \mu(A \cap S_{\theta_n}) = 0\ \forall n$ (by (1.2.1)) $\Rightarrow \mu(A \cap C) = 0$ (since $C \subset \bigcup S_{\theta_n}$) $\Rightarrow P_\theta(A \cap C) = 0\ \forall\theta$ (since $P_\theta \ll \mu$).

If $P_\theta(A) > 0$ for some $\theta$ then, since $P_\theta(A) = P_\theta(A \cap C) + P_\theta(A \cap C^c)$,
$$P_\theta(A \cap C^c) = P_\theta(A \cap C^c \cap S_\theta) > 0,$$
so $A \cap C^c \cap S_\theta$ is a kernel disjoint from $C$, and $C \cup (A \cap C^c \cap S_\theta)$ is a chain with $\mu$-measure $> \alpha$ (as $P_\theta(A \cap C^c \cap S_\theta) > 0 \Rightarrow \mu(A \cap C^c \cap S_\theta) > 0$), contradicting the definition of $\alpha$. Hence $P_\theta(A) = 0$ for all $\theta$.

Theorem 1.2.7. (The Factorization Theorem) Let $\mu$ be a $\sigma$-finite measure which dominates $\mathcal{P} = \{P_\theta : \theta \in \Omega\}$ and let $p_\theta = \frac{dP_\theta}{d\mu}$. Then the statistic $T$ is sufficient for $\mathcal{P}$ if and only if there exist a non-negative $\mathcal{F}$-measurable function $g_\theta : \mathcal{T} \to \mathbb{R}$ and an $\mathcal{A}$-measurable function $h : \mathcal{X} \to \mathbb{R}$ such that
(1.2.2) $p_\theta(x) = g_\theta(T(x))\,h(x)$ a.e. $\mu$.

Proof. By Theorem 1.2.5, $\mathcal{P}$ is equivalent to $\lambda = \sum_i c_i P_{\theta_i}$, where $c_i \ge 0$, $\sum_i c_i = 1$. If $T$ is sufficient for $\mathcal{P}$,
$$p_\theta(x) = \frac{dP_\theta}{d\mu}(x) = \frac{dP_\theta}{d\lambda}(x)\,\frac{d\lambda}{d\mu}(x) = g_\theta(T(x))\,h(x)$$
by Theorem 1.2.4. On the other hand, if equation (1.2.2) holds,
(1.2.3) $d\lambda(x) = \sum_i c_i\,dP_{\theta_i}(x) = \sum_i c_i\, p_{\theta_i}(x)\,d\mu(x) = \sum_i c_i\, g_{\theta_i}(T(x))\,h(x)\,d\mu(x) = K(T(x))\,h(x)\,d\mu(x).$
Thus
$$dP_\theta(x) = p_\theta(x)\,d\mu(x) = g_\theta(T(x))\,h(x)\,d\mu(x) = \frac{g_\theta(T(x))}{K(T(x))}\,d\lambda(x)$$
by the definition of $p_\theta$ and equations (1.2.2) and (1.2.3), where $g_\theta(T(x))/K(T(x)) := 0$ if $K(T(x)) = 0$. Hence $T$ is sufficient for $\mathcal{P}$ by Theorem 1.2.4.

Remark 1.2.8. If $f_\theta(x)$ is the density of $X$ with respect to Lebesgue measure, then $T$ is sufficient for $\mathcal{P}$ iff $f_\theta(x) = g_\theta(T(x))\,h(x)$ where $h$ is independent of $\theta$.

Example 1.2.9. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma > 0$, and write $X = (X_1, X_2, \ldots, X_n)$. A $\sigma$-finite dominating measure on $\mathcal{B}^n$ is Lebesgue measure, with
$$p_{\mu,\sigma^2}(x) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right\} = g_{\mu,\sigma^2}\left(\sum x_i, \sum x_i^2\right).$$
Therefore $T(X) = (\sum X_i, \sum X_i^2)$ is sufficient for $\mathcal{P} = \{P_{\mu,\sigma^2}\}$.

Remark 1.2.10. $T^*(X) = (\bar{X}, S^2)$ is also sufficient for $\mathcal{P} = \{P_{\mu,\sigma^2}\}$, since $g_{\mu,\sigma^2}(\sum x_i, \sum x_i^2) = g^*_{\mu,\sigma^2}(\bar{x}, S^2)$. $T$ and $T^*$ are equivalent in the following sense.

Definition 1.2.11. Two statistics $T$ and $S$ are equivalent if they induce the same $\sigma$-algebra up to $\mathcal{P}$-null sets, i.e. if there exist a $\mathcal{P}$-null set $N$ and functions $f$ and $g$ such that $T(x) = f(S(x))$ and $S(x) = g(T(x))$ for all $x \in N^c$.

Example 1.2.12. Let $X_1, \ldots, X_n$ be iid $U(0, \theta)$, $\theta > 0$, and $X = (X_1, \ldots, X_n)$.
$$p_\theta(x) = \frac{1}{\theta^n} \prod_{i=1}^n I_{[0,\infty)}(x_i)\, I_{(-\infty,\theta]}(x_i) = \underbrace{\frac{1}{\theta^n}\, I_{(-\infty,\theta]}(x_{(n)})}_{g_\theta(x_{(n)})}\; \underbrace{I_{[0,\infty)}(x_{(1)})}_{h(x)},$$
so $T(X) = X_{(n)}$ is sufficient for $\theta$.
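The uniform example can be illustrated numerically: by the factorization, the $U(0,\theta)$ likelihood depends on the data only through $x_{(n)}$, so two samples with the same maximum have identical likelihood functions of $\theta$. A quick sketch (the sample values are made up):

```python
import numpy as np

def unif_lik(x, theta):
    """Likelihood of an iid U(0, theta) sample: theta^{-n} if 0 <= all x_i <= theta, else 0."""
    x = np.asarray(x, dtype=float)
    if x.min() < 0 or x.max() > theta:
        return 0.0
    return theta ** (-x.size)

x = [0.2, 0.9, 0.5]   # maximum 0.9
y = [0.9, 0.1, 0.3]   # different sample, same maximum
thetas = [0.5, 0.9, 1.0, 2.0, 5.0]
print([unif_lik(x, t) == unif_lik(y, t) for t in thetas])
```

Agreement for every $\theta$ is exactly what $p_\theta(x) = g_\theta(x_{(n)})\,h(x)$ predicts, since $h \equiv 1$ on both samples.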
Example 1.2.13. $X_1, \ldots, X_n$ iid $N(0, \sigma^2)$, $\Omega = \{\sigma^2 : \sigma^2 > 0\}$. Define
$$T_1(X) = (X_1, \ldots, X_n), \quad T_2(X) = (X_1^2, \ldots, X_n^2), \quad T_3(X) = (X_1^2 + \cdots + X_m^2,\; X_{m+1}^2 + \cdots + X_n^2), \quad T_4(X) = X_1^2 + \cdots + X_n^2,$$
$$p_\theta(x) = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2\right).$$
Each $T_i(X)$ is sufficient. However $\sigma(T_4) \subset \sigma(T_3) \subset \sigma(T_2) \subset \sigma(T_1)$ (since functions of $T_4$ are functions of $T_3$, functions of $T_3$ are functions of $T_2$, and functions of $T_2$ are functions of $T_1$).

Remark 1.2.14. If $T$ is sufficient for $\theta$ and $T = H(S)$ where $S$ is some statistic, then $S$ is also sufficient, since
$$p_\theta(x) = g_\theta(T(x))\,h(x) = g_\theta(H(S(x)))\,h(x).$$
Since $\sigma(T) = S^{-1}H^{-1}\mathcal{B}_T \subset S^{-1}\mathcal{B}_S$ (where $(\mathcal{X}, \mathcal{A}) \xrightarrow{S} (\mathcal{S}, \mathcal{B}_S) \xrightarrow{H} (\mathcal{T}, \mathcal{B}_T)$), $T$ provides a greater reduction of the data than $S$, strictly greater unless $H$ is one-to-one, in which case $S$ and $T$ are equivalent.

Definition 1.2.15. $T$ is a minimal sufficient statistic if for any sufficient statistic $S$ there exists a measurable function $H$ such that $T = H(S)$ a.s. $\mathcal{P}$.

Theorem 1.2.16. If $\mathcal{P}$ is dominated by a $\sigma$-finite measure $\mu$, then the statistic $U$ is sufficient iff for every fixed $\theta$ and $\theta_0$ the ratio of the densities $p_\theta$ and $p_{\theta_0}$ with respect to $\mu$, defined to be $1$ when both densities are zero, satisfies
$$\frac{p_\theta(x)}{p_{\theta_0}(x)} = f_{\theta,\theta_0}(U(x)) \quad \text{a.s. } \mathcal{P}$$
for some measurable $f_{\theta,\theta_0}$.

Proof. HW problem (TPE Ch. 1, Problem 6.6).

Theorem 1.2.17. Let $\mathcal{P}$ be a finite family with densities $\{p_0, p_1, \ldots, p_k\}$, all having the same support (i.e. $S = \{x \colon p_i(x) > 0\}$ is independent of $i$). Then
$$T(x) = \left(\frac{p_1(x)}{p_0(x)}, \frac{p_2(x)}{p_0(x)}, \ldots, \frac{p_k(x)}{p_0(x)}\right)$$
is minimal sufficient. (Also true for a countable collection of densities with no change in the proof.)
Proof. First, $T$ is sufficient by Theorem 1.2.16, since $p_i(x)/p_j(x)$ is a function of $T(x)$ for all $i$ and $j$ (the common support is needed here). If $U$ is a sufficient statistic, then by Theorem 1.2.16 $p_i(x)/p_0(x)$ is a function of $U$ for each $i$, so $T$ is a function of $U$, hence $T$ is minimal sufficient.

Remark 1.2.18. Theorem 1.2.17 extends to uncountable collections under further conditions.

Theorem 1.2.19. Let $\mathcal{P}$ be a family with common support and suppose $\mathcal{P}_0 \subset \mathcal{P}$. If $T$ is minimal sufficient for $\mathcal{P}_0$ and sufficient for $\mathcal{P}$, then $T$ is minimal sufficient for $\mathcal{P}$.

Proof. $U$ sufficient for $\mathcal{P}$ implies $U$ sufficient for $\mathcal{P}_0$; $T$ minimal sufficient for $\mathcal{P}_0$ implies $T(x) = H(U(x))$ a.s. $\mathcal{P}_0$. But since $\mathcal{P}$ has common support, $T(x) = H(U(x))$ a.s. $\mathcal{P}$.

Remark 1.2.20.
(1) Minimal sufficient statistics for uncountable families $\mathcal{P}$ can often be obtained by combining the above theorems.
(2) Minimal sufficient statistics exist under weak assumptions (but not always). In particular they exist if $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}^n)$ and $\mathcal{P}$ is dominated by a $\sigma$-finite measure.

Example 1.2.21. $\mathcal{P}_0$: $(X_1, \ldots, X_n)$ iid $N(\theta, 1)$, $\theta \in \{\theta_0, \theta_1\}$. $\mathcal{P}$: $(X_1, \ldots, X_n)$ iid $N(\theta, 1)$, $\theta \in \mathbb{R}$.
$$\frac{p_{\theta_1}(x)}{p_{\theta_0}(x)} = \exp\left\{-\frac{1}{2}\left[\sum(x_i - \theta_1)^2 - \sum(x_i - \theta_0)^2\right]\right\} = \exp\left\{-\frac{1}{2}\left[2\sum x_i(\theta_0 - \theta_1) + n\theta_1^2 - n\theta_0^2\right]\right\}.$$
This is a one-to-one function of $\bar{x}$, hence $\bar{X}$ is minimal sufficient for $\mathcal{P}_0$ by Theorem 1.2.17. Since $\bar{X}$ is sufficient for $\mathcal{P}$ (by the factorization theorem), $\bar{X}$ is minimal sufficient for $\mathcal{P}$ by Theorem 1.2.19.

Example 1.2.22. $\mathcal{P}$: $(X_1, \ldots, X_n)$ iid $U(0, \theta)$, $\theta > 0$. Show that $X_{(n)}$ is minimal sufficient. (This is part of Problem 1.6.6, for which you will need to use Problem 1.6.1.)
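The normal computation above can be checked numerically: the likelihood ratio $p_{\theta_1}/p_{\theta_0}$ depends on the sample only through $\sum x_i$, so two different samples with the same mean give the same ratio for every pair $(\theta_1, \theta_0)$. A minimal sketch with made-up data:

```python
import numpy as np

def lik_ratio(x, th1, th0):
    """p_{th1}(x) / p_{th0}(x) for iid N(theta, 1) data."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * (np.sum((x - th1) ** 2) - np.sum((x - th0) ** 2)))

x = np.array([0.0, 1.0, 2.0])   # sample mean 1.0
y = np.array([0.5, 1.5, 1.0])   # different sample, same mean
pairs = [(0.0, 1.0), (2.5, -1.0), (3.0, 0.2)]
print([bool(np.isclose(lik_ratio(x, a, b), lik_ratio(y, a, b))) for a, b in pairs])
```

This is exactly the statement that the ratio is a function of $\bar{x}$ alone.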
Example 1.2.23. (Logistic) $\mathcal{P}$: $(X_1, \ldots, X_n)$ iid $L(\theta, 1)$, $\theta \in \mathbb{R}$. $\mathcal{P}_0$: $(X_1, \ldots, X_n)$ iid $L(\theta, 1)$, $\theta \in \{0, \theta_1, \ldots, \theta_n\}$.
$$p_\theta(x) = \prod_{i=1}^n \frac{\exp[-(x_i - \theta)]}{\{1 + \exp[-(x_i - \theta)]\}^2},$$
so $T = (T_1(X), \ldots, T_n(X))$ is minimal sufficient, where
$$T_i(x) = \frac{p_{\theta_i}(x)}{p_0(x)} = e^{n\theta_i} \prod_{j=1}^n \frac{(1 + e^{-x_j})^2}{(1 + e^{-(x_j - \theta_i)})^2}.$$
We will show that $T(X)$ is equivalent to $(X_{(1)}, \ldots, X_{(n)})$ by showing that $T(x) = T(y) \Leftrightarrow x_{(1)} = y_{(1)}, \ldots, x_{(n)} = y_{(n)}$.

Proof. ($\Leftarrow$) Obvious from the expression for $T_i(x)$. ($\Rightarrow$) Suppose that $T_i(x) = T_i(y)$ for $i = 1, 2, \ldots, n$, i.e.
$$\prod_{j=1}^n \frac{(1 + e^{-x_j})^2}{(1 + e^{-(x_j - \theta_i)})^2} = \prod_{j=1}^n \frac{(1 + e^{-y_j})^2}{(1 + e^{-(y_j - \theta_i)})^2}, \quad i = 1, \ldots, n,$$
i.e.
$$\prod_{j=1}^n \frac{1 + u_j\omega}{1 + u_j} = \prod_{j=1}^n \frac{1 + v_j\omega}{1 + v_j}, \qquad \omega = \omega_1, \ldots, \omega_n,$$
where $u_j = e^{-x_j}$, $v_j = e^{-y_j}$ and $\omega_i = e^{\theta_i}$. Here we have two polynomials in $\omega$ of degree $n$ which are equal for $n + 1$ distinct values $1, \omega_1, \ldots, \omega_n$ of $\omega$, and hence for all $\omega$. Setting $\omega = 0$ gives
$$\prod_{j=1}^n (1 + u_j) = \prod_{j=1}^n (1 + v_j) \;\Rightarrow\; \prod_{j=1}^n (1 + u_j\omega) = \prod_{j=1}^n (1 + v_j\omega) \quad \forall\omega,$$
so the zero sets of both these polynomials are the same and $x$ and $y$ have the same order statistics.

By Theorem 1.2.17, the order statistics are therefore minimal sufficient for $\mathcal{P}_0$. They are also sufficient for $\mathcal{P}$, so by Theorem 1.2.19 the order statistics are minimal sufficient for $\mathcal{P}$. There is not much reduction possible here! This is fairly typical of location families, the normal, uniform and exponential distributions providing happy exceptions.
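One half of the logistic equivalence is easy to see numerically: each $T_i$ is a product over the sample, so it is invariant under permutations of the data, i.e. it depends only on the order statistics. A small sketch with made-up values:

```python
import numpy as np

def T_ratio(x, theta_i):
    """Likelihood ratio p_{theta_i}(x)/p_0(x) for iid logistic L(theta, 1) data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.exp(n * theta_i) * np.prod(
        (1 + np.exp(-x)) ** 2 / (1 + np.exp(-(x - theta_i))) ** 2
    )

x = np.array([-0.3, 0.7, 1.2])
perm = x[[2, 0, 1]]                    # same values in a different order
thetas = [0.5, -1.0, 2.0]
print([bool(np.isclose(T_ratio(x, t), T_ratio(perm, t))) for t in thetas])
```

The converse direction, that $T$ determines the order statistics, is the polynomial argument in the proof above.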
Ancillarity

Definition 1.2.24. A statistic $V$ is said to be ancillary for $\mathcal{P}$ if the distribution $P_\theta^V$ of $V$ does not depend on $\theta$. It is called first-order ancillary if $E_\theta V$ is independent of $\theta$.

Example 1.2.25. In Example 1.2.23, $X_{(2)} - X_{(1)}$ is ancillary, since $Y_1 = X_1 - \theta, \ldots, Y_n = X_n - \theta$ are iid $P_0$ and $X_{(2)} - X_{(1)} = Y_{(2)} - Y_{(1)}$.

Example 1.2.26. $\mathcal{P}$: $(X_1, \ldots, X_n)$ iid $N(\theta, 1)$, $\theta \in \mathbb{R}$. $S^2 = \sum(X_i - \bar{X})^2$ is ancillary, since $S^2 = \sum(Y_i - \bar{Y})^2$ where $Y_i = X_i - \theta$, $i = 1, 2, \ldots$, are iid $N(0, 1)$.

Remark 1.2.27. Ancillary statistics by themselves contain no information about $\theta$; however, minimal sufficient statistics may contain ancillary components. For example, in Example 1.2.23, $T = (X_{(1)}, \ldots, X_{(n)})$ is equivalent to $T^* = (X_{(1)}, X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$, whose last $(n-1)$ components are ancillary. You can't drop them, as $X_{(1)}$ is not even sufficient.

Complete Statistics

A sufficient statistic should bring about the best reduction of the data if it contains as little ancillary material as possible. This suggests requiring that no non-constant function of $T$ be ancillary, or not even first-order ancillary, i.e. that
$$E_\theta f(T) = c \text{ for all } \theta \in \Omega \;\Rightarrow\; f(T) = c \text{ a.s. } \mathcal{P},$$
or equivalently that
$$E_\theta f(T) = 0 \text{ for all } \theta \in \Omega \;\Rightarrow\; f(T) = 0 \text{ a.s. } \mathcal{P}.$$

Definition 1.2.28. A statistic $T$ is complete if
(1.2.4) $E_\theta f(T) = 0$ for all $\theta \in \Omega$ $\Rightarrow$ $f(T) = 0$ a.s. $\mathcal{P}$.
$T$ is said to be boundedly complete if (1.2.4) holds for all bounded measurable functions $f$.

Since complete sufficient statistics are intended to give a good reduction of the data, it is not unreasonable to expect them to be minimal. We shall prove a slightly weaker result.

Theorem 1.2.29. Let $U$ be a complete sufficient statistic. If there exists a minimal sufficient statistic, then $U$ is minimal sufficient.

Proof. Let $T$ be a minimal sufficient statistic and let $\psi$ be a bounded measurable function. We will show that $\psi(U) \in \sigma(T)$, i.e. $E(\psi(U)\mid T) = \psi(U)$ a.s.
Now $E(\psi(U)\mid T) = g(U)$ for some measurable $g$, since $T$ is minimal and $U$ is sufficient (so $T = H(U)$ and $\sigma(T) \subset \sigma(U)$). Let $h(U) = E(\psi(U)\mid T) - \psi(U)$; then $E_\theta h(U) = 0$ for all $\theta$, so $h(U) = 0$ a.s. $\mathcal{P}$ since $U$ is complete. Hence $\psi(U) = E(\psi(U)\mid T) \in \sigma(T)$. Hence $U$-measurable bounded functions are $T$-measurable, i.e. $\sigma(U) \subset \sigma(T)$, i.e. $U$ is minimal sufficient.

Remark 1.2.30.
(1) If $\mathcal{P}$ is dominated by a $\sigma$-finite measure and $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}^n)$, the existence of a minimal sufficient statistic does not need to be assumed.
(2) A minimal sufficient statistic is not necessarily complete. See the next example.

Example 1.2.31. $\mathcal{P} = \{N(\theta, \theta^2), \theta > 0\}$,
$$p_\theta(x) = \frac{1}{\theta\sqrt{2\pi}}\, e^{-\frac{(x-\theta)^2}{2\theta^2}} = \frac{1}{\theta\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\theta} - 1\right)^2}.$$
The single observation $X$ is minimal sufficient but not complete, since
$$E_\theta[I_{(0,\infty)}(X) - \Phi(1)] = P_\theta(X > 0) - \Phi(1) = 0 \quad \forall\theta,$$
however $P_\theta(I_{(0,\infty)}(X) - \Phi(1) = 0) = 0$ for all $\theta$.

Theorem 1.2.32. (Basu's theorem) If $T$ is complete and sufficient for $\mathcal{P}$, then any ancillary statistic is independent of $T$.

Proof. If $S$ is ancillary, then $P_\theta(S \in B) = p_B$, independent of $\theta$. Sufficiency of $T$ implies $P_\theta(S \in B \mid T) = h(T)$, independent of $\theta$. Then $E_\theta(h(T) - p_B) = 0$ for all $\theta$, so $h(T) = p_B$ a.s. $\mathcal{P}$ by completeness, and $S$ is independent of $T$.

1.3. Exponential Families

Definition 1.3.1. A family of probability measures $\{P_\theta : \theta \in \Omega\}$ is said to be an $s$-parameter exponential family if there exists a $\sigma$-finite measure $\mu$ such that
$$p_\theta(x) = \frac{dP_\theta}{d\mu}(x) = \exp\left(\sum_{i=1}^s \eta_i(\theta)\, T_i(x) - B(\theta)\right) h(x),$$
where $\eta_i$, $T_i$ and $B$ are real-valued.

Remark 1.3.2.
(1) $P_\theta$, $\theta \in \Omega$, are equivalent (since $\{x \colon p_\theta(x) > 0\}$ is independent of $\theta$).
(2) The factorization theorem implies that $T = (T_1, \ldots, T_s)$ is sufficient.
(3) If we observe $X_1, \ldots, X_n$, iid with marginal distributions $P_\theta$, then $\sum_{j=1}^n T(X_j)$ is sufficient for $\theta$.

Theorem 1.3.3. If $\{1, \eta_1, \ldots, \eta_s\}$ is LI, then $T = (T_1, \ldots, T_s)$ is minimal sufficient. (Linear independence of $\{1, \eta_1, \ldots, \eta_s\}$ means
$$c_1\eta_1(\theta) + \cdots + c_s\eta_s(\theta) + d = 0\ \forall\theta \;\Rightarrow\; c_1 = \cdots = c_s = d = 0.$$
Equivalently we say that $\{\eta_i\}$ is affinely independent, or AI, since otherwise the points $\{(\eta_1(\theta), \ldots, \eta_s(\theta)), \theta \in \Omega\}$ lie in a proper affine subspace of $\mathbb{R}^s$.)

Proof. Fix $\theta_0 \in \Omega$ and consider
(1.3.1) $\dfrac{dP_\theta}{dP_{\theta_0}}(x) = \dfrac{p_\theta(x)}{p_{\theta_0}(x)} = \exp\{B(\theta_0) - B(\theta)\} \exp\left\{\sum_{i=1}^s (\eta_i(\theta) - \eta_i(\theta_0))\, T_i(x)\right\}.$
If $\{1, \eta_1, \ldots, \eta_s\}$ is LI then so is $\{1, \eta_1 - \eta_1(\theta_0), \ldots, \eta_s - \eta_s(\theta_0)\}$. Set
$$S = \{(\eta_1(\theta) - \eta_1(\theta_0), \ldots, \eta_s(\theta) - \eta_s(\theta_0)),\ \theta \in \Omega\} \subset \mathbb{R}^s.$$
Then $\mathrm{span}(S)$ is a linear subspace of $\mathbb{R}^s$. If $\dim(\mathrm{span}(S)) < s$, then there exists a non-zero vector $v = (v_1, \ldots, v_s)$ s.t.
$$v_1(\eta_1(\theta) - \eta_1(\theta_0)) + \cdots + v_s(\eta_s(\theta) - \eta_s(\theta_0)) = 0 \quad \forall\theta,$$
contradicting the linear independence of $\{1, \eta_i - \eta_i(\theta_0)\}$. Hence
(1.3.2) $\dim(\mathrm{span}(S)) = s$,
i.e. there exist $\theta_1, \ldots, \theta_s \in \Omega$ s.t. $\{(\eta_1(\theta_i) - \eta_1(\theta_0), \ldots, \eta_s(\theta_i) - \eta_s(\theta_0)),\ i = 1, \ldots, s\}$ is LI. From (1.3.1),
$$\sum_{j=1}^s (\eta_j(\theta_i) - \eta_j(\theta_0))\, T_j(x) = \ln\frac{p_{\theta_i}(x)}{p_{\theta_0}(x)} + (B(\theta_i) - B(\theta_0)), \quad i = 1, \ldots, s.$$
Since the matrix $[\eta_j(\theta_i) - \eta_j(\theta_0)]_{i,j=1}^s$ is non-singular, $T_j(x)$ can be expressed uniquely in terms of $\ln\frac{p_{\theta_i}(x)}{p_{\theta_0}(x)}$, $i = 1, \ldots, s$. But $\left(\frac{p_{\theta_i}(x)}{p_{\theta_0}(x)},\ i = 1, \ldots, s\right)$ is minimal sufficient for $\mathcal{P}_0 = \{P_{\theta_j},\ j = 0, 1, \ldots, s\}$ by Theorem 1.2.17. Hence $T$ is minimal sufficient by Theorem 1.2.19.
Example 1.3.4. $p_\theta(x) = \sqrt{\frac{\theta}{2\pi}}\exp\{-\frac{1}{2}\theta x^2 + \theta x - \frac{\theta}{2}\}$, $\theta > 0$, with $\eta_1(\theta) = -\frac{1}{2}\theta$, $\eta_2(\theta) = \theta$. Here $T(x) = (x^2, x)$ is sufficient but not minimal, since $\{1, \eta_1, \eta_2\}$ is not LI ($\eta_1 = -\eta_2/2$); rewriting the model as $p_\theta(x) = \sqrt{\frac{\theta}{2\pi}}\exp\{-\frac{1}{2}\theta(x - 1)^2\}$, we see that $T^*(x) = (x - 1)^2$ is minimal sufficient.

Remark 1.3.5. The exponential family can always be rewritten in such a way that the functions $\{T_i\}$ and $\{\eta_i\}$ are AI. If there exist constants $c_1, \ldots, c_s, d$, not all zero, such that $c_1T_1(x) + \cdots + c_sT_s(x) = d$ a.s. $\mathcal{P}$, then one of the $T_i$'s can be expressed in terms of the others (or is constant). After reducing the number of functions $T_i$ as far as possible, the same can be done with their coefficients until the new functions $\{T_i\}$ and $\{\eta_i\}$ are AI.

Definition 1.3.6. (Order of the exponential family) If the functions $\{T_i, i = 1, \ldots, s\}$ on $\mathcal{X}$ and $\{\eta_i, i = 1, \ldots, s\}$ on $\Omega$ are both AI, then $s$ is the order of the exponential family
$$p_\theta(x) = \frac{dP_\theta}{d\mu}(x) = \exp\left(\sum_{i=1}^s \eta_i(\theta)\, T_i(x) - B(\theta)\right) h(x).$$

Proposition 1.3.7. The order is well-defined.

Proof. We shall show that $s + 1 = \dim(V)$, where $V$ is the set of functions on $\mathcal{X}$ defined by
$$V = \mathrm{span}\left\{1,\ \ln\frac{dP_\theta}{dP_{\theta_0}}(\cdot),\ \theta \in \Omega\right\}$$
(independent of the dominating measure $\mu$ and the choice of $\{\eta_i\}$, $\{T_i\}$). Since
$$\ln\frac{dP_\theta}{dP_{\theta_0}}(x) = \sum_{i=1}^s (\eta_i(\theta) - \eta_i(\theta_0))\, T_i(x) + B(\theta_0) - B(\theta),$$
we have $V \subset \mathrm{span}\{1, T_i(\cdot), i = 1, \ldots, s\}$, so $\dim(V) \le s + 1$. On the other hand, since $\{1, \eta_i, i = 1, \ldots, s\}$ is LI, each $T_j(x)$ can be expressed as a linear combination of $1$ and $\ln\frac{dP_{\theta_i}}{dP_{\theta_0}}(x)$, $i = 1, \ldots, s$, as in the proof of the previous theorem, so $\mathrm{span}\{1, T_i(\cdot), i = 1, \ldots, s\} \subset V$ and $s + 1 \le \dim(V)$.
Definition 1.3.8. (Canonical Form) For any $s$-parameter exponential family (not necessarily of order $s$) we can view the vector $\eta(\theta) = (\eta_1(\theta), \ldots, \eta_s(\theta))$ as the parameter rather than $\theta$. Then the density with respect to $\mu$ can be rewritten as
$$p(x, \eta) = \exp\left[\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\right] h(x), \qquad \eta \in \eta(\Omega).$$
Since $p(\cdot, \eta)$ is a probability density with respect to $\mu$,
(1.3.3) $e^{A(\eta)} = \int e^{\sum_{i=1}^s \eta_i T_i(x)}\, h(x)\,d\mu(x).$

Definition 1.3.9. (The Natural Parameter Set) This is a possibly larger set than $\{\eta(\theta), \theta \in \Omega\}$. It is the set of all $s$-vectors for which, by suitable choice of $A(\eta)$, $p(\cdot, \eta)$ can be a probability density, i.e.
$$\mathcal{N} = \left\{\eta = (\eta_1, \ldots, \eta_s) \in \mathbb{R}^s \colon \int e^{\sum \eta_i T_i(x)}\, h(x)\,d\mu(x) < \infty\right\}.$$

Theorem 1.3.10. $\mathcal{N}$ is convex.

Proof. Suppose $\alpha = (\alpha_1, \ldots, \alpha_s)$ and $\beta = (\beta_1, \ldots, \beta_s) \in \mathcal{N}$ and $0 < p < 1$. Then by Hölder's inequality,
$$\int e^{p\sum\alpha_iT_i(x) + (1-p)\sum\beta_iT_i(x)}\, h(x)\,d\mu(x) \le \left[\int e^{\sum\alpha_iT_i(x)}\, h(x)\,d\mu(x)\right]^p \left[\int e^{\sum\beta_iT_i(x)}\, h(x)\,d\mu(x)\right]^{1-p} < \infty.$$

Theorem 1.3.11. $T = (T_1, \ldots, T_s)$ has density $p_\eta(t) = \exp(\eta \cdot t - A(\eta))$ relative to $\nu = \tilde{\mu}^T$, where $d\tilde{\mu}(x) = h(x)\,d\mu(x)$.

Proof. If $f \colon \mathcal{T} \to \mathbb{R}$ is a bounded measurable function,
$$Ef(T) = \int f(T(x))\, e^{\eta \cdot T(x)}\, e^{-A(\eta)}\,d\tilde{\mu}(x) = \int f(t)\, e^{\eta \cdot t}\, e^{-A(\eta)}\,d\tilde{\mu}^T(t).$$

Definition 1.3.12. The family of densities $p_\eta(t) = \exp(\eta \cdot t - A(\eta))$, $\eta \in \eta(\Omega)$, is called an $s$-dimensional or $s$-parameter standard exponential family. (Defined on $\mathbb{R}^s$, not $\mathcal{X}$.)
Theorem 1.3.13. Let $\{p_\eta(x)\}$ be the $s$-parameter exponential family
$$p_\eta(x) = \exp\left(\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\right) h(x), \qquad \eta \in \eta(\Omega),$$
and suppose
(1.3.4) $\int \phi(x)\, e^{\sum_1^s \eta_j T_j(x)}\,d\mu(x)$
exists and is finite for some $\phi$ and all $\eta_j = a_j + ib_j$ such that $a \in \mathcal{N}$ (the natural parameter set). Then
(i) $\int \phi(x)\, e^{\sum_1^s \eta_j T_j(x)}\,d\mu(x)$ is an analytic function of each $\eta_i$ on $\{\eta \colon \mathrm{Re}(\eta) \in \mathrm{int}(\mathcal{N})\}$, and
(ii) the derivatives of all orders with respect to the $\eta_i$'s of $\int \phi(x)\, e^{\sum_1^s \eta_j T_j(x)}\,d\mu(x)$ can be computed by differentiating under the integral sign.

Proof. Let $a^0 = (a_1^0, \ldots, a_s^0)$ be in $\mathrm{int}(\mathcal{N})$ and let $\eta^0 = a^0 + ib^0$. Write
$$\phi(x)\, e^{\sum_2^s \eta_j T_j(x)} = h_1(x) - h_2(x) + i(h_3(x) - h_4(x)),$$
where $h_1$ and $h_2$ are the positive and negative parts of the real part and $h_3$ and $h_4$ are the positive and negative parts of the imaginary part. Then $\int \phi(x)\, e^{\sum_1^s \eta_j T_j(x)}\,d\mu(x)$ can be expressed as
$$\int e^{\eta_1 T_1(x)}\,d\mu_1(x) - \int e^{\eta_1 T_1(x)}\,d\mu_2(x) + i\int e^{\eta_1 T_1(x)}\,d\mu_3(x) - i\int e^{\eta_1 T_1(x)}\,d\mu_4(x),$$
where $d\mu_i(x) = h_i(x)\,d\mu(x)$, $i = 1, \ldots, 4$. Hence it suffices to prove (i) and (ii) for $\psi(\eta_1) = \int e^{\eta_1 T_1(x)}\,d\mu(x)$. Since $a^0 \in \mathrm{int}(\mathcal{N})$, there exists $\delta > 0$ s.t. $\psi(\eta_1)$ exists and is finite for all $\eta_1$ with $|a_1 - a_1^0| < \delta$. Now consider the difference quotient
$$(*) \qquad \frac{\psi(\eta_1) - \psi(\eta_1^0)}{\eta_1 - \eta_1^0} = \int e^{\eta_1^0 T_1(x)}\, \frac{e^{(\eta_1 - \eta_1^0)T_1(x)} - 1}{\eta_1 - \eta_1^0}\,\mu(dx), \qquad |\eta_1 - \eta_1^0| < \delta/2.$$
Observe that
$$\left|\frac{e^{zt} - 1}{z}\right| = \left|\sum_{j=1}^\infty \frac{z^{j-1}t^j}{j!}\right| \le \frac{1}{|z|}\sum_{j=1}^\infty \frac{|zt|^j}{j!} = \frac{e^{|zt|} - 1}{|z|} \le |t|\, e^{|z||t|}.$$
The integrand in (*) is therefore bounded in absolute value by $|T_1(x)|\, e^{a_1^0 T_1(x) + \frac{\delta}{2}|T_1(x)|}$, where $a_1^0 = \mathrm{Re}(\eta_1^0)$, and $\int |T_1(x)|\, e^{a_1^0 T_1(x) + \frac{\delta}{2}|T_1(x)|}\,\mu(dx) < \infty$ since
$$|T_1|\, e^{a_1^0 T_1 + \frac{\delta}{2}|T_1|} = \begin{cases} \underbrace{|T_1|\, e^{-\frac{\delta}{4}T_1}}_{\text{bounded}}\;\underbrace{e^{(a_1^0 + \frac{3\delta}{4})T_1}}_{\text{integrable}} & \text{if } T_1 > 0,\\[2mm] \underbrace{|T_1|\, e^{\frac{\delta}{4}T_1}}_{\text{bounded}}\;\underbrace{e^{(a_1^0 - \frac{3\delta}{4})T_1}}_{\text{integrable}} & \text{if } T_1 < 0, \end{cases}$$
and this bound is independent of $\eta_1$. Letting $\eta_1 \to \eta_1^0$ in (*) and using the dominated convergence theorem therefore gives
(1.3.5) $\psi'(\eta_1^0) = \int T_1(x)\, e^{\eta_1^0 T_1(x)}\,\mu(dx),$
where the integral exists and is finite for every $\eta_1^0$ which is the first component of some $\eta^0$ for which $\mathrm{Re}(\eta^0) \in \mathrm{int}(\mathcal{N})$. Applying the same argument to (1.3.5) which we applied to (1.3.4) gives the existence of all derivatives, i.e. (i) and (ii).

Theorem 1.3.14. For an exponential family of order $s$ in canonical form and $\eta \in \mathrm{int}(\mathcal{N})$, where $\mathcal{N}$ is the natural parameter space,
(i) $E_\eta(T) = \nabla A(\eta) = \left(\dfrac{\partial A}{\partial\eta_1}, \ldots, \dfrac{\partial A}{\partial\eta_s}\right)$, and
(ii) $\mathrm{Cov}_\eta(T) = \dfrac{\partial^2 A}{\partial\eta\,\partial\eta^T} = \left[\dfrac{\partial^2 A}{\partial\eta_i\partial\eta_j}\right]_{i,j=1}^s$.

Proof. From Theorem 1.3.11,
$$e^{A(\eta)} = \int e^{\eta \cdot t}\,\nu(dt) = \int e^{\eta \cdot T(x)}\, h(x)\,\mu(dx),$$
so
(i) $\dfrac{\partial A}{\partial\eta_i}\, e^{A(\eta)} = \int T_i(x)\, e^{\eta \cdot T(x)}\, h(x)\,\mu(dx)$, whence $E_\eta T_i = \dfrac{\partial A}{\partial\eta_i}$;
(ii) $\left(\dfrac{\partial^2 A}{\partial\eta_i\partial\eta_j} + \dfrac{\partial A}{\partial\eta_i}\dfrac{\partial A}{\partial\eta_j}\right) e^{A(\eta)} = \int T_i(x)T_j(x)\, e^{\eta \cdot T(x)}\, h(x)\,\mu(dx)$, i.e.
$$\frac{\partial^2 A}{\partial\eta_i\partial\eta_j} = E_\eta(T_iT_j) - E_\eta(T_i)E_\eta(T_j) = \mathrm{Cov}_\eta(T_i, T_j).$$

Higher-order moments of $T_1, \ldots, T_s$ are frequently required, e.g.
$$\alpha_{r_1\cdots r_s} = E(T_1^{r_1}\cdots T_s^{r_s}), \qquad \mu_{r_1\cdots r_s} = E[(T_1 - E(T_1))^{r_1}\cdots(T_s - E(T_s))^{r_s}],$$
etc. These can often be obtained readily from the MGF:
$$M_T(u_1, \ldots, u_s) := E\left(e^{u_1T_1 + \cdots + u_sT_s}\right).$$
If $M_T$ exists in some neighborhood of $0$ ($\sum u_i^2 < \delta$), then all the moments $\alpha_{r_1,\ldots,r_s}$ exist and are the coefficients in the power series expansion
$$M_T(u_1, \ldots, u_s) = \sum_{r_1,\ldots,r_s} \alpha_{r_1,\ldots,r_s}\, \frac{u_1^{r_1}\cdots u_s^{r_s}}{r_1!\cdots r_s!}.$$
The cumulant generating function, CGF, is sometimes more convenient for calculations, especially in connection with sums of independent random vectors. The CGF is defined as
$$K_T(u_1, \ldots, u_s) := \log M_T(u_1, \ldots, u_s).$$
If $M_T$ exists in a neighborhood of $0$, then so does $K_T$ and
$$K_T(u_1, \ldots, u_s) = \sum_{r_1,\ldots,r_s=0}^\infty K_{r_1\cdots r_s}\, \frac{u_1^{r_1}\cdots u_s^{r_s}}{r_1!\cdots r_s!},$$
where the coefficients $K_{r_1\cdots r_s}$ are called the cumulants of $T$. The moments and cumulants can be found from each other by formal comparison of the two series.

Theorem 1.3.15. If $X$ has the density
$$p_\eta(x) = \exp\left[\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\right] h(x)$$
w.r.t. some $\sigma$-finite measure $\mu$, then for any $\eta \in \mathrm{int}(\mathcal{N})$ the MGF and CGF of $T$ exist in a neighborhood of $0$ and
$$M_T(u) = e^{A(\eta+u) - A(\eta)}, \qquad K_T(u) = A(\eta + u) - A(\eta).$$

Proof. HW problem.

Summary on Exponential Families. The family of probability measures $\{P_\theta\}$ with densities relative to some $\sigma$-finite measure $\mu$,
(1.3.6) $p_\theta(x) = \dfrac{dP_\theta}{d\mu}(x) = \exp\left\{\sum_{i=1}^s \eta_i(\theta)T_i(x) - B(\theta)\right\}h(x), \quad \theta \in \Omega,$
is an $s$-parameter exponential family. By redefining the functions $T_i(\cdot)$ and $\eta_i(\cdot)$ if necessary, we can always arrange for both sets of functions to be affinely independent. The number of summands in the exponent is then the order of the exponential family.
If $\{1, \eta_1, \ldots, \eta_s\}$ and $\{1, T_1, \ldots, T_s\}$ are both L.I., then the family is said to be minimal, and
$$s = \dim\left(\mathrm{span}\left\{1,\ \log\frac{dP_\theta}{dP_{\theta_0}}(\cdot),\ \theta \in \Omega\right\}\right) - 1 = \text{order of the exponential family}.$$

Remark 1.3.16. Since (1.3.6) is by definition a probability density w.r.t. $\mu$ for each $\theta \in \Omega$, we have
$$\int \exp\left\{\sum \eta_i(\theta)T_i(x) - B(\theta)\right\}h(x)\,\mu(dx) = 1 \;\Rightarrow\; \exp\{B(\theta)\} = \int \exp\left\{\sum \eta_i(\theta)T_i(x)\right\}h(x)\,\mu(dx),$$
which shows that the dependence of $B$ on $\theta$ is through $\eta(\theta) = (\eta_1(\theta), \ldots, \eta_s(\theta))$ only, i.e. $B(\theta) = A(\eta(\theta))$.

Remark 1.3.17. The previous note implies that each member of the family (1.3.6) is a member of the family
(1.3.7) $\pi_\xi(x) = \exp\left\{\sum_{i=1}^s \xi_i T_i(x) - A(\xi)\right\}h(x), \quad \xi = (\xi_1, \ldots, \xi_s) \in \eta(\Omega)$
(in fact $p_\theta(x) = \pi_{\eta(\theta)}(x)$). The family of densities $\{\pi_\xi, \xi \in \eta(\Omega)\}$ defined by (1.3.7) is the canonical family associated with (1.3.6). It is the same family parameterized by the natural parameter $\xi$, the vector of coefficients of $T_i(x)$, $i = 1, \ldots, s$.

Remark 1.3.18. Instead of restricting $\xi$ to the set $\eta(\Omega)$, it is natural to extend the family (1.3.7) to allow all $\xi \in \mathbb{R}^s$ for which we can choose a value of $A(\xi)$ to make (1.3.7) a probability density, i.e. for which
(1.3.8) $\int \exp\left\{\sum \xi_i T_i(x)\right\}h(x)\,\mu(dx) < \infty.$
$\mathcal{N} = \{\xi \in \mathbb{R}^s \colon (1.3.8) \text{ holds}\}$ is the natural parameter space of the family (1.3.7).

Remark 1.3.19. $\mathcal{N} \supset \eta(\Omega)$, since (1.3.7) is by definition a family of probability densities.

Definition 1.3.20. (Full rank family) As with the original parameterization, we can always redefine $\xi$ to ensure that $\{T_1, \ldots, T_s\}$ is A.I. If $\eta(\Omega)$ contains an $s$-dimensional rectangle and $\{T_1(\cdot), \ldots, T_s(\cdot)\}$ is A.I., then $T$ is minimal sufficient and we say the family (1.3.7) is of full rank. (A full rank family is clearly minimal.)
Remark 1.3.21. Since $\mathcal{N} \supset \eta(\Omega)$, full rank implies $\mathrm{int}(\mathcal{N}) \ne \emptyset$, and this is important in view of the consequence of Theorem 1.3.13 that
$$e^{A(\xi)} = \int \exp\left(\sum_{i=1}^s \xi_i T_i(x)\right) h(x)\,\mu(dx)$$
is analytic in each $\xi_i$ on the set of $s$-dimensional complex vectors $\xi$ with $\mathrm{Re}(\xi) \in \mathrm{int}(\mathcal{N})$. (So derivatives of $e^{A(\xi)}$ w.r.t. $\xi_i$, $i = 1, \ldots, s$, of all orders can be obtained by differentiation under the integral, yielding explicit expressions for the moments of $T$ for all values of the canonical parameter vector $\xi \in \mathrm{int}(\mathcal{N})$.)

Example 1.3.22. (Multinomial) $X = (X_0, \ldots, X_s) \sim M(\theta_0, \ldots, \theta_s; n)$, where $X_i$ is the number of outcomes of type $i$ in $n$ independent trials and $\theta_i$, $i = 0, \ldots, s$, is the probability of an outcome of type $i$ on any one trial;
$$\Omega = \{\theta \colon \theta_0 \ge 0, \ldots, \theta_s \ge 0,\ \theta_0 + \cdots + \theta_s = 1\}.$$
(1) Probability density with respect to counting measure on $\mathbb{Z}_+^{s+1}$:
$$p_\theta(x) = \frac{n!}{x_0!\cdots x_s!}\,\theta_0^{x_0}\cdots\theta_s^{x_s}\prod_{i=0}^s I_{[0,n]}(x_i)\, I_{\{n\}}\left(\sum x_i\right) = \exp\left\{\sum_{i=0}^s x_i\log\theta_i\right\} h(x), \quad \theta \in \Omega.$$
This is an $(s+1)$-parameter exponential family with $T_i(x) = x_i$, $\eta_i(\theta) = \log\theta_i$.
(2) $\{T_0, \ldots, T_s\}$ is not A.I., since $T_0 + \cdots + T_s = n$. Setting $T_0(x) = x_0 = n - x_1 - \cdots - x_s$ gives
$$p_\theta(x) = h(x)\exp\left\{n\log\theta_0 + \sum_{i=1}^s x_i\log\frac{\theta_i}{\theta_0}\right\}.$$
Redefining $\eta(\theta) = \left(\log\frac{\theta_1}{\theta_0}, \ldots, \log\frac{\theta_s}{\theta_0}\right)$, we now have an $s$-parameter representation in which $\{T_1, \ldots, T_s\}$ is A.I., since the vectors $(x_1, \ldots, x_s)$, $x \in \mathcal{X}$, are subject only to the constraints $x_i \ge 0$ and $\sum_{i=1}^s x_i \le n$.
(3) Furthermore the new parameter vectors $\eta(\theta) = \left(\log\frac{\theta_1}{\theta_0}, \ldots, \log\frac{\theta_s}{\theta_0}\right)$, $\theta \in \Omega$, are not confined to any proper affine subspace of $\mathbb{R}^s$, since for any $x \in \mathbb{R}^s$ there exist $\theta_0, \ldots, \theta_s$ such that $\eta(\theta) = x$, and so $\eta(\Omega) = \mathbb{R}^s$. Hence $T(x) = (x_1, \ldots, x_s)$ is minimal sufficient for $\mathcal{P}$ and the order of the family is $s$.
(4) The canonical representation of the family in (2) is
$$\pi_\xi(x) = \exp\left\{\sum_{i=1}^s \xi_i x_i - A(\xi)\right\} h(x), \quad \xi \in \eta(\Omega) = \left\{\left(\log\frac{\theta_1}{\theta_0}, \ldots, \log\frac{\theta_s}{\theta_0}\right) \colon \theta \in \Omega\right\}.$$
We know from Remark 1.3.16 that $B(\theta) = A(\eta(\theta))$ for some function $A(\cdot)$. Although it is not necessary, we can verify this directly in this example, since from the representation in (2) we have $B(\theta) = -n\log\theta_0$ and
$$\frac{1}{\theta_0} = \frac{\theta_0 + \theta_1 + \cdots + \theta_s}{\theta_0} = 1 + \frac{\theta_1}{\theta_0} + \cdots + \frac{\theta_s}{\theta_0} = 1 + e^{\eta_1(\theta)} + \cdots + e^{\eta_s(\theta)},$$
so
$$B(\theta) = n\log\left(1 + e^{\eta_1(\theta)} + \cdots + e^{\eta_s(\theta)}\right) \quad \text{and} \quad A(\xi) = n\log\left(1 + e^{\xi_1} + \cdots + e^{\xi_s}\right).$$
$A(\xi)$ is of course also determined by $e^{A(\xi)} = \int \exp\{\sum_{i=1}^s \xi_i x_i\}h(x)\,d\mu(x)$.
(5) The natural parameter space in this case is $\mathcal{N} = \mathbb{R}^s$, since we know that $\mathcal{N} \supset \eta(\Omega)$ and $\eta(\Omega) = \mathbb{R}^s$ by (3) above. Clearly $\mathcal{N}$ contains an $s$-dimensional rectangle and $\{T_1, \ldots, T_s\}$ is A.I., hence $\{\pi_\xi(x), \xi \in \mathcal{N}\}$ is of full rank.
(6) Moments of $T(X) = (X_1, \ldots, X_s)$: by Theorem 1.3.14, for $\xi \in \mathbb{R}^s$,
$$E_\xi T_i = \frac{\partial A}{\partial\xi_i} = \frac{ne^{\xi_i}}{1 + e^{\xi_1} + \cdots + e^{\xi_s}} = \frac{n\theta_i/\theta_0}{1 + \frac{\theta_1}{\theta_0} + \cdots + \frac{\theta_s}{\theta_0}} = n\theta_i,$$
and
$$\mathrm{Cov}(T_i, T_j) = \frac{\partial^2 A}{\partial\xi_i\partial\xi_j} = \begin{cases} -\dfrac{ne^{\xi_i}e^{\xi_j}}{(1 + e^{\xi_1} + \cdots + e^{\xi_s})^2} = -n\theta_i\theta_j, & i \ne j,\\[2mm] \dfrac{ne^{\xi_i}}{1 + e^{\xi_1} + \cdots + e^{\xi_s}} - \dfrac{ne^{2\xi_i}}{(1 + e^{\xi_1} + \cdots + e^{\xi_s})^2} = n\theta_i(1 - \theta_i), & i = j. \end{cases}$$
(The moments exist for all $\xi \in \mathrm{int}(\mathcal{N}) = \mathbb{R}^s$.)
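The formula $E_\xi T_i = \partial A/\partial\xi_i = n\theta_i$ in item (6) can be checked numerically by differentiating $A(\xi) = n\log(1 + \sum_j e^{\xi_j})$ with central finite differences. A sketch with an arbitrary (made-up) canonical parameter:

```python
import numpy as np

n, s = 10, 3
xi = np.array([0.2, -0.5, 1.0])   # canonical parameter, chosen arbitrarily

def A(xi):
    """Log-normalizer of the multinomial in canonical form."""
    return n * np.log(1.0 + np.sum(np.exp(xi)))

# theta_i implied by xi:  theta_i = e^{xi_i} / (1 + sum_j e^{xi_j})
theta = np.exp(xi) / (1.0 + np.sum(np.exp(xi)))

# Central-difference gradient of A, which should equal n * theta (Theorem 1.3.14(i))
eps = 1e-6
grad = np.array([
    (A(xi + eps * np.eye(s)[i]) - A(xi - eps * np.eye(s)[i])) / (2 * eps)
    for i in range(s)
])
print(np.allclose(grad, n * theta, atol=1e-5))
```

The same finite-difference idea applied twice recovers the covariance matrix in (ii).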
Theorem (Sufficient condition for completeness of T). If

π_ξ(x) = exp(Σ_{i=1}^s ξ_i T_i(x) − A(ξ)) h(x), ξ ∈ η(Ω),

is a minimal canonical representation of the exponential family P = {p_θ : θ ∈ Ω} and η(Ω) contains an open subset of R^s, then T = (T_1,...,T_s) is complete for P.

Proof. Suppose E_ξ(f(T)) = 0 for all ξ ∈ η(Ω). Then

(1.3.9) E_ξ f⁺(T) = E_ξ f⁻(T) for all ξ ∈ η(Ω).

Choose ξ_0 ∈ int(η(Ω)) and r > 0 such that N(ξ_0, r) := {ξ : |ξ − ξ_0| < r} ⊆ η(Ω). Now define the probability measures

λ⁺(A) = ∫_A f⁺(t) e^{ξ_0·t} ν(dt) / ∫ f⁺(t) e^{ξ_0·t} ν(dt),    ν = μ̃ T⁻¹, dμ̃(x) = h(x) μ(dx),

λ⁻(A) = ∫_A f⁻(t) e^{ξ_0·t} ν(dt) / ∫ f⁻(t) e^{ξ_0·t} ν(dt),

where we have assumed that ν({t : f(t) ≠ 0}) > 0, since otherwise f = 0 a.s. P^T and we are done. Observe now that

(1.3.10) ∫ e^{δ·t} λ⁺(dt) = ∫ e^{δ·t} λ⁻(dt) for all δ ∈ R^s with |δ| < r,

since by (1.3.9)

L.S. = ∫ f⁺(t) e^{(ξ_0+δ)·t} ν(dt) / ∫ f⁺(t) e^{ξ_0·t} ν(dt) = ∫ f⁻(t) e^{(ξ_0+δ)·t} ν(dt) / ∫ f⁻(t) e^{ξ_0·t} ν(dt) = R.S.

Now consider each side of (1.3.10) as a function of the complex argument δ = δ_0 + iθ, θ ∈ R^s. Then L(δ) = R(δ) for all δ = δ_0 + iθ with |δ_0| < r, since (by Theorem 1.3.3(i)) both sides are analytic in each component of δ on the set where Re(ξ_0 + δ) ∈ int(N) and they are equal when δ is real. In particular,

L(iθ) = ∫ e^{iθ·t} λ⁺(dt) = R(iθ) = ∫ e^{iθ·t} λ⁻(dt)
for all θ ∈ R^s. Hence λ⁺ and λ⁻ have the same characteristic function, so λ⁺ = λ⁻ and f⁺ = f⁻ a.s. ν, contradicting ν(f ≠ 0) > 0. So f = 0 a.s. ν. □

Example. X_1,...,X_n iid N(σ, σ²):

p_σ(x) = (σ√(2π))^{−n} exp{−(1/(2σ²)) Σ x_i² + (1/σ) Σ x_i − n/2},

with η_1(σ) = −1/(2σ²), η_2(σ) = 1/σ, T_1(x) = Σ x_i², T_2(x) = Σ x_i.

η(Ω) does not contain a 2-dimensional rectangle in R². T(x) = (Σ x_i², Σ x_i) is not complete, since

E_σ(Σ x_i² − (2/(n+1))(Σ x_i)²) = n(2σ²) − (2/(n+1))(nσ² + n²σ²) = 0,

but there exists no P-null set N such that Σ x_i² − (2/(n+1))(Σ x_i)² = 0 on N^c.

1.4. Convex Loss Function

Lemma 1.4.1. Let φ be a convex function on (−∞, ∞) which is bounded below and suppose that φ is not monotone. Then φ takes on its minimum value c, φ⁻¹({c}) is a closed interval, and it is a singleton when φ is strictly convex.

Proof. Since φ is convex and not monotone, lim_{x→±∞} φ(x) = ∞. Since φ is continuous, φ attains its minimum value c. φ⁻¹({c}) is closed by continuity and an interval by convexity. The interval must have zero length if φ is strictly convex. □

Theorem 1.4.2. Let ρ be a convex function defined on (−∞, ∞) and X a random variable such that φ(a) = E(ρ(X − a)) is finite for some a. If ρ is not monotone, φ takes on its minimum value, the set on which the minimum is attained is a closed interval, and it is a singleton when ρ is strictly convex.

Proof. By the lemma, we only need to show that φ is convex and not monotone. Because lim_{t→±∞} ρ(t) = ∞ and lim_{a→±∞}(x − a) = ∓∞, we have lim_{a→±∞} φ(a) = ∞, so that φ is not monotone. The convexity comes from

φ(pa + (1−p)b) = E ρ(p(X − a) + (1−p)(X − b)) ≤ E(p ρ(X − a) + (1−p) ρ(X − b)) = p φ(a) + (1−p) φ(b). □
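Theorem 1.4.2 is easy to see numerically for ρ(t) = |t|, which is convex but not strictly convex, so the minimizing set of φ(a) = E|X − a| is a closed interval (the median interval). The four support points below are an arbitrary illustrative choice:

```python
# Numerical illustration of Theorem 1.4.2 with rho(t) = |t| (convex, not
# strictly convex): phi(a) = E|X - a| attains its minimum on a closed
# interval, the median interval. The support points are an arbitrary choice.
xs = [0.0, 1.0, 2.0, 3.0]   # X uniform on these four points

def phi(a):
    return sum(abs(x - a) for x in xs) / len(xs)

# Every a in [1, 2] minimizes phi; points outside the interval do worse.
print(phi(1.0), phi(1.5), phi(2.0))   # equal on the median interval
print(phi(0.5), phi(2.5))             # strictly larger outside it
```

With a strictly convex ρ such as ρ(t) = t², the minimizer collapses to the single point a = E X, in line with the theorem.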
CHAPTER 2

Unbiasedness

2.1. UMVU estimators

Notation. P = {P_θ, θ ∈ Ω} is a family of probability measures on A (distributions of X). T : X → R is an A/B measurable function and T (or T(X)) is called a statistic. g : Ω → R is a function on Ω whose value at θ is to be estimated.

(X, A, P_θ) →T (R, B, P_θ^T)

Definition 2.1.1. A statistic T (or T(X)) is called an unbiased estimator of g(θ) if E_θ(T(X)) = g(θ) for all θ ∈ Ω.

Objectives of point estimation. In order to specify what we mean by a good estimator of g(θ), we need to specify what we mean when we say that T(X) is close to g(θ). A fairly general way of defining this is to specify a loss function:

L(θ, d) = cost of concluding that g(θ) = d, when the parameter value is θ,

with L(θ, d) ≥ 0 and L(θ, g(θ)) = 0. Since T(X) is a random variable, we measure the performance of T(X) for estimating g(θ) in terms of its expected (or long-term average) loss, known as the risk function:

R(θ, T) = E_θ L(θ, T(X)).

The choice of a loss function will depend on the problem and the purpose of the estimation. For many estimation problems, the conclusion is not particularly sensitive to the choice of loss function within a reasonable range of alternatives. Because of this, and especially because of its mathematical convenience, we often choose (and will do so in this chapter) the squared-error loss function

L(θ, d) = (g(θ) − d)²

with corresponding risk function

(2.1.1) R(θ, T) = E_θ(T(X) − g(θ))².
Ideally we would like to choose T to minimize (2.1.1) uniformly in θ. Unfortunately this is impossible, since the estimator T defined by

(2.1.2) T(x) = g(θ_0) for all x ∈ X

(where θ_0 is some fixed parameter value in Ω) has the risk function

R(θ, T) = 0 if θ = θ_0, and R(θ, T) = (g(θ) − g(θ_0))² if θ ≠ θ_0.

An estimator which simultaneously minimized R(θ, T) for all θ ∈ Ω would necessarily have R(θ, T) = 0 for all θ ∈ Ω, and this is impossible except in trivial cases.

Why consider the class of unbiased estimators? There is nothing intrinsically good about unbiased estimators. The only criterion for goodness is that R(θ, T) should be small. The hope is that by restricting attention to a class of estimators which excludes (2.1.2), we may be able to minimize R(θ, T) uniformly in θ and that the resulting estimator will give small values of R(θ, T). This programme is frequently successful if we attempt to minimize R(θ, T) with T restricted to the class of unbiased estimators of g(θ).

Definition. g(θ) is U-estimable if there exists an unbiased estimator of g(θ).

Example. X_1,...,X_n iid Bernoulli(p), p ∈ (0, 1). g(p) = p is U-estimable, since E X̄ = p for all p ∈ (0, 1), while h(p) = 1/p is not U-estimable, since if

Σ_x T(x) p^{Σ x_i} (1 − p)^{n − Σ x_i} = 1/p for all p ∈ (0, 1),

then as p → 0 the right-hand side tends to ∞ while the left-hand side tends to T(0). So T(0) = ∞, but this is not possible, since then E_p T(X) = ∞ for all p ∈ (0, 1).

Remark. (1/n) Σ_{i=1}^n X_i → p a.s. and n/Σ_{i=1}^n X_i → 1/p a.s. for all p ∈ (0, 1). Hence n/Σ_{i=1}^n X_i is a reasonable estimate of 1/p even though it is not unbiased.

Theorem 2.1.5. If T_0 is an unbiased estimator of g(θ), then the totality of unbiased estimators of g(θ) is given by {T_0 − U : E_θ U = 0 for all θ ∈ Ω}.

Proof. If T is unbiased for g(θ), then T = T_0 − (T_0 − T), where E_θ(T_0 − T) = 0 for all θ ∈ Ω. Conversely, if T = T_0 − U where E_θ U = 0 for all θ ∈ Ω, then E_θ T = E_θ T_0 = g(θ) for all θ ∈ Ω. □

Remark 2.1.6. For squared error loss, L(θ, d) = (d − g(θ))², the risk R(θ, T) is

R(θ, T) = E_θ((T(X) − g(θ))²) = Var_θ(T(X)) if T is unbiased
        = Var_θ(T_0(X) − U) = E_θ[(T_0(X) − U)²] − g(θ)²
and hence the risk is minimized by minimizing E_θ[(T_0(X) − U)²] with respect to U, i.e. by taking any fixed unbiased estimator of g(θ) and finding the unbiased estimator of zero which minimizes E_θ[(T_0(X) − U)²]. Then if U does not depend on θ we shall have found a uniformly minimum risk estimator of g(θ), while if U depends on θ, there is no uniformly minimum risk estimator. Note that for unbiased estimators and squared error loss, the risk is the same as the variance of the estimator, so uniformly minimum risk unbiased is the same as uniformly minimum variance unbiased in this case.

Example. P(X = −1) = p, P(X = k) = q² p^k, k = 0, 1, ..., where q = 1 − p.

T_0(X) = I_{{−1}}(X) is unbiased for p; T_1(X) = I_{{0}}(X) is unbiased for q².

U is unbiased for 0 iff

0 = Σ_{k=−1}^∞ U(k) P(X = k) = p U(−1) + Σ_{k=0}^∞ U(k) q² p^k = U(0) + Σ_{k=1}^∞ (U(k) − 2U(k−1) + U(k−2)) p^k

(the k = 1 coefficient being U(1) − 2U(0) + U(−1)), whence

U(k) = −k U(−1) = ka for some a (comparing coefficients of p^k, k = 0, 1, 2, ...).

So an unbiased estimator of p with minimum risk (i.e. variance) is T_0(X) − a_0 X, where a_0 is the value of a which minimizes

E_p(T_0(X) − aX)² = Σ_k P_p(X = k)[T_0(k) − ak]².

Similarly, an unbiased estimator of q² with minimum risk (i.e. variance) is T_1(X) − a_1 X, where a_1 is the value of a which minimizes

E_p(T_1(X) − aX)² = Σ_k P_p(X = k)[T_1(k) − ak]².

Some straightforward calculations give

a_0 = −p / (p + q² Σ_{k=1}^∞ k² p^k) and a_1 = 0.

Since a_1 is independent of p, the estimator T_1(X) of q² is minimum variance unbiased for all p, i.e. UMVU. However, a_0 does depend on p, and so the estimator T_0*(X) = T_0(X) − a_0 X is only locally minimum variance unbiased at p. (We are using "estimator" in a generalized sense here since T_0*(X) depends on p. We shall continue to use this terminology.) An UMVU estimator of p does not exist in this case.

Definition. Let V(θ) = inf_T Var_θ(T), where the inf is over all unbiased estimators of g(θ). If an unbiased estimator T of g(θ) satisfies Var_θ(T) = V(θ) for all θ ∈ Ω, it is called UMVU.
If Var_{θ_0}(T) = V(θ_0) for some θ_0 ∈ Ω, T is called LMVU at θ_0.

Remark. Let H be the Hilbert space of functions on X which are square integrable with respect to P (i.e. with respect to every P_θ ∈ P), and let U be the set of all unbiased estimators of 0. If T_0 is an unbiased estimator of g(θ) in H, then a LMVU estimator in H at θ_0 is T_0 − P_U(T_0), where P_U denotes orthogonal projection on U in the inner product space L²(P_{θ_0}), i.e. P_U(T_0) is the unique element of U such that T_0 − P_U(T_0) ⊥ U (in L²(P_{θ_0})). T_0 − P_U(T_0) is LMVU since P_U(T_0) = arg min_{U ∈ U} E_{θ_0}(T_0 − U)².

Notation. We denote the set of all estimators T with E_θ T² < ∞ for all θ ∈ Ω by Δ, and the set of all unbiased estimators of 0 in Δ by U.

Theorem 2.1.11. An unbiased estimator T of g(θ) is UMVU iff E_θ(TU) = 0 for all U ∈ U and for all θ ∈ Ω. (Equivalently, Cov_θ(T, U) = 0, since E_θ U = 0 for all θ and E_θ T = g(θ) for all θ ∈ Ω.)

Proof. (⇒) Suppose T is UMVU. For U ∈ U, let T′ = T + λU with λ real. Then T′ is unbiased and, by the definition of T,

Var_θ(T′) = Var_θ(T) + λ² Var_θ(U) + 2λ Cov_θ(T, U) ≥ Var_θ(T),

therefore

λ² Var_θ(U) + 2λ Cov_θ(T, U) ≥ 0.

Setting λ = −Cov_θ(T, U)/Var_θ(U) gives a contradiction to this inequality unless Cov_θ(T, U) = 0. Hence Cov_θ(T, U) = 0.

(⇐) If E_θ(TU) = 0 for all U ∈ U and all θ ∈ Ω, let T′ be any other unbiased estimator. If Var_θ(T′) = ∞, then Var_θ(T) < Var_θ(T′), so suppose Var_θ(T′) < ∞. Then T′ = T − U for some U which is unbiased for 0 (by Theorem 2.1.5). Hence

U = T − T′, E_θ U² = E_θ(T − T′)² ≤ 2E_θ T² + 2E_θ T′² < ∞ ⇒ U ∈ U,

and

Var_θ(T′) = Var_θ(T − U) = Var_θ(T) + Var_θ(U) − 2 Cov_θ(T, U) ≥ Var_θ(T),

since Cov_θ(T, U) = 0. So T is UMVU. □
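The series manipulations in the example above (P(X = −1) = p, P(X = k) = q² p^k) can be sanity-checked numerically by truncating the series; p = 0.3 is an arbitrary illustrative choice:

```python
# Truncated-series check for the example with P(X=-1) = p, P(X=k) = q^2 p^k:
# the probabilities sum to 1, U(X) = X satisfies E_p[X] = 0 (so aX is an
# unbiased estimator of 0), and T_1(X) = I{0}(X) has expectation q^2.
# p = 0.3 and the truncation point K are arbitrary choices.
p = 0.3
q = 1 - p
K = 200   # the geometric tail beyond K is negligible

support = [-1] + list(range(K))
probs = [p] + [q * q * p ** k for k in range(K)]

total = sum(probs)
EX = sum(x * pr for x, pr in zip(support, probs))          # E[X]; E[aX] = a*E[X]
ET1 = sum((x == 0) * pr for x, pr in zip(support, probs))  # E[I{0}(X)]

print(total)  # approximately 1
print(EX)     # approximately 0
print(ET1)    # approximately q^2 = 0.49
```

This confirms both that U(X) = aX is unbiased for zero and that T_1(X) = I_{{0}}(X) is unbiased for q², as claimed.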
Unbiasedness and sufficiency. Suppose now that T is unbiased for g(θ) and S is sufficient for P = {P_θ, θ ∈ Ω}. Consider

T′ = E_θ(T | S) = E(T | S), independent of θ.

Then

(a) E_θ T′ = E_θ E(T | S) = E_θ(T) = g(θ) for all θ.

(b) Var_θ(T) = E_θ(T − E(T | S) + E(T | S) − g(θ))²
  = E_θ((T − E(T | S))²) + Var_θ(T′) + 2 E_θ[(T − E(T | S))(E(T | S) − g(θ))]
  = E_θ((T − E(T | S))²) + Var_θ(T′) ≥ Var_θ(T′).

On the second line we used the fact that T − E(T | S) is orthogonal to σ(S), so the cross term vanishes. The inequality on the third line is strict for all θ unless T = E(T | S) a.s. P.

Theorem. If S is a complete sufficient statistic for P, then every U-estimable function g(θ) has one and only one unbiased estimator which is a function of S.

Proof. T unbiased ⇒ E(T | S) is unbiased and a function of S. T_1(S), T_2(S) unbiased ⇒ E_θ(T_1(S) − T_2(S)) = 0 for all θ ⇒ T_1(S) = T_2(S) a.s. P (completeness). □

Theorem (Rao-Blackwell). Suppose S is a complete sufficient statistic for P. Then

(i) If g(θ) is U-estimable, there exists an unbiased estimator which uniformly minimizes the risk for any loss function L(θ, d) which is convex in d.

(ii) The UMVU estimator in (i) is the unique unbiased estimator which is a function of S; it is the unique unbiased estimator with minimum risk, provided the risk is finite and L is strictly convex in d.

Proof. (i) L(θ, d) convex in d means L(θ, pd_1 + (1−p)d_2) ≤ p L(θ, d_1) + (1−p) L(θ, d_2), 0 < p < 1. Let T be any unbiased estimator of g(θ) and let T′ = E(T | S), another unbiased estimator of g(θ). Then

R(θ, T′) = E_θ[L(θ, E(T | S))] ≤ E_θ[E_θ(L(θ, T) | S)], by Jensen's inequality for conditional expectation,
        = E_θ L(θ, T) = R(θ, T) for all θ.
If T_2 is any other unbiased estimator, then E(T_2 | S) = T′ a.s. P by the preceding theorem. Hence starting from any unbiased estimator and conditioning on the CSS S gives a uniquely defined unbiased estimator which is UMVU and is the unique function of S which is unbiased for g(θ).

(ii) The first statement was established at the end of the proof of (i). If T is UMVU then so is T′ = E(T | S), as shown in (i). We will show that T is necessarily the uniquely determined unbiased function of S, by showing that T is a function of S a.s. P. The proof is by contradiction. Suppose that "T is a function of S a.s. P" is false. Then there exists θ and a set of positive P_θ measure where T′ := E(T | S) ≠ T. But this implies that

R(θ, T′) = E_θ(L(θ, E(T | S))) < E_θ(E_θ(L(θ, T) | S)) (Jensen's inequality is strict unless E(T | S) = T a.s. P_θ)
         = R(θ, T),

contradicting the UMVU property of T. □

Theorem. If P is an exponential family of full rank (i.e. {η_1,...,η_s} and {T_1,...,T_s} are A.I. and η(Ω) contains an open subset of R^s), then the Rao-Blackwell theorem applies to any U-estimable g(θ) with S = T.

Proof. T is complete sufficient for P. □

[Some obvious U-estimable g(θ)'s are E_θ T_i(X) = ∂A/∂ξ_i |_{ξ=η(θ)}, for {θ : η(θ) ∈ int(N)}, where π_ξ(x) = e^{Σ ξ_i T_i(x) − A(ξ)} h(x) is the canonical representation of p_θ(x).]

Two methods for finding UMVUs

Method 1. Search for a function δ(T), where T is a CSS, such that E_θ δ(T) = g(θ), θ ∈ Ω.
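The variance reduction behind the Rao-Blackwell theorem above is easy to see by simulation. The Bernoulli setup below is an illustrative assumption: T = X_1 is a crude unbiased estimator of p, and its Rao-Blackwellization E(X_1 | Σ X_i) = X̄ is unbiased with much smaller variance:

```python
import random

# Monte Carlo sketch of Rao-Blackwellization (illustrative setup):
# X_1,...,X_n iid Bernoulli(p). T = X_1 is unbiased for p; conditioning on
# the CSS S = sum(X_i) gives E(X_1 | S) = S/n = xbar.
random.seed(0)
n, p, reps = 10, 0.3, 20000

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

t_vals, rb_vals = [], []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    t_vals.append(x[0])          # crude unbiased estimator T = X_1
    rb_vals.append(sum(x) / n)   # its Rao-Blackwellization E(X_1 | S)

print(sum(t_vals) / reps, sum(rb_vals) / reps)  # both near p
print(var(t_vals), var(rb_vals))                # near p(1-p) vs p(1-p)/n
```

Both estimators are unbiased, but the conditioned one has variance smaller by roughly a factor of n, as the theory predicts.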
Example. X_1,...,X_n iid N(μ, σ²), μ ∈ R, σ² > 0. T = (X̄, S²) is CSS. E_{μ,σ²} X̄ = μ ⇒ X̄ is UMVU for μ.

Method 2. Search for an unbiased δ(X) and a CSS T. Then S = E(δ(X) | T) is UMVU.

Example. X_1,...,X_n iid U(0, θ), θ > 0; g(θ) = θ/2. δ(X) = X_1 is unbiased, X_(n) is CSS, so S = E(X_1 | X_(n)) is UMVU. To compute S we note that given X_(n) = x,

X_1 = x with probability 1/n, and X_1 ~ U(0, x) with probability 1 − 1/n.

Hence

S(x) = x · (1/n) + (1 − 1/n)(x/2) = ((n+1)/(2n)) x,

so S(X_(n)) = ((n+1)/(2n)) X_(n) is UMVU for θ/2, and ((n+1)/n) X_(n) is UMVU for θ.

Remark.

(a) Convexity of L(θ, ·) is crucial to the Rao-Blackwell theorem.

(b) Large-sample theory tends to support the use of convex L(θ, ·). Heuristically, if X_1,...,X_n are iid, then as n → ∞ the error in estimating g(θ) → 0 for any reasonable estimators (in some probabilistic sense). Thus only the behavior of L(θ, d) for d close to g(θ) is relevant for large samples. A Taylor expansion around d = g(θ) gives

L(θ, d) = a(θ) + b(θ)(d − g(θ)) + c(θ)(d − g(θ))² + remainder.

But L(θ, g(θ)) = 0 ⇒ a(θ) = 0, and L(θ, d) ≥ 0 ⇒ b(θ) = 0. Hence locally

L(θ, d) ≈ c(θ)(d − g(θ))²,

a convex weighted squared-error loss function.
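The U(0, θ) computation in Method 2 above can be checked by simulation: ((n+1)/n) X_(n) is unbiased for θ, and its variance is smaller than that of the competing unbiased estimator 2X̄. The values of θ, n and the replication count below are arbitrary choices:

```python
import random

# Monte Carlo check for the U(0, theta) example: ((n+1)/n) X_(n) is unbiased
# for theta with smaller variance than 2*xbar (also unbiased).
# theta, n and reps are arbitrary illustrative choices.
random.seed(1)
theta, n, reps = 5.0, 8, 20000

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

umvu, crude = [], []
for _ in range(reps):
    x = [random.uniform(0, theta) for _ in range(n)]
    umvu.append((n + 1) / n * max(x))   # UMVU for theta
    crude.append(2 * sum(x) / n)        # 2*xbar, unbiased but not UMVU

print(sum(umvu) / reps, sum(crude) / reps)  # both near theta
print(var(umvu), var(crude))                # theta^2/(n(n+2)) vs theta^2/(3n)
```

Both sample means land near θ, while the variance of the order-statistic estimator is several times smaller, consistent with the exact values θ²/(n(n+2)) and θ²/(3n).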
Example. Observe X_1,...,X_m iid N(ξ, σ²), and Y_1,...,Y_n iid N(η, τ²), independent of X_1,...,X_m.

(i) For the 4-parameter family P = {P_{ξ,η,σ²,τ²}}, (X̄, Ȳ, S²_X, S²_Y) is a CSS since the exponential family is of full rank. Hence X̄ and S²_X are UMVU for ξ and σ² respectively, and Ȳ and S²_Y are UMVU for η and τ².

(ii) For the 3-parameter family P = {P_{ξ,η,σ²,σ²}}, (X̄, Ȳ, SS) is a CSS, where SS := (m−1)S²_X + (n−1)S²_Y. Hence X̄, Ȳ and SS/(m+n−2) are UMVU for ξ, η and σ² respectively.

(iii) For the 3-parameter family with ξ = η, σ² ≠ τ² (which arises when estimating a mean from 2 sets of readings with different accuracies), (X̄, Ȳ, S²_X, S²_Y) is minimal sufficient but not complete, since X̄ − Ȳ ≠ 0 a.s. P, but E_θ(X̄ − Ȳ) = 0 for all θ.

To deal with Case (iii) we shall first show the following: if σ²/τ² = r for some fixed r, i.e. P_1 = {P_{ξ,ξ,rτ²,τ²}}, then

T = (Σ X_i + r Σ Y_j, Σ X_i² + r Σ Y_j²) is CSS.

Proof.

p_{ξ,τ²}(x, y) = (2π)^{−(m+n)/2} (rτ²)^{−m/2} (τ²)^{−n/2} exp{−(1/(2rτ²)) Σ x_i² + mξx̄/(rτ²) − mξ²/(2rτ²) − (1/(2τ²)) Σ y_i² + nξȳ/τ² − nξ²/(2τ²)}
= exp{−A(ξ, τ²)} exp{−(1/(2rτ²))(Σ x_i² + r Σ y_i²) + (ξ/(rτ²))(Σ x_i + r Σ y_i)}. □

Since T is a CSS for P_1 and since T_1 = (Σ X_i + r Σ Y_i)/(m + rn) is unbiased for ξ, it is UMVU for ξ in P_1.

T_1 is also unbiased for ξ in P′ = {P_{ξ,ξ,σ²,τ²}} ⇒

V(ξ_0, σ_0², τ_0²) ≤ Var_{ξ_0,σ_0²,τ_0²}(T_1) = σ_0² τ_0² / (mτ_0² + nσ_0²), where σ_0²/τ_0² = r.

(V is the smallest variance of all unbiased estimators of ξ for P′ evaluated at ξ_0, σ_0², τ_0².)
On the other hand, every T which is unbiased for ξ in P′ is also unbiased in P_1. Hence if T is unbiased for ξ in P′, then

Var_{ξ_0,σ_0²,τ_0²}(T) ≥ Var_{ξ_0,σ_0²,τ_0²}((Σ X_i + r Σ Y_i)/(m + rn)), where r = σ_0²/τ_0²,

and the inequality continues to hold with the left-hand side replaced by V(ξ_0, σ_0², τ_0²). So

V(ξ_0, σ_0², τ_0²) = σ_0² τ_0² / (mτ_0² + nσ_0²)

and the LMVU estimator at (ξ_0, σ_0², τ_0²) is

(Σ X_i + (σ_0²/τ_0²) Σ Y_i) / (m + (σ_0²/τ_0²) n).

Since this estimate depends on the ratio r = σ_0²/τ_0², an UMVU for ξ does not exist in P′. A natural estimate for ξ is

ξ̂ = (Σ X_i + (S²_X/S²_Y) Σ Y_i) / (m + (S²_X/S²_Y) n).

(See Graybill and Deal, Biometrics, 1959, for its properties.)

2.2. Non-parametric families

Consider X = (X_1,...,X_n), where X_1,...,X_n are iid F, F ∈ F, a family of distribution functions, and P is the corresponding family of product measures on (R^n, B^n). For example:

F_0 = dfs with density relative to Lebesgue measure,
F_1 = dfs with ∫ |x| F(dx) < ∞,
F_2 = dfs with ∫ x² F(dx) < ∞, etc.

The estimand is g : F → R. For example,

g(F) = ∫ x F(dx) = μ_F,
g(F) = ∫ x² F(dx),
g(F) = F(a),
g(F) = F⁻¹(p).

Proposition 2.2.1. If F_0 is defined as above, then (X_(1),...,X_(n)) is complete sufficient for F_0 (i.e. for the corresponding family of probability measures P).
Proof. We know that T(X) = (X_(1),...,X_(n)) is sufficient for P. It remains to show (by Problem 1.6.32, p. 72) that T is complete and sufficient for a family P_0 ⊆ P such that each member of P_0 has positive density on R^n. Choose P_0 to be the set of probability measures on B^n with densities relative to Lebesgue measure of the form

C(θ_1,...,θ_n) exp{θ_1 Σ x_i + θ_2 Σ_{i<j} x_i x_j + ··· + θ_n x_1 x_2 ··· x_n − Σ x_i^{2n}}.

This is an exponential family whose natural parameter set N contains an open set (N = R^n). So S(x) = (Σ x_i, Σ_{i<j} x_i x_j, ..., x_1 ··· x_n) is complete. But S is equivalent to T (consider the n-th degree polynomial whose zeroes are x_(1),...,x_(n)), so T is complete for F_0. □

Measurable functions of the order statistics. If T(x) := (x_(1),...,x_(n)), then δ(X_1,...,X_n) ∈ σ(T) iff δ(X_1,...,X_n) = δ(X_{π_1},...,X_{π_n}) for every permutation (π_1,...,π_n) of (1,...,n). Since T is a CSS for F_0, this enables us to identify UMVU estimators of estimands g for which they exist.

Example 2.2.2. g(F) = F(a). An obvious unbiased estimator of F(a) is

T_1(X) := (1/n) Σ_{i=1}^n I_{(−∞,a]}(X_i),

and T_1 ∈ σ(T), so T_1 is UMVU for F(a).

Example 2.2.3. g(F) = ∫ x dF, F ∈ F_0 ∩ F_2. Let T_2(x) = (1/n) Σ_{i=1}^n x_i. Then T_2 ∈ σ(T) and, since T is also complete for F_0 ∩ F_2, it is therefore UMVU for μ_F.

Example 2.2.4. g(F) = σ²_F, F ∈ F_0 ∩ F_4. Let

T_3(x) = S(x)² = Σ (x_i − x̄)²/(n−1) = Σ (x_(i) − x̄)²/(n−1).

T_3 ∈ σ(T) and is unbiased for σ²_F. Since T is complete for F_0 ∩ F_4, T_3 is UMVU for σ²_F.

Remark. T complete for F does not in general imply that T is complete for a subfamily F_1 ⊆ F. In fact the reverse is true: completeness for F_1 implies completeness for F ⊇ F_1. However, the same argument used in the proof of Proposition 2.2.1 shows that T is complete for F_0 ∩ F_2 (used in Example 2.2.3) and that T is complete for F_0 ∩ F_4 (used in Example 2.2.4).
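The three UMVU estimators above are straightforward to compute from a sample, and each is symmetric in the observations, i.e. a function of the order statistics alone. A small sketch; the simulated normal data and the point a = 0.5 are arbitrary choices:

```python
import random

# Computing the three UMVU examples from a sample: the empirical df F_n(a),
# the sample mean, and the unbiased sample variance. Each is symmetric in
# the observations, hence a function of the order statistics.
# The normal sample and a = 0.5 are arbitrary illustrative choices.
random.seed(3)
x = [random.gauss(0, 1) for _ in range(50)]
a = 0.5

F_hat = sum(1 for u in x if u <= a) / len(x)          # UMVU for F(a)
xbar = sum(x) / len(x)                                # UMVU for the mean
s2 = sum((u - xbar) ** 2 for u in x) / (len(x) - 1)   # UMVU for the variance

# Symmetry check: permuting (here, sorting) the sample leaves F_hat unchanged.
xs = sorted(x)
assert F_hat == sum(1 for u in xs if u <= a) / len(xs)
print(F_hat, xbar, s2)
```

The symmetry check is exactly the σ(T)-measurability criterion above: a statistic invariant under all permutations of the sample is a function of (X_(1),...,X_(n)).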
More informationChapter 1. Statistical Spaces
Chapter 1 Statistical Spaces Mathematical statistics is a science that studies the statistical regularity of random phenomena, essentially by some observation values of random variable (r.v.) X. Sometimes
More informationTranslation Invariant Experiments with Independent Increments
Translation Invariant Statistical Experiments with Independent Increments (joint work with Nino Kordzakhia and Alex Novikov Steklov Mathematical Institute St.Petersburg, June 10, 2013 Outline 1 Introduction
More informationChapter 7. Basic Probability Theory
Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries
More informationQualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf
Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section
More informationLecture 17: Likelihood ratio and asymptotic tests
Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationHypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes
Neyman-Pearson paradigm. Suppose that a researcher is interested in whether the new drug works. The process of determining whether the outcome of the experiment points to yes or no is called hypothesis
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationI. ANALYSIS; PROBABILITY
ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so
More informationRandom Process Lecture 1. Fundamentals of Probability
Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationJoint Probability Distributions and Random Samples (Devore Chapter Five)
Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete
More information5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.
5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint
More informationMIT Spring 2016
MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a) :
More informationChapter 3 : Likelihood function and inference
Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM
More informationIntegration on Measure Spaces
Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationQualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf
Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationReal Analysis Chapter 3 Solutions Jonathan Conder. ν(f n ) = lim
. Suppose ( n ) n is an increasing sequence in M. For each n N define F n : n \ n (with 0 : ). Clearly ν( n n ) ν( nf n ) ν(f n ) lim n If ( n ) n is a decreasing sequence in M and ν( )
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationPart II Probability and Measure
Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationLecture 3 September 29
STATS 300A: Theory of Statistics Fall 015 Lecture 3 September 9 Lecturer: Lester Mackey Scribe: Konstantin Lopyrev, Karthik Rajkumar Warning: These notes may contain factual and/or typographic errors.
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More information1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),
Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer
More informationhere, this space is in fact infinite-dimensional, so t σ ess. Exercise Let T B(H) be a self-adjoint operator on an infinitedimensional
15. Perturbations by compact operators In this chapter, we study the stability (or lack thereof) of various spectral properties under small perturbations. Here s the type of situation we have in mind:
More information7 Convergence in R d and in Metric Spaces
STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a
More informationMIT Spring 2016
Dr. Kempthorne Spring 2016 1 Outline Building 1 Building 2 Definition Building Let X be a random variable/vector with sample space X R q and probability model P θ. The class of probability models P = {P
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More informationChapter 1: Probability Theory Lecture 1: Measure space, measurable function, and integration
Chapter 1: Probability Theory Lecture 1: Measure space, measurable function, and integration Random experiment: uncertainty in outcomes Ω: sample space: a set containing all possible outcomes Definition
More informationSpring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =
Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically
More informationMIT Spring 2016
Exponential Families II MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Exponential Families II 1 Exponential Families II 2 : Expectation and Variance U (k 1) and V (l 1) are random vectors If A (m k),
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationdiscrete random variable: probability mass function continuous random variable: probability density function
CHAPTER 1 DISTRIBUTION THEORY 1 Basic Concepts Random Variables discrete random variable: probability mass function continuous random variable: probability density function CHAPTER 1 DISTRIBUTION THEORY
More informationNotes on Measure, Probability and Stochastic Processes. João Lopes Dias
Notes on Measure, Probability and Stochastic Processes João Lopes Dias Departamento de Matemática, ISEG, Universidade de Lisboa, Rua do Quelhas 6, 1200-781 Lisboa, Portugal E-mail address: jldias@iseg.ulisboa.pt
More informationBayes spaces: use of improper priors and distances between densities
Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de
More informationSOLUTION FOR HOMEWORK 7, STAT p(x σ) = (1/[2πσ 2 ] 1/2 )e (x µ)2 /2σ 2.
SOLUTION FOR HOMEWORK 7, STAT 6332 1. We have (for a general case) Denote p (x) p(x σ)/ σ. Then p(x σ) (1/[2πσ 2 ] 1/2 )e (x µ)2 /2σ 2. p (x σ) p(x σ) 1 (x µ)2 +. σ σ 3 Then E{ p (x σ) p(x σ) } σ 2 2σ
More information