Chapter 4: Asymptotic Properties of the MLE
1 Chapter 4: Asymptotic Properties of the MLE
Daniel O. Scharfstein, 09/19/13
2 Maximum Likelihood
Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency, asymptotic normality, and efficiency.

Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with common density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta\}$. We assume that $\theta_0$ is identified in the sense that if $\theta \neq \theta_0$ and $\theta \in \Theta$, then $p(x; \theta) \neq p(x; \theta_0)$ with respect to the dominating measure $\mu$.
3 Maximum Likelihood
For fixed $\theta \in \Theta$, the joint density of $X^n$ is the product of the individual densities,
$$p(x^n; \theta) = \prod_{i=1}^n p(x_i; \theta).$$
If we think of $p(x^n; \theta)$ as a function of $\theta$ with $x^n$ held fixed, we refer to the resulting function as the likelihood function, $L(\theta; x^n)$. The maximum likelihood estimate for observed $x^n$ is the value $\hat\theta(x^n) \in \Theta$ which maximizes $L(\theta; x^n)$. Prior to observation, $x^n$ is unknown, so we consider the maximum likelihood estimator (MLE), $\hat\theta(X^n)$, the value of $\theta \in \Theta$ which maximizes $L(\theta; X^n)$. Equivalently, the MLE can be taken to be the maximizer of the standardized log-likelihood,
$$\frac{\ell(\theta; X^n)}{n} = \frac{\log L(\theta; X^n)}{n} = \frac{1}{n} \sum_{i=1}^n \log p(X_i; \theta) = \frac{1}{n} \sum_{i=1}^n \ell(\theta; X_i).$$
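As a concrete illustration (mine, not from the slides), the MLE can be found by maximizing the standardized log-likelihood numerically. The sketch below assumes an exponential model $p(x; \theta) = \theta e^{-\theta x}$, chosen because its MLE has the closed form $1/\bar{x}$ to check against:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0
# simulate from p(x; theta0) = theta0 * exp(-theta0 * x)
x = rng.exponential(scale=1/theta0, size=5000)
xbar = x.mean()

# standardized log-likelihood: (1/n) sum_i log p(x_i; theta) = log(theta) - theta*xbar
grid = np.linspace(0.1, 10.0, 10_000)
avg_loglik = np.log(grid) - grid * xbar

# crude grid maximization (a transparent stand-in for a real optimizer)
theta_hat = grid[np.argmax(avg_loglik)]
```

For this model the grid maximizer agrees with the closed-form MLE $1/\bar{x}$ up to the grid spacing, and both are close to $\theta_0 = 2$.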
4 Maximum Likelihood
We will show that the MLE is
1. consistent: $\hat\theta(X^n) \xrightarrow{P(\theta_0)} \theta_0$;
2. asymptotically normal: $\sqrt{n}(\hat\theta(X^n) - \theta_0) \xrightarrow{D(\theta_0)}$ a normal random variable;
3. asymptotically efficient: if we want to estimate $\theta_0$ by any other estimator within a reasonable class, the MLE is the most precise.
In order to establish 1-3, we will have to impose some regularity conditions on the probability model and on the class of estimators considered.
5 Consistency
We first want to show that if we have a sample of i.i.d. data from a common distribution which belongs to a probability model, then under some regularity conditions on the form of the density, the sequence of estimators $\{\hat\theta(X^n)\}$ converges in probability to $\theta_0$.

So far, we have not discussed whether a maximum likelihood estimator exists or, if one exists, whether it is unique. We will get to this, but first we start with a heuristic proof of consistency.
6 Heuristic Proof
The MLE is the value $\theta \in \Theta$ which maximizes $Q(\theta; X^n) = \frac{1}{n} \sum_{i=1}^n \ell(\theta; X_i)$. By the WLLN, we know that
$$Q(\theta; X^n) = \frac{1}{n}\sum_{i=1}^n \ell(\theta; X_i) \xrightarrow{P(\theta_0)} Q_0(\theta) = E_{\theta_0}[\ell(\theta; X)] = E_{\theta_0}[\log p(X; \theta)] = \int \{\log p(x; \theta)\}\, p(x; \theta_0)\, d\mu(x).$$
We expect that, on average, the log-likelihood will be close to the expected log-likelihood. Therefore, we expect the maximum likelihood estimator to be close to the maximizer of the expected log-likelihood. We will show that the expected log-likelihood $Q_0(\theta)$ is maximized at $\theta_0$ (i.e., the truth).
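The WLLN step above is easy to see in simulation. This sketch (mine, assuming the same exponential model $p(x;\theta) = \theta e^{-\theta x}$, for which $Q_0(\theta) = \log\theta - \theta/\theta_0$) checks that the sample objective tracks the expected log-likelihood over a grid, and that the limit is maximized at $\theta_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 2.0

def Q(theta, x):
    # sample objective: (1/n) sum_i log p(x_i; theta) for the exponential model
    return np.mean(np.log(theta) - theta * x)

def Q0(theta):
    # expected log-likelihood under theta0: E[log theta - theta*X] = log theta - theta/theta0
    return np.log(theta) - theta / theta0

x = rng.exponential(scale=1/theta0, size=200_000)
thetas = np.linspace(0.5, 5.0, 46)           # grid containing theta0 = 2.0
err = max(abs(Q(t, x) - Q0(t)) for t in thetas)
best = thetas[np.argmax(Q0(thetas))]         # maximizer of the limit
```

With $n = 200{,}000$ the uniform error over this grid is tiny, and the grid maximizer of $Q_0$ is $\theta_0$.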
7 Lemma 4.1
Lemma 4.1: If $\theta_0$ is identified and $E_{\theta_0}[|\log p(X; \theta)|] < \infty$ for all $\theta \in \Theta$, then $Q_0(\theta)$ is uniquely maximized at $\theta = \theta_0$.

Proof: By Jensen's inequality, we know that for any strictly convex function $g(\cdot)$ and non-degenerate $Y$, $E[g(Y)] > g(E[Y])$. Take $g(y) = -\log(y)$. So, for $\theta \neq \theta_0$,
$$E_{\theta_0}\!\left[-\log\frac{p(X; \theta)}{p(X; \theta_0)}\right] > -\log E_{\theta_0}\!\left[\frac{p(X; \theta)}{p(X; \theta_0)}\right].$$
Note that
$$E_{\theta_0}\!\left[\frac{p(X; \theta)}{p(X; \theta_0)}\right] = \int \frac{p(x; \theta)}{p(x; \theta_0)}\, p(x; \theta_0)\, d\mu(x) = \int p(x; \theta)\, d\mu(x) = 1.$$
So $E_{\theta_0}[-\log\{p(X; \theta)/p(X; \theta_0)\}] > 0$, or
$$Q_0(\theta_0) = E_{\theta_0}[\log p(X; \theta_0)] > E_{\theta_0}[\log p(X; \theta)] = Q_0(\theta).$$
This inequality holds for all $\theta \neq \theta_0$.
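The gap $Q_0(\theta_0) - Q_0(\theta)$ in Lemma 4.1 is exactly the Kullback-Leibler divergence between the two densities, so the lemma is the statement that KL divergence is strictly positive off the truth. A quick numerical check (mine, assuming the exponential model, where the gap has closed form $\log(\theta_0/\theta) + \theta/\theta_0 - 1$):

```python
import numpy as np

theta0 = 2.0

def kl_gap(theta):
    # Q0(theta0) - Q0(theta) for the exponential model, i.e. the KL divergence
    # KL(p_{theta0} || p_theta) = log(theta0/theta) + theta/theta0 - 1
    return np.log(theta0 / theta) + theta / theta0 - 1

thetas = np.linspace(0.1, 10.0, 1000)
gaps = kl_gap(thetas)
```

The gap is zero at $\theta = \theta_0$ and positive everywhere else, as the lemma requires.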
8 Consistency
Under technical conditions ensuring that the limit of the maximizer is the maximizer of the limit, $\hat\theta(X^n)$ should converge in probability to $\theta_0$. Sufficient conditions for this are that the convergence is uniform and the parameter space is compact.

If $Q(\theta; X^n)$ lies in the sleeve $[Q_0(\theta) - \epsilon, Q_0(\theta) + \epsilon]$ for all $\theta$, then $\hat\theta(X^n)$ must lie in $[\theta_L(\epsilon), \theta_U(\epsilon)]$, i.e., the MLE must be close to the maximizer of $Q_0(\theta)$, namely $\theta_0$. The estimator is then consistent, since $\theta_0$ is the true value of the parameter.
9 Consistency
It is essential for consistency that the limit $Q_0(\theta)$ have a unique maximum at the true parameter value. If there are multiple maxima, then the above argument only shows that the estimator is close to one of the maxima, which does not guarantee consistency (because at least one of the maxima will not be the true value of the parameter). Identification guarantees that $Q_0(\theta)$ has a unique maximum at $\theta_0$.

The discussion so far only allows for a compact parameter space. In theory, compactness requires that one know bounds on the true parameter value, although this constraint is often ignored in practice. It is possible to drop this assumption if the function $Q(\theta; X^n)$ cannot rise too much as $\theta$ becomes unbounded. We will discuss this later.
10 Uniform Convergence in Probability
Definition 4.1: $Q(\theta; X^n)$ converges uniformly in probability to $Q_0(\theta)$ if
$$\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| \xrightarrow{P(\theta_0)} 0.$$
More precisely, we have that for all $\epsilon > 0$,
$$P_{\theta_0}\!\left[\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| > \epsilon\right] \to 0.$$
Why isn't pointwise convergence enough? Uniform convergence guarantees that for almost all realizations, the paths in $\theta$ are in the sleeve. This ensures that the maximizer is close to $\theta_0$. Under pointwise convergence, we know that at each $\theta$ most of the realizations are in the sleeve, but there is no guarantee that for another value of $\theta$ the same set of realizations is in the sleeve. Thus, the maximizer need not be near $\theta_0$.
11 Consistency Theorem
Theorem 4.2: Suppose that $Q(\theta; X^n)$ is continuous in $\theta$ and there exists a function $Q_0(\theta)$ such that
1. $Q_0(\theta)$ is uniquely maximized at $\theta_0$,
2. $\Theta$ is compact,
3. $Q_0(\theta)$ is continuous in $\theta$,
4. $Q(\theta; X^n)$ converges uniformly in probability to $Q_0(\theta)$.
Then $\hat\theta(X^n)$, defined as the value of $\theta \in \Theta$ which for each $X^n = x^n$ maximizes the objective function $Q(\theta; X^n)$, satisfies $\hat\theta(X^n) \xrightarrow{P(\theta_0)} \theta_0$.
12 Consistency Proof
Proof: We need to show that for all $\epsilon > 0$, $P_{\theta_0}[\|\hat\theta(X^n) - \theta_0\| < \epsilon] \to 1$ as $n \to \infty$. Another way of saying this is as follows. Define an $\epsilon$-neighborhood about $\theta_0$,
$$\Theta(\epsilon) = \{\theta : \|\theta - \theta_0\| < \epsilon\}.$$
We want to show that $P_{\theta_0}[\hat\theta(X^n) \in \Theta(\epsilon)] \to 1$ as $n \to \infty$.

Since $\Theta(\epsilon)$ is an open set, we know that $\Theta \cap \Theta(\epsilon)^C$ is a compact set (Assumption 2). Since $Q_0(\theta)$ is a continuous function (Assumption 3), $\sup_{\theta \in \Theta \cap \Theta(\epsilon)^C} Q_0(\theta)$ is achieved at some $\theta^*$ in this compact set. That is, $Q_0(\theta^*) = \sup_{\theta \in \Theta \cap \Theta(\epsilon)^C} Q_0(\theta)$.
13 Consistency Proof
We know that $\theta_0$ is the unique maximizer of $Q_0(\theta)$ over all $\theta \in \Theta$ (Assumption 1). Therefore, $Q_0(\theta_0) - Q_0(\theta^*) = \delta > 0$.

Let $A_n$ be the event that $Q(\theta_0; X^n) > Q_0(\theta_0) - \delta/2$ and $B_n$ the event that $\sup_{\theta \in \Theta \cap \Theta(\epsilon)^C} |Q(\theta; X^n) - Q_0(\theta)| < \delta/2$. By uniform convergence in probability (Assumption 4), we know that
a. $P_{\theta_0}[A_n] \to 1$,
b. $P_{\theta_0}[B_n] \to 1$,
c. $P_{\theta_0}[A_n \cap B_n] \to 1$.
$B_n$ implies that for all $\theta \in \Theta \cap \Theta(\epsilon)^C$,
$$Q(\theta; X^n) < Q_0(\theta) + \delta/2 \le Q_0(\theta^*) + \delta/2 = Q_0(\theta_0) - \{Q_0(\theta_0) - Q_0(\theta^*)\} + \delta/2 = Q_0(\theta_0) - \delta/2.$$
14 Consistency Proof
Let $C_n$ be the event that $Q(\theta; X^n) < Q_0(\theta_0) - \delta/2$ for all $\theta \in \Theta \cap \Theta(\epsilon)^C$. Since $B_n \subseteq C_n$, we know that $P_{\theta_0}[C_n] \to 1$. Since $A_n \cap B_n \subseteq A_n \cap C_n$, we know that $P_{\theta_0}[A_n \cap C_n] \to 1$.

The event $A_n \cap C_n$ implies that $Q(\theta_0; X^n) > \sup_{\theta \in \Theta \cap \Theta(\epsilon)^C} Q(\theta; X^n)$. Since $Q(\hat\theta(X^n); X^n) \ge Q(\theta_0; X^n)$, we know that $Q(\hat\theta(X^n); X^n) > \sup_{\theta \in \Theta \cap \Theta(\epsilon)^C} Q(\theta; X^n)$. This implies that $\hat\theta(X^n) \in \Theta(\epsilon)$. Thus, we know that $P_{\theta_0}[\hat\theta(X^n) \in \Theta(\epsilon)] \to 1$.
15 Lemma 4.3
A key element of the above proof is that $Q(\theta; X^n)$ converges uniformly in probability to $Q_0(\theta)$. This is often difficult to prove. A useful condition is given by the following lemma.

Lemma 4.3: If $X_1, \ldots, X_n$ are i.i.d. $p(x; \theta_0) \in \{p(x; \theta) : \theta \in \Theta\}$, $\Theta$ is compact, $\log p(x; \theta)$ is continuous in $\theta$ for all $\theta \in \Theta$ and all $x \in \mathcal{X}$, and if there exists a function $d(x)$ such that $|\log p(x; \theta)| \le d(x)$ for all $\theta \in \Theta$ and $x \in \mathcal{X}$, with $E_{\theta_0}[d(X)] < \infty$, then
i. $Q_0(\theta) = E_{\theta_0}[\log p(X; \theta)]$ is continuous in $\theta$;
ii. $\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| \xrightarrow{P} 0$.
16 Proof Lemma 4.3
Proof: We first prove the continuity of $Q_0(\theta)$. For any $\theta \in \Theta$, choose a sequence $\theta_k \in \Theta$ which converges to $\theta$. By the continuity of $\log p(x; \theta)$, we know that $\log p(x; \theta_k) \to \log p(x; \theta)$. Since $|\log p(x; \theta_k)| \le d(x)$, the dominated convergence theorem tells us that $Q_0(\theta_k) = E_{\theta_0}[\log p(X; \theta_k)] \to E_{\theta_0}[\log p(X; \theta)] = Q_0(\theta)$. This implies that $Q_0(\theta)$ is continuous.

Next, we work to establish the uniform convergence in probability. We need to show that for any $\epsilon, \eta > 0$ there exists $N(\epsilon, \eta)$ such that for all $n > N(\epsilon, \eta)$,
$$P\!\left[\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| > \epsilon\right] < \eta.$$
17 Proof Lemma 4.3
Since $\log p(x; \theta)$ is continuous in $\theta \in \Theta$ and since $\Theta$ is compact, we know that $\log p(x; \theta)$ is uniformly continuous (see Rudin, page 90). Uniform continuity says that for all $\epsilon > 0$ there exists a $\delta(\epsilon) > 0$ such that $|\log p(x; \theta_1) - \log p(x; \theta_2)| < \epsilon$ for all $\theta_1, \theta_2 \in \Theta$ for which $\|\theta_1 - \theta_2\| < \delta(\epsilon)$. This is a property of a function defined on a set of points. In contrast, continuity is defined relative to a particular point. Continuity of a function at a point $\theta^*$ says that for all $\epsilon > 0$, there exists a $\delta(\epsilon, \theta^*)$ such that $|\log p(x; \theta) - \log p(x; \theta^*)| < \epsilon$ for all $\theta \in \Theta$ for which $\|\theta - \theta^*\| < \delta(\epsilon, \theta^*)$. For continuity, $\delta$ depends on $\epsilon$ and $\theta^*$. For uniform continuity, $\delta$ depends only on $\epsilon$. In general, uniform continuity is stronger than continuity; however, the two are equivalent for functions on compact sets.
18 Aside: Uniform Continuity vs. Continuity
Consider $f(x) = 1/x$ for $x \in (0, 1)$. This function is continuous at each $x \in (0, 1)$. However, it is not uniformly continuous. Suppose this function were uniformly continuous. Then, for any $\epsilon > 0$, we could find a $\delta(\epsilon) > 0$ such that $|1/x_1 - 1/x_2| < \epsilon$ for all $x_1, x_2$ such that $|x_1 - x_2| < \delta(\epsilon)$. Given an $\epsilon > 0$, consider the points $x_1$ and $x_2 = x_1 + \delta(\epsilon)/2$. Then, we know that
$$\frac{1}{x_1} - \frac{1}{x_2} = \frac{1}{x_1} - \frac{1}{x_1 + \delta(\epsilon)/2} = \frac{\delta(\epsilon)/2}{x_1(x_1 + \delta(\epsilon)/2)}.$$
Take $x_1$ sufficiently small so that $x_1 < \min\!\left(\frac{\delta(\epsilon)}{2}, \frac{1}{2\epsilon}\right)$. This implies that
$$\frac{\delta(\epsilon)/2}{x_1(x_1 + \delta(\epsilon)/2)} > \frac{\delta(\epsilon)/2}{x_1\,\delta(\epsilon)} = \frac{1}{2x_1} > \epsilon.$$
This is a contradiction.
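The failure of uniform continuity is easy to see numerically: hold the gap between $x_1$ and $x_2$ fixed and push $x_1$ toward $0$, and the difference $|1/x_1 - 1/x_2|$ blows up instead of staying below any fixed $\epsilon$. A small sketch (my illustration of the aside above):

```python
# f(x) = 1/x on (0, 1): points a fixed distance delta apart differ by
# delta / (a * (a + delta)), which explodes as a -> 0.
delta = 1e-3
points = [10.0**(-k) for k in range(2, 6)]        # 1e-2, 1e-3, 1e-4, 1e-5
diffs = [abs(1/a - 1/(a + delta)) for a in points]
```

No single $\delta(\epsilon)$ can work for all of $(0, 1)$: the differences grow without bound as the points approach $0$.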
19 Proof Lemma 4.3
Uniform continuity also implies that
$$\Delta(x, \delta) = \sup_{\{(\theta_1, \theta_2) : \|\theta_1 - \theta_2\| < \delta\}} |\log p(x; \theta_1) - \log p(x; \theta_2)| \to 0$$
as $\delta \to 0$. By the assumption of Lemma 4.3, we know that $\Delta(x, \delta) \le 2d(x)$ for all $\delta$. By the dominated convergence theorem, we know that $E_{\theta_0}[\Delta(X, \delta)] \to 0$ as $\delta \to 0$.

Now, consider open balls of radius $\delta$ about each $\theta \in \Theta$, i.e., $B(\theta, \delta) = \{\tilde\theta : \|\tilde\theta - \theta\| < \delta\}$. The union of these open balls contains $\Theta$; this union is an open cover of $\Theta$. Since $\Theta$ is a compact set, we know that there exists a finite subcover, which we denote by $\{B(\theta_j, \delta),\ j = 1, \ldots, J\}$.
20 Proof Lemma 4.3
By the triangle inequality, we know that
$$|Q(\theta; X^n) - Q_0(\theta)| \le |Q(\theta; X^n) - Q(\theta_j; X^n)| \quad (1)$$
$$+\ |Q(\theta_j; X^n) - Q_0(\theta_j)| \quad (2)$$
$$+\ |Q_0(\theta_j) - Q_0(\theta)| \quad (3)$$
Choose $\theta_j$ so that $\theta \in B(\theta_j, \delta)$. Since $\|\theta - \theta_j\| < \delta$, we know that (1) is equal to
$$\left|\frac{1}{n}\sum_{i=1}^n \{\log p(X_i; \theta) - \log p(X_i; \theta_j)\}\right|$$
and this is less than or equal to
$$\frac{1}{n}\sum_{i=1}^n |\log p(X_i; \theta) - \log p(X_i; \theta_j)| \le \frac{1}{n}\sum_{i=1}^n \Delta(X_i, \delta).$$
21 Proof Lemma 4.3
We also know that (3) is less than or equal to
$$\sup_{\{(\theta_1, \theta_2) : \|\theta_1 - \theta_2\| < \delta\}} |Q_0(\theta_1) - Q_0(\theta_2)|.$$
Since $Q_0(\theta)$ is uniformly continuous, we know that this bound can be made arbitrarily small by choosing $\delta$ to be small. That is, this bound can be made less than $\epsilon^*(\delta)$, where $\epsilon^*(\delta) \to 0$ as $\delta \to 0$. We also know that (2) is less than or equal to
$$\max_{j=1,\ldots,J} |Q(\theta_j; X^n) - Q_0(\theta_j)|.$$
Putting these results together, we have that
$$\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| \le \frac{1}{n}\sum_{i=1}^n \Delta(X_i, \delta) + \max_{j=1,\ldots,J} |Q(\theta_j; X^n) - Q_0(\theta_j)| + \epsilon^*(\delta).$$
22 Proof Lemma 4.3
Choose $\delta$ so that $\epsilon^*(\delta) < \epsilon/3$; call this $\delta_1$. So for any $\delta < \delta_1$, we know that
$$P_{\theta_0}\!\left[\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| > \epsilon\right] \le P_{\theta_0}\!\left[\frac{1}{n}\sum_{i=1}^n \Delta(X_i, \delta) + \max_{j=1,\ldots,J} |Q(\theta_j; X^n) - Q_0(\theta_j)| > 2\epsilon/3\right]$$
$$\le P_{\theta_0}\!\left[\frac{1}{n}\sum_{i=1}^n \Delta(X_i, \delta) > \epsilon/3\right] \quad (4)$$
$$+\ P_{\theta_0}\!\left[\max_{j=1,\ldots,J} |Q(\theta_j; X^n) - Q_0(\theta_j)| > \epsilon/3\right] \quad (5)$$
Now, we can show that we can take $n$ sufficiently large so that (4) and (5) can be made small.
23 Proof Lemma 4.3
Note that (4) is equal to
$$P_{\theta_0}\!\left[\frac{1}{n}\sum_{i=1}^n \{\Delta(X_i, \delta) - E_{\theta_0}[\Delta(X, \delta)]\} + E_{\theta_0}[\Delta(X, \delta)] > \epsilon/3\right].$$
We have already demonstrated that $E_{\theta_0}[\Delta(X, \delta)] \to 0$ as $\delta \to 0$. Choose $\delta$ small enough that $E_{\theta_0}[\Delta(X, \delta)] < \epsilon/6$; call this number $\delta_2$. Take $\delta < \min(\delta_1, \delta_2)$. Then (4) is less than
$$P_{\theta_0}\!\left[\frac{1}{n}\sum_{i=1}^n \{\Delta(X_i, \delta) - E_{\theta_0}[\Delta(X, \delta)]\} > \epsilon/6\right].$$
By the WLLN, we know that there exists $N_1(\epsilon, \eta)$ so that for all $n > N_1(\epsilon, \eta)$, the above term is less than $\eta/2$. So, (4) is less than $\eta/2$.
24 Proof Lemma 4.3
For the $\delta$ considered so far, find the finite subcover $\{B(\theta_j, \delta),\ j = 1, \ldots, J\}$. Now, (5) is equal to
$$P_{\theta_0}\!\left[\bigcup_{j=1}^J \{|Q(\theta_j; X^n) - Q_0(\theta_j)| > \epsilon/3\}\right] \le \sum_{j=1}^J P_{\theta_0}[|Q(\theta_j; X^n) - Q_0(\theta_j)| > \epsilon/3].$$
By the WLLN, we know that for each $\theta_j$ and for any $\epsilon, \eta > 0$ there exists $N_{2j}(\epsilon, \eta)$ so that for all $n > N_{2j}(\epsilon, \eta)$,
$$P_{\theta_0}[|Q(\theta_j; X^n) - Q_0(\theta_j)| > \epsilon/3] \le \eta/2J.$$
Let $N_2(\epsilon, \eta) = \max_{j=1,\ldots,J} N_{2j}$. Then, for all $n > N_2(\epsilon, \eta)$, we know that
$$\sum_{j=1}^J P_{\theta_0}[|Q(\theta_j; X^n) - Q_0(\theta_j)| > \epsilon/3] < \eta/2.$$
This implies that (5) is less than $\eta/2$.
25 Proof Lemma 4.3
Combining the results for (4) and (5), we have demonstrated that there exists an $N(\epsilon, \eta) = \max(N_1(\epsilon, \eta), N_2(\epsilon, \eta))$ so that for all $n > N(\epsilon, \eta)$,
$$P_{\theta_0}\!\left[\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| > \epsilon\right] < \eta.$$
Q.E.D. PHEW!
26 Consistency
Other conditions that might be useful to establish uniform convergence in probability are given in the lemmas below. Lemma 4.4 may be useful when the data are not independent.

Lemma 4.4: If $\Theta$ is compact, $Q_0(\theta)$ is continuous in $\theta \in \Theta$, $Q(\theta; X^n) \xrightarrow{P} Q_0(\theta)$ for all $\theta \in \Theta$, and there is an $\alpha > 0$ and $C(X^n)$ which is bounded in probability such that for all $\tilde\theta, \theta \in \Theta$,
$$|Q(\tilde\theta; X^n) - Q(\theta; X^n)| \le C(X^n)\, \|\tilde\theta - \theta\|^\alpha,$$
then $\sup_{\theta \in \Theta} |Q(\theta; X^n) - Q_0(\theta)| \xrightarrow{P} 0$.

Aside: $C(X^n)$ is bounded in probability if for every $\epsilon > 0$ there exist $N(\epsilon)$ and $\eta(\epsilon) > 0$ such that $P_{\theta_0}[|C(X^n)| > \eta(\epsilon)] < \epsilon$ for all $n > N(\epsilon)$.
27 Θ is Not Compact
Suppose that we are not willing to assume that $\Theta$ is compact. One way around this is to bound the objective function from above, uniformly in parameters that are far away from the truth. For example, suppose that there is a compact set $D$ such that
$$E_{\theta_0}\!\left[\sup_{\theta \in \Theta \cap D^C} \log p(X; \theta)\right] < Q_0(\theta_0) = E_{\theta_0}[\log p(X; \theta_0)].$$
By the law of large numbers, we know that with probability approaching one,
$$\sup_{\theta \in \Theta \cap D^C} Q(\theta; X^n) \le \frac{1}{n}\sum_{i=1}^n \sup_{\theta \in \Theta \cap D^C} \log p(X_i; \theta) < \frac{1}{n}\sum_{i=1}^n \log p(X_i; \theta_0) = Q(\theta_0; X^n).$$
28 Θ is Not Compact
Therefore, with probability approaching one, we know that $\hat\theta(X^n)$ is in the compact set $D$. Then, we can apply the previous theory to show consistency. The following lemma can be used in cases where the objective function is concave.

Lemma 4.5: If there is a function $Q_0(\theta)$ such that
i. $Q_0(\theta)$ is uniquely maximized at $\theta_0$,
ii. $\theta_0$ is an element of the interior of a convex set $\Theta$ (which does not have to be bounded),
iii. $Q(\theta; x^n)$ is concave in $\theta$ for each $x^n$, and
iv. $Q(\theta; X^n) \xrightarrow{P} Q_0(\theta)$ for all $\theta \in \Theta$,
then $\hat\theta(X^n)$ exists with probability approaching one and $\hat\theta(X^n) \xrightarrow{P} \theta_0$.
29 Asymptotic Normality
We assume that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with common density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta \subseteq \mathbb{R}^k\}$. We assume that $\theta_0$ is identified in the sense that if $\theta \neq \theta_0$ and $\theta \in \Theta$, then $p(x; \theta) \neq p(x; \theta_0)$ with respect to the dominating measure $\mu$.

In order to prove asymptotic normality, we will need certain regularity conditions. Some of these were encountered in the proof of consistency, but we will need some additional assumptions.
30 Regularity Conditions
i. $\theta_0$ lies in the interior of $\Theta$, which is assumed to be a compact subset of $\mathbb{R}^k$.
ii. $\log p(x; \theta)$ is continuous at each $\theta \in \Theta$ for all $x \in \mathcal{X}$ (a.e. will suffice).
iii. $|\log p(x; \theta)| \le d(x)$ for all $\theta \in \Theta$ and $E_{\theta_0}[d(X)] < \infty$.
iv. $p(x; \theta)$ is twice continuously differentiable and $p(x; \theta) > 0$ in a neighborhood, $\mathcal{N}$, of $\theta_0$.
v. $\|\partial p(x; \theta)/\partial\theta\| \le e(x)$ for all $\theta \in \mathcal{N}$ and $\int e(x)\, d\mu(x) < \infty$.
31 Regularity Conditions
vi. Defining the score vector
$$\psi(x; \theta) = \left(\frac{\partial \log p(x; \theta)}{\partial\theta_1}, \ldots, \frac{\partial \log p(x; \theta)}{\partial\theta_k}\right)',$$
we assume that $I(\theta_0) = E_{\theta_0}[\psi(X; \theta_0)\, \psi(X; \theta_0)']$ exists and is non-singular.
vii. $\|\partial^2 \log p(x; \theta)/\partial\theta\, \partial\theta'\| \le f(x)$ for all $\theta \in \mathcal{N}$ and $E_{\theta_0}[f(X)] < \infty$.
viii. $\|\partial^2 p(x; \theta)/\partial\theta\, \partial\theta'\| \le g(x)$ for all $\theta \in \mathcal{N}$ and $\int g(x)\, d\mu(x) < \infty$.
32 Asymptotic Normality
Theorem 4.6: If these eight regularity conditions hold, then
$$\sqrt{n}(\hat\theta(X^n) - \theta_0) \xrightarrow{D(\theta_0)} N(0, I^{-1}(\theta_0)).$$
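Theorem 4.6 is easy to check by simulation. The sketch below (mine, not from the slides) assumes the exponential model $p(x;\theta) = \theta e^{-\theta x}$, for which $\hat\theta = 1/\bar{x}$ and $I(\theta) = 1/\theta^2$, so the theorem predicts $\sqrt{n}(\hat\theta - \theta_0) \approx N(0, \theta_0^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 400, 4000
# reps independent samples of size n from the exponential model
x = rng.exponential(scale=1/theta0, size=(reps, n))
theta_hat = 1 / x.mean(axis=1)           # MLE in each replication
z = np.sqrt(n) * (theta_hat - theta0)    # should look like N(0, theta0^2)
```

Across replications, the standard deviation of $z$ should be close to $\theta_0 = 2$ and its mean close to $0$.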
33 Asymptotic Normality Proof
Proof: Note that conditions i.-iii. guarantee that the MLE is consistent. Since $\theta_0$ is assumed to lie in the interior of $\Theta$, we know that with sufficiently large probability the MLE will lie in $\mathcal{N}$ and cannot be on the boundary. This implies that the maximum is also a local maximum, which implies that $\partial Q(\hat\theta(X^n); X^n)/\partial\theta = 0$, or $\frac{1}{n}\sum_{i=1}^n \psi(X_i; \hat\theta(X^n)) = 0$. That is, the MLE is the solution to the score equations.

Condition v. guarantees that we can interchange integration and differentiation. Consider the case where $k = 1$. We know that $1 = \int p(x; \theta)\, d\mu(x)$ for all $\theta \in \mathcal{N}$. This implies that $0 = \frac{d}{d\theta}\int p(x; \theta)\, d\mu(x)$. Let's show that $\frac{d}{d\theta}\int p(x; \theta)\, d\mu(x) = \int \frac{d}{d\theta} p(x; \theta)\, d\mu(x)$. Choose a sequence $\theta_n \in \mathcal{N}$ such that $\theta_n \to \theta$. Then, by the definition of a derivative, we know that
$$\frac{dp(x; \theta)}{d\theta} = \lim_{n \to \infty} \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta} \quad \text{for all } x \in \mathcal{X}.$$
34 Asymptotic Normality Proof
By the mean value theorem, we know that
$$p(x; \theta_n) = p(x; \theta) + \frac{dp(x; \theta_n^*)}{d\theta}(\theta_n - \theta),$$
where $\theta_n^*$ lies between $\theta$ and $\theta_n$, so that $\theta_n^* \in \mathcal{N}$. This implies that
$$\left|\frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta}\right| = \left|\frac{dp(x; \theta_n^*)}{d\theta}\right| \le e(x).$$
Since $e(x)$ is integrable, we can employ the dominated convergence theorem. This says that
$$0 = \frac{d}{d\theta}\int p(x; \theta)\, d\mu(x) = \lim_{n \to \infty}\int \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta}\, d\mu(x) = \int \lim_{n \to \infty}\frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta}\, d\mu(x) = \int \frac{dp(x; \theta)}{d\theta}\, d\mu(x).$$
35 Asymptotic Normality Proof
This can be generalized to partial derivatives, which can then be used to formally show that $E_\theta[\psi(X; \theta)] = 0$ for $\theta \in \mathcal{N}$. We know that $\int p(x; \theta)\, d\mu(x) = 1$. This implies that $\frac{\partial}{\partial\theta_j}\int p(x; \theta)\, d\mu(x) = 0$. By dominated convergence, we can interchange differentiation and integration so that $\int \frac{\partial p(x; \theta)}{\partial\theta_j}\, d\mu(x) = 0$. Then, we know that
$$\int \frac{\partial p(x; \theta)/\partial\theta_j}{p(x; \theta)}\, p(x; \theta)\, d\mu(x) = 0.$$
We can divide by $p(x; \theta)$ since it is greater than zero for all $\theta \in \mathcal{N}$. This implies that $E_\theta[\psi_j(X; \theta)] = 0$.
36 Asymptotic Normality Proof
The random vectors $\psi(X_1; \theta_0), \ldots, \psi(X_n; \theta_0)$ are i.i.d. mean-zero random vectors with covariance matrix $I(\theta_0)$. By the multivariate central limit theorem, we know that
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(X_i; \theta_0) \xrightarrow{D(\theta_0)} N(0, I(\theta_0)).$$
Now we shall study the large-sample behavior of the matrix of second partial derivatives of the log-likelihood. Define
$$J_n(\theta) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log p(X_i; \theta)}{\partial\theta\, \partial\theta'}.$$
This is a $k \times k$ random matrix.
37 Asymptotic Normality Proof
We first want to demonstrate that $E_{\theta_0}[J_n(\theta_0)] = I(\theta_0)$. We have already established that $\int \frac{\partial p(x; \theta)}{\partial\theta_j}\, d\mu(x) = 0$ for all $j = 1, \ldots, k$. This implies that
$$\frac{\partial}{\partial\theta_{j'}}\int \frac{\partial p(x; \theta)}{\partial\theta_j}\, d\mu(x) = 0 \quad \text{for all } j, j' = 1, \ldots, k.$$
Since the norm of the matrix of second partial derivatives of $p(x; \theta)$ is bounded by an integrable function $g(x)$ (see condition viii.), we know that we can interchange integration and differentiation. This implies that
$$\int \frac{\partial^2 p(x; \theta)}{\partial\theta_j\, \partial\theta_{j'}}\, d\mu(x) = 0 \quad \text{for all } j, j' = 1, \ldots, k.$$
We note, however, that
$$\frac{\partial^2 \log p(x; \theta)}{\partial\theta_j\, \partial\theta_{j'}} = \frac{\partial^2 p(x; \theta)/\partial\theta_j\, \partial\theta_{j'}}{p(x; \theta)} - \psi_j(x; \theta)\, \psi_{j'}(x; \theta).$$
38 Asymptotic Normality Proof
This implies that
$$\frac{\partial^2 p(x; \theta)}{\partial\theta_j\, \partial\theta_{j'}} = p(x; \theta)\left[\frac{\partial^2 \log p(x; \theta)}{\partial\theta_j\, \partial\theta_{j'}} + \psi_j(x; \theta)\, \psi_{j'}(x; \theta)\right].$$
So,
$$0 = \int \frac{\partial^2 p(x; \theta_0)}{\partial\theta_j\, \partial\theta_{j'}}\, d\mu(x) = \int p(x; \theta_0)\, \frac{\partial^2 \log p(x; \theta_0)}{\partial\theta_j\, \partial\theta_{j'}}\, d\mu(x) + \int p(x; \theta_0)\, \psi_j(x; \theta_0)\, \psi_{j'}(x; \theta_0)\, d\mu(x).$$
This implies that $I_{jj'}(\theta_0) = E_{\theta_0}\!\left[-\frac{\partial^2 \log p(X; \theta_0)}{\partial\theta_j\, \partial\theta_{j'}}\right]$. In matrix form, this says that $I(\theta_0) = E_{\theta_0}\!\left[-\frac{\partial^2 \log p(X; \theta_0)}{\partial\theta\, \partial\theta'}\right]$. So, $E_{\theta_0}[J_n(\theta_0)] = I(\theta_0)$.
39 Asymptotic Normality Proof
By the weak law of large numbers, we know that
$$J_n(\theta) \xrightarrow{P} E_{\theta_0}\!\left[-\frac{\partial^2 \log p(X; \theta)}{\partial\theta\, \partial\theta'}\right].$$
Define $I_0^*(\theta) = E_{\theta_0}\!\left[-\frac{\partial^2 \log p(X; \theta)}{\partial\theta\, \partial\theta'}\right]$. Note that $I_0^*(\theta_0) = I(\theta_0)$. The WLLN enables us to show that $J_n(\theta) \xrightarrow{P} I_0^*(\theta)$ for all $\theta \in \mathcal{N}$. This is pointwise convergence. However, as we will see shortly, this will not suffice to prove our desired results. We will need something stronger: we will have to show that $J_n(\theta)$ converges uniformly in probability to $I_0^*(\theta)$ for $\theta \in \mathcal{N}$. The proof follows in exactly the same way as the proof of uniform convergence in probability for consistency.
40 Asymptotic Normality Proof
We know that $\frac{\partial^2 \log p(x; \theta)}{\partial\theta\, \partial\theta'}$ is continuous in $\theta \in \mathcal{N}$; this follows by assumption iv. By condition vii., we know that $\|\partial^2 \log p(x; \theta)/\partial\theta\, \partial\theta'\| \le f(x)$ for $\theta \in \mathcal{N}$ and $E_{\theta_0}[f(X)] < \infty$. By the same proof as Lemma 4.3, we know that
a. $I_0^*(\theta)$ is continuous for $\theta \in \mathcal{N}$, and
b. $\sup_{\theta \in \mathcal{N}} \|J_n(\theta) - I_0^*(\theta)\| \xrightarrow{P} 0$.
41 Lemma 4.7
This result is important because it allows us to prove the following lemma.

Lemma 4.7: If $\theta_n^*(X^n)$ is a consistent estimator for $\theta_0$, then $J_n(\theta_n^*(X^n)) \xrightarrow{P} I(\theta_0)$.
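In practice, Lemma 4.7 is what justifies plugging the MLE into the observed information to estimate $I(\theta_0)$. A sketch (mine, assuming a Poisson($\theta$) model, where $\psi(x;\theta) = x/\theta - 1$, $J_n(\theta) = \bar{x}/\theta^2$, and $I(\theta) = 1/\theta$):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n = 3.0, 200_000
x = rng.poisson(theta0, size=n)

theta_hat = x.mean()                 # Poisson MLE is the sample mean
# J_n(theta) = -(1/n) sum_i d^2/dtheta^2 {x_i log(theta) - theta} = xbar / theta^2
J_hat = x.mean() / theta_hat**2      # J_n evaluated at the consistent estimator
I0 = 1 / theta0                      # true Fisher information I(theta0)
```

As the lemma predicts, $J_n(\hat\theta)$ is close to $I(\theta_0) = 1/3$ for large $n$.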
42 Proof of Lemma 4.7
Proof: By the triangle inequality, we know that
$$\|J_n(\theta_n^*(X^n)) - I(\theta_0)\| \le \|J_n(\theta_n^*(X^n)) - I_0^*(\theta_n^*(X^n))\| + \|I_0^*(\theta_n^*(X^n)) - I(\theta_0)\|.$$
So, it suffices to prove that
1. $\|J_n(\theta_n^*(X^n)) - I_0^*(\theta_n^*(X^n))\| \xrightarrow{P} 0$, and
2. $\|I_0^*(\theta_n^*(X^n)) - I(\theta_0)\| \xrightarrow{P} 0$.
Since $I_0^*(\theta)$ is a continuous function of $\theta \in \mathcal{N}$, we know that $I_0^*(\theta_n^*(X^n)) \xrightarrow{P} I(\theta_0)$. This is because a continuous function of a consistent estimator converges in probability to the function of the estimand. So, 2. is true. In order to prove 1., we first note that with arbitrarily large probability and $n$ sufficiently large, $\theta_n^*(X^n) \in \mathcal{N}$. When this happens, we have that
$$\|J_n(\theta_n^*(X^n)) - I_0^*(\theta_n^*(X^n))\| \le \sup_{\theta \in \mathcal{N}} \|J_n(\theta) - I_0^*(\theta)\| \xrightarrow{P} 0.$$
43 Asymptotic Normality Proof
Now, we have all the pieces required to establish asymptotic normality. By the mean value theorem, applied to each element of the score vector, we have that
$$0 = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(X_i; \hat\theta(X^n)) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(X_i; \theta_0) + \{-J_n^*(X^n)\}\, \sqrt{n}(\hat\theta(X^n) - \theta_0).$$
Note that $J_n^*(X^n)$ is a $k \times k$ random matrix where the $j$th row of the matrix is the $j$th row of $J_n$ evaluated at $\theta_{jn}^*(X^n)$, where $\theta_{jn}^*(X^n)$ is an intermediate value between $\hat\theta(X^n)$ and $\theta_0$. The $\theta_{jn}^*(X^n)$ may differ from row to row, but each is consistent for $\theta_0$. Therefore, $J_n^*(X^n) \xrightarrow{P} I(\theta_0)$.
44 Asymptotic Normality Proof
By assumption vi., we know that $I(\theta_0)$ is non-singular. Matrix inversion is a continuous function at non-singular matrices. Since $J_n^*(X^n) \xrightarrow{P} I(\theta_0)$, we know that $\{J_n^*(X^n)\}^{-1} \xrightarrow{P} I(\theta_0)^{-1}$. This also means that, with sufficiently large probability as $n$ gets large, $J_n^*(X^n)$ is invertible. Therefore, we know that
$$\sqrt{n}(\hat\theta(X^n) - \theta_0) = \{J_n^*(X^n)\}^{-1}\, \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(X_i; \theta_0).$$
Employing Slutsky's theorem, we know that
$$\sqrt{n}(\hat\theta(X^n) - \theta_0) \xrightarrow{D} N(0, I(\theta_0)^{-1}).$$
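The practical payoff of the theorem, combined with Lemma 4.7, is the Wald confidence interval $\hat\theta \pm 1.96/\sqrt{n\, J_n(\hat\theta)}$. A Monte Carlo sketch of its coverage (mine, again assuming the exponential model, where $J_n(\hat\theta) = 1/\hat\theta^2$ so the standard error is $\hat\theta/\sqrt{n}$):

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, n, reps = 2.0, 500, 5000
x = rng.exponential(scale=1/theta0, size=(reps, n))

theta_hat = 1 / x.mean(axis=1)            # MLE in each replication
se = theta_hat / np.sqrt(n)               # {n * J_n(theta_hat)}^{-1/2}
lo, hi = theta_hat - 1.96*se, theta_hat + 1.96*se
coverage = ((lo <= theta0) & (theta0 <= hi)).mean()
```

The empirical coverage should be close to the nominal 95% for a sample size of 500.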
More information1 Extremum Estimators
FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective
More informationIntroductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19
Introductory Analysis I Fall 204 Homework #9 Due: Wednesday, November 9 Here is an easy one, to serve as warmup Assume M is a compact metric space and N is a metric space Assume that f n : M N for each
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More informationMath 421, Homework #9 Solutions
Math 41, Homework #9 Solutions (1) (a) A set E R n is said to be path connected if for any pair of points x E and y E there exists a continuous function γ : [0, 1] R n satisfying γ(0) = x, γ(1) = y, and
More information7 Complete metric spaces and function spaces
7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m
More informationSection 10: Role of influence functions in characterizing large sample efficiency
Section 0: Role of influence functions in characterizing large sample efficiency. Recall that large sample efficiency (of the MLE) can refer only to a class of regular estimators. 2. To show this here,
More informationTHE INVERSE FUNCTION THEOREM
THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest
More informationA GENERAL THEOREM ON APPROXIMATE MAXIMUM LIKELIHOOD ESTIMATION. Miljenko Huzak University of Zagreb,Croatia
GLASNIK MATEMATIČKI Vol. 36(56)(2001), 139 153 A GENERAL THEOREM ON APPROXIMATE MAXIMUM LIKELIHOOD ESTIMATION Miljenko Huzak University of Zagreb,Croatia Abstract. In this paper a version of the general
More informationEstimation of Dynamic Regression Models
University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.
More informationInference in non-linear time series
Intro LS MLE Other Erik Lindström Centre for Mathematical Sciences Lund University LU/LTH & DTU Intro LS MLE Other General Properties Popular estimatiors Overview Introduction General Properties Estimators
More informationExam 2 extra practice problems
Exam 2 extra practice problems (1) If (X, d) is connected and f : X R is a continuous function such that f(x) = 1 for all x X, show that f must be constant. Solution: Since f(x) = 1 for every x X, either
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations
Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationProbability and Measure
Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real
More informationMath 5210, Definitions and Theorems on Metric Spaces
Math 5210, Definitions and Theorems on Metric Spaces Let (X, d) be a metric space. We will use the following definitions (see Rudin, chap 2, particularly 2.18) 1. Let p X and r R, r > 0, The ball of radius
More informationA glimpse into convex geometry. A glimpse into convex geometry
A glimpse into convex geometry 5 \ þ ÏŒÆ Two basis reference: 1. Keith Ball, An elementary introduction to modern convex geometry 2. Chuanming Zong, What is known about unit cubes Convex geometry lies
More informationd(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N
Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationδ -method and M-estimation
Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size
More informationSet, functions and Euclidean space. Seungjin Han
Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,
More information9 Sequences of Functions
9 Sequences of Functions 9.1 Pointwise convergence and uniform convergence Let D R d, and let f n : D R be functions (n N). We may think of the functions f 1, f 2, f 3,... as forming a sequence of functions.
More information1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3
Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,
More informationconsists of two disjoint copies of X n, each scaled down by 1,
Homework 4 Solutions, Real Analysis I, Fall, 200. (4) Let be a topological space and M be a σ-algebra on which contains all Borel sets. Let m, µ be two positive measures on M. Assume there is a constant
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationAnalogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006
Analogy Principle Asymptotic Theory Part II James J. Heckman University of Chicago Econ 312 This draft, April 5, 2006 Consider four methods: 1. Maximum Likelihood Estimation (MLE) 2. (Nonlinear) Least
More informationHomework 11. Solutions
Homework 11. Solutions Problem 2.3.2. Let f n : R R be 1/n times the characteristic function of the interval (0, n). Show that f n 0 uniformly and f n µ L = 1. Why isn t it a counterexample to the Lebesgue
More information5.1 Approximation of convex functions
Chapter 5 Convexity Very rough draft 5.1 Approximation of convex functions Convexity::approximation bound Expand Convexity simplifies arguments from Chapter 3. Reasons: local minima are global minima and
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationANALYSIS WORKSHEET II: METRIC SPACES
ANALYSIS WORKSHEET II: METRIC SPACES Definition 1. A metric space (X, d) is a space X of objects (called points), together with a distance function or metric d : X X [0, ), which associates to each pair
More informationu xx + u yy = 0. (5.1)
Chapter 5 Laplace Equation The following equation is called Laplace equation in two independent variables x, y: The non-homogeneous problem u xx + u yy =. (5.1) u xx + u yy = F, (5.) where F is a function
More informationPhenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012
Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM
More informationThe properties of L p -GMM estimators
The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationCommutative Banach algebras 79
8. Commutative Banach algebras In this chapter, we analyze commutative Banach algebras in greater detail. So we always assume that xy = yx for all x, y A here. Definition 8.1. Let A be a (commutative)
More informationPreliminary draft only: please check for final version
ARE211, Fall2012 CALCULUS4: THU, OCT 11, 2012 PRINTED: AUGUST 22, 2012 (LEC# 15) Contents 3. Univariate and Multivariate Differentiation (cont) 1 3.6. Taylor s Theorem (cont) 2 3.7. Applying Taylor theory:
More informationB. Appendix B. Topological vector spaces
B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function
More informationGraduate Econometrics I: Maximum Likelihood I
Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More information1. Let A R be a nonempty set that is bounded from above, and let a be the least upper bound of A. Show that there exists a sequence {a n } n N
Applied Analysis prelim July 15, 216, with solutions Solve 4 of the problems 1-5 and 2 of the problems 6-8. We will only grade the first 4 problems attempted from1-5 and the first 2 attempted from problems
More informationMaximum Likelihood Estimation
Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function
More informationClassical Estimation Topics
Classical Estimation Topics Namrata Vaswani, Iowa State University February 25, 2014 This note fills in the gaps in the notes already provided (l0.pdf, l1.pdf, l2.pdf, l3.pdf, LeastSquares.pdf). 1 Min
More informationImmerse Metric Space Homework
Immerse Metric Space Homework (Exercises -2). In R n, define d(x, y) = x y +... + x n y n. Show that d is a metric that induces the usual topology. Sketch the basis elements when n = 2. Solution: Steps
More informationM311 Functions of Several Variables. CHAPTER 1. Continuity CHAPTER 2. The Bolzano Weierstrass Theorem and Compact Sets CHAPTER 3.
M311 Functions of Several Variables 2006 CHAPTER 1. Continuity CHAPTER 2. The Bolzano Weierstrass Theorem and Compact Sets CHAPTER 3. Differentiability 1 2 CHAPTER 1. Continuity If (a, b) R 2 then we write
More information1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),
Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer
More informationGeometry of log-concave Ensembles of random matrices
Geometry of log-concave Ensembles of random matrices Nicole Tomczak-Jaegermann Joint work with Radosław Adamczak, Rafał Latała, Alexander Litvak, Alain Pajor Cortona, June 2011 Nicole Tomczak-Jaegermann
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More information1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).
Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,
More informationMath 117: Topology of the Real Numbers
Math 117: Topology of the Real Numbers John Douglas Moore November 10, 2008 The goal of these notes is to highlight the most important topics presented in Chapter 3 of the text [1] and to provide a few
More informationECE 275B Homework # 1 Solutions Winter 2018
ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,
More informationContinuity. Matt Rosenzweig
Continuity Matt Rosenzweig Contents 1 Continuity 1 1.1 Rudin Chapter 4 Exercises........................................ 1 1.1.1 Exercise 1............................................. 1 1.1.2 Exercise
More informationMeasure and Integration: Solutions of CW2
Measure and Integration: s of CW2 Fall 206 [G. Holzegel] December 9, 206 Problem of Sheet 5 a) Left (f n ) and (g n ) be sequences of integrable functions with f n (x) f (x) and g n (x) g (x) for almost
More informationMath 61CM - Solutions to homework 6
Math 61CM - Solutions to homework 6 Cédric De Groote November 5 th, 2018 Problem 1: (i) Give an example of a metric space X such that not all Cauchy sequences in X are convergent. (ii) Let X be a metric
More informationProblem set 1, Real Analysis I, Spring, 2015.
Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n
More informationReal Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis
Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More information