Section 9: Generalized method of moments

In this section, we revisit unbiased estimating functions to study a more general framework for estimating parameters. Let $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta\}$. We assume that $\theta = (\gamma, \lambda)$, $\Theta = \Gamma \times \Lambda$, where $\Gamma \subseteq \mathbb{R}^k$ and $\Lambda$ is some appropriately defined space. We are interested in estimating $\gamma$. Suppose that there exists a $p$-dimensional ($p \geq k$) vector of

estimating functions
$$g(x; \gamma) = \begin{pmatrix} g_1(x; \gamma) \\ g_2(x; \gamma) \\ \vdots \\ g_p(x; \gamma) \end{pmatrix}$$
such that $E_\theta[g(X; \gamma)] = 0$ for all $\theta \in \Theta$.

Economists usually consider situations in which $p > k$; we usually consider $p = k$. A generalized method of moments (GMM) estimator is one that minimizes a squared Euclidean distance of sample moments from their population counterparts. A GMM estimator $\hat\gamma(X_n)$ is the value of $\gamma$ which maximizes
$$Q(\gamma; X_n) = -\left[\frac{1}{n}\sum_{i=1}^n g(X_i; \gamma)\right]' \hat W_n \left[\frac{1}{n}\sum_{i=1}^n g(X_i; \gamma)\right] \qquad (1)$$
where $\hat W_n \stackrel{P}{\to} W$, a non-random, positive semi-definite matrix.
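To make the definition concrete, here is a minimal numerical sketch of maximizing (1), written as minimizing $-Q$; the moment function g, the data X, and the weight matrix W_hat are hypothetical placeholders, not objects defined in these notes.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(gamma, g, X, W_hat):
    # qbar = (1/n) sum_i g(X_i; gamma), the vector of sample moments
    qbar = np.mean([g(x, gamma) for x in X], axis=0)
    # qbar' W_hat qbar = -Q(gamma; X_n) >= 0, so we minimize it
    return qbar @ W_hat @ qbar

def gmm_estimate(g, X, W_hat, gamma_init):
    # Maximizing Q is the same as minimizing the quadratic form above.
    return minimize(gmm_objective, gamma_init, args=(g, X, W_hat)).x
```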

The maximum likelihood estimator is a specific example of a GMM estimator. Under regularity conditions, we know that the MLE, $\hat\gamma(X_n)$, satisfies
$$\frac{1}{n}\sum_{i=1}^n \psi(X_i; \hat\gamma(X_n)) = 0$$
So, if we take $g(x; \gamma) = \psi(x; \gamma)$ (here $p = k$) and $\hat W_n = I$, then
$$Q(\gamma; X_n) = -\left[\frac{1}{n}\sum_{i=1}^n \psi(X_i; \gamma)\right]' \left[\frac{1}{n}\sum_{i=1}^n \psi(X_i; \gamma)\right]$$
has maximum value zero, which is attained at the MLE.

In cases where $p = k$, the GMM estimator can usually be found by solving the $k$-equations-in-$k$-unknowns problem
$$\frac{1}{n}\sum_{i=1}^n g(X_i; \gamma) = 0$$
That is, the quantity $Q(\gamma; X_n)$ (which is less than or equal to zero) can be made identically equal to zero. Such estimators are referred to as M-estimators.
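As a minimal sketch of this $p = k$ case (with the same hypothetical g and X as above), we can hand the sample moment equations directly to a generic root-finder instead of optimizing the quadratic form:

```python
import numpy as np
from scipy.optimize import root

def m_estimate(g, X, gamma_init):
    # Solve the k equations (1/n) sum_i g(X_i; gamma) = 0 in the k unknowns gamma.
    moment_eq = lambda gamma: np.mean([g(x, gamma) for x in X], axis=0)
    return root(moment_eq, gamma_init).x
```

(For non-smooth moment functions, such as the indicator in Example 9.1 below, a generic root-finder may struggle; there the solution is available directly.)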

Example 9.1. Suppose $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with c.d.f. $F_0 \in \mathcal{P} = \{F : F \text{ is continuously differentiable}\}$. Let $\gamma = F^{-1}(0.5)$ and $\gamma_0 = F_0^{-1}(0.5)$. Define the estimating function
$$g(x; \gamma) = I(x \leq \gamma) - 0.5$$
Note that $E_F[g(X; F^{-1}(0.5))] = 0$ for all $F \in \mathcal{P}$. The corresponding M-estimator is the sample median.
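A quick numerical check of this example, using simulated standard-normal data purely for illustration: the sample median drives the sample moment to essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1001)          # odd n, so the median is a data point
gamma_hat = np.median(X)
# (1/n) sum_i {I(X_i <= gamma_hat) - 0.5}; equals 1/(2n) here, essentially zero
moment = np.mean((X <= gamma_hat) - 0.5)
print(gamma_hat, moment)
```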

Example 9.2. Assume that the data are i.i.d. random vectors $(Y_1, X_1), \ldots, (Y_n, X_n)$ with $E_{\theta_0}[Y_i \mid X_i] = \mu(X_i; \gamma_0)$. Let's assume that $Y$ is a $q$-dimensional vector and $X$ is $r$-dimensional. More formally, we are assuming that $(Y_i, X_i) \sim p(y, x; \theta_0) \in \mathcal{P} = \{p(y, x; \theta) : \int y\, p(y \mid x; \theta)\, dy = \mu(x; \gamma)\}$.

Example 9.2a: Linear Regression (take $q = r = 1$ and $k = 2$)
$$E_\theta[Y \mid X] = \gamma_0 + \gamma_1 X$$

Example 9.2b: Multivariate Linear Regression
$$E_\theta[Y \mid X] = \begin{pmatrix} \gamma_{10} + \gamma_{11} X_1 + \cdots + \gamma_{1r} X_r \\ \vdots \\ \gamma_{q0} + \gamma_{q1} X_1 + \cdots + \gamma_{qr} X_r \end{pmatrix}$$
Example 9.2c: Logistic Regression ($Y$ is binary)
$$E_\theta[Y \mid X] = \frac{\exp(\gamma_0 + \gamma_1 X_1 + \cdots + \gamma_r X_r)}{1 + \exp(\gamma_0 + \gamma_1 X_1 + \cdots + \gamma_r X_r)}$$

Consider the following moment function:
$$g(Y_i, X_i; \gamma) = A(X_i; \gamma)(Y_i - \mu(X_i; \gamma))$$
where $A(X_i; \gamma)$ is a $k \times q$ matrix which is a function of $\gamma$ and $X_i$. Now, note that
$$E_\theta[g(Y_i, X_i; \gamma)] = E_\theta[E_\theta[g(Y_i, X_i; \gamma) \mid X_i]] = E_\theta[A(X_i; \gamma)\, E_\theta[Y_i - \mu(X_i; \gamma) \mid X_i]] = 0$$
for all $\theta$. The solution to the following equation:
$$\frac{1}{n}\sum_{i=1}^n A(X_i; \gamma)(Y_i - \mu(X_i; \gamma)) = 0$$
is called the GEE (generalized estimating equations) estimator.
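As an illustration, here is a minimal sketch of a GEE fit for the logistic mean of Example 9.2c with $q = 1$, $r = 1$, using the simple (hypothetical) choice $A(X; \gamma) = (1, X)'$; with this choice the GEE coincides with the logistic regression score equations.

```python
import numpy as np
from scipy.optimize import root

def gee_logistic(Y, X, gamma_init):
    # Solve (1/n) sum_i A(X_i)(Y_i - mu(X_i; gamma)) = 0 with A(X) = (1, X)'.
    def moment_eq(gamma):
        mu = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * X)))  # logistic mean
        resid = Y - mu
        return np.array([np.mean(resid), np.mean(X * resid)])
    return root(moment_eq, gamma_init).x
```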

Example 9.3. Suppose $X_n = (X_1, \ldots, X_n)$, where the $X_i = (Y_i, W_i, Z_i)$'s are i.i.d. We assume $Y$ is scalar, $W$ is $k$-dimensional, and $Z$ is $p$-dimensional ($p \geq k$). Further, assume that $Y = W'\gamma_0 + \epsilon$, where $Y = (Y_1, \ldots, Y_n)'$, $W = (W_1, \ldots, W_n)$, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$, the $\epsilon_i$'s are i.i.d. with variance $\sigma_0^2$, and $E_{\theta_0}[\epsilon_i W_i] = 0$. Consider the following $k$-dimensional estimating function:
$$g(X; \gamma) = W(Y - W'\gamma)$$
Note that $g(X; \gamma_0)$ has mean zero because $E_\theta[\epsilon W] = 0$. The corresponding M-estimator is the least squares estimator.
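Here the sample moment equation is linear in $\gamma$, so the M-estimator has the familiar closed form; a one-line sketch:

```python
import numpy as np

def ols_via_moments(Y, W):
    # Solve (1/n) sum_i W_i (Y_i - W_i' gamma) = 0:
    # gamma_hat = (sum_i W_i W_i')^{-1} sum_i W_i Y_i, with W an n x k matrix.
    return np.linalg.solve(W.T @ W, W.T @ Y)
```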

Suppose that $E_\theta[\epsilon_i W_i] \neq 0$, but $E_\theta[\epsilon_i Z_i] = 0$. Then, consider the following $p$-dimensional estimating function:
$$g(X; \gamma) = Z(Y - W'\gamma)$$
Note that $g(X; \gamma_0)$ has mean zero because $E_\theta[\epsilon Z] = 0$. $Z$ is called an instrumental variable. Economists try to find these variables because they often deal with situations where $W$ and $\epsilon$ are correlated. When $p > k$, we estimate $\gamma$ by maximizing the quadratic form given by (1).
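In the just-identified case $p = k$, the instrumental-variables moment equation is also linear in $\gamma$ and has a closed-form solution; a sketch under that assumption:

```python
import numpy as np

def iv_just_identified(Y, W, Z):
    # Solve (1/n) sum_i Z_i (Y_i - W_i' gamma) = 0:
    # gamma_hat = (Z'W)^{-1} Z'Y, with Z (n x k instruments) and W (n x k regressors).
    return np.linalg.solve(Z.T @ W, Z.T @ Y)
```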

As with MLEs, we shall study the large sample properties of GMM estimators. This will include a study of consistency, asymptotic normality, and efficiency. We first note that $Q(\gamma; X_n)$, given by (1), is minus a quadratic form, which is a random function of $\gamma$. It converges in probability to a deterministic function of $\gamma$:
$$Q_0(\gamma) = -q_0(\gamma)' W q_0(\gamma) \leq 0 \qquad (2)$$
where $q_0(\gamma) = E_{\theta_0}[g(X; \gamma)]$. By the WLLN, pointwise convergence for each $\gamma$ is straightforward.

We start with the issue of consistency. The approach we shall take is to try to mimic the proof we used to establish the consistency of the MLE. That is, we will consider $Q(\gamma; X_n)$ to be an objective function which we would like to maximize. Our intuition says that the maximizer of $Q(\gamma; X_n)$ should converge to the maximizer of $Q_0(\gamma)$. Just as we did previously, we would like to establish conditions under which $Q_0(\gamma)$ has a unique maximum at $\gamma_0$. We know that $q_0(\gamma_0) = 0$. Since $Q_0(\gamma)$ is minus a quadratic form, we know that its maximum is achieved at $\gamma_0$. We must establish conditions under which this maximum is unique. In other words, we want to find conditions so that $Q_0(\gamma) \neq 0$ when $\gamma \neq \gamma_0$.

Lemma 9.1: If $W$ is positive semi-definite and $W q_0(\gamma) \neq 0$ for $\gamma \neq \gamma_0$, then $Q_0(\gamma) = -q_0(\gamma)' W q_0(\gamma)$ has a unique maximum at $\gamma_0$.

Proof: Since $W$ is positive semi-definite, we know that there exists a possibly singular matrix $R$ such that $R'R = W$ (see page 257 of Strang, 1980). If $\gamma \neq \gamma_0$, then $W q_0(\gamma) = R'R q_0(\gamma) \neq 0$. This implies that $R q_0(\gamma) \neq 0$. Hence,
$$Q_0(\gamma) = -[R q_0(\gamma)]'[R q_0(\gamma)] < Q_0(\gamma_0) = 0$$

In some cases it may be difficult to show that $Q_0(\gamma)$ has a unique maximum at $\gamma_0$. This is especially true when we are doing M-estimation ($p = k$): there may be many solutions to the $k$ equations in $k$ unknowns, especially when the equations are complicated and non-linear. If $W$ is positive definite, then a unique maximum is guaranteed if $q_0(\gamma) \neq 0$ whenever $\gamma \neq \gamma_0$. By increasing the dimension $p$ of the moment function, we might make this more likely to happen. This must be studied on a case-by-case basis.

Back to Example 9.1:
$$q_0(\gamma) = E_{F_0}[I(X \leq \gamma)] - 0.5 = F_0(\gamma) - 0.5$$
Since $F_0$ is continuous and strictly increasing, $q_0(\gamma) = 0$ if and only if $\gamma = \gamma_0$.

Back to Example 9.3: Suppose $E_{\theta_0}[\epsilon W] = 0$ and $E_{\theta_0}[WW']$ is finite and positive definite. When $g(X; \gamma) = W(Y - W'\gamma)$, we know that
$$q_0(\gamma) = E_{\theta_0}[W(Y - W'\gamma)] = E_{\theta_0}[W(W'\gamma_0 - W'\gamma + \epsilon)] = E_{\theta_0}[WW'](\gamma_0 - \gamma)$$
This equals zero if and only if $\gamma = \gamma_0$.

Suppose $E_{\theta_0}[\epsilon W] \neq 0$, $E_{\theta_0}[\epsilon Z] = 0$, and $E_{\theta_0}[ZW']$ is finite and full rank. When $g(X; \gamma) = Z(Y - W'\gamma)$, we know that
$$q_0(\gamma) = E_{\theta_0}[Z(Y - W'\gamma)] = E_{\theta_0}[Z(W'\gamma_0 - W'\gamma + \epsilon)] = E_{\theta_0}[ZW'](\gamma_0 - \gamma)$$
This equals zero if and only if $\gamma = \gamma_0$.

Theorem 9.2: Suppose that $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta\}$. Assume that $\theta = (\gamma, \lambda)$, $\Theta = \Gamma \times \Lambda$, where $\Gamma \subseteq \mathbb{R}^k$ and $\Lambda$ is some appropriately defined space. In addition, suppose that
i. $\hat W_n \stackrel{P}{\to} W$;
ii. $W$ is positive definite and $q_0(\gamma) = E_{\theta_0}[g(X; \gamma)] = 0$ only if $\gamma = \gamma_0$ (unique maximum at $\gamma = \gamma_0$);
iii. $\Gamma$ is compact;
iv. $g(x; \gamma)$ is continuous in $\gamma \in \Gamma$, for all $x \in \mathcal{X}$;
v. $\|g(x; \gamma)\| \leq d(x)$ for all $\gamma \in \Gamma$ and $E_{\theta_0}[d(X)] < \infty$.
Then $\hat\gamma(X_n)$, the estimator which maximizes $Q(\gamma; X_n)$ given by (1), converges in probability to $\gamma_0$.

Proof: The proof is almost identical to that used to prove the consistency of the MLE. That is, if we show that the objective

function $Q(\gamma; X_n)$ converges uniformly in probability to $Q_0(\gamma)$ given by (2), then, combined with the fact that $Q_0(\gamma)$ is uniquely maximized at $\gamma_0$, we can use Theorem 8.2 to prove consistency. To use Theorem 8.2, we must show that
a. $Q(\gamma; X_n)$ is continuous in $\gamma$;
b. $Q_0(\gamma)$ is continuous in $\gamma$;
c. $\sup_{\gamma \in \Gamma} |Q(\gamma; X_n) - Q_0(\gamma)| \stackrel{P}{\to} 0$.
Condition a. is implied by the continuity of $g(x; \gamma)$ in $\gamma$ (Condition iv. above). By the continuity of $g(x; \gamma)$ in $\gamma$ and the fact that $g(x; \gamma)$ is dominated uniformly by an integrable function (Conditions iv. and v. above), we can use Lemma 8.3 to show that
d. $q_0(\gamma)$ is continuous in $\gamma$;
e. $\sup_{\gamma \in \Gamma} \|\hat q(\gamma; X_n) - q_0(\gamma)\| \stackrel{P}{\to} 0$,

where $\hat q(\gamma; X_n) = \frac{1}{n}\sum_{i=1}^n g(X_i; \gamma)$. Condition d. implies Condition b. Therefore, we are left to show that Condition c. holds. Adding and subtracting terms, we know that
$$Q(\gamma; X_n) - Q_0(\gamma) = -\hat q(\gamma; X_n)' \hat W_n \hat q(\gamma; X_n) + q_0(\gamma)' W q_0(\gamma)$$
$$= -[\hat q(\gamma; X_n) - q_0(\gamma)]' \hat W_n [\hat q(\gamma; X_n) - q_0(\gamma)] - q_0(\gamma)'[\hat W_n + \hat W_n'][\hat q(\gamma; X_n) - q_0(\gamma)] - q_0(\gamma)'[\hat W_n - W] q_0(\gamma)$$

By the triangle inequality, we know that
$$|Q(\gamma; X_n) - Q_0(\gamma)| \leq |[\hat q(\gamma; X_n) - q_0(\gamma)]' \hat W_n [\hat q(\gamma; X_n) - q_0(\gamma)]| + |q_0(\gamma)'[\hat W_n + \hat W_n'][\hat q(\gamma; X_n) - q_0(\gamma)]| + |q_0(\gamma)'[\hat W_n - W] q_0(\gamma)|$$
$$\leq \|\hat q(\gamma; X_n) - q_0(\gamma)\|^2 \|\hat W_n\| + 2\|q_0(\gamma)\|\, \|\hat W_n\|\, \|\hat q(\gamma; X_n) - q_0(\gamma)\| + \|q_0(\gamma)\|^2 \|\hat W_n - W\|$$
So,
$$\sup_{\gamma \in \Gamma} |Q(\gamma; X_n) - Q_0(\gamma)| \leq \sup_{\gamma \in \Gamma} \|\hat q(\gamma; X_n) - q_0(\gamma)\|^2 \|\hat W_n\| \qquad (3)$$
$$\qquad + 2 \sup_{\gamma \in \Gamma} \|q_0(\gamma)\|\, \|\hat W_n\| \sup_{\gamma \in \Gamma} \|\hat q(\gamma; X_n) - q_0(\gamma)\| \qquad (4)$$
$$\qquad + \sup_{\gamma \in \Gamma} \|q_0(\gamma)\|^2 \|\hat W_n - W\| \qquad (5)$$

Now we can show that each of the terms on the RHS converges in probability to zero, which will complete the proof. First, we know that $\hat W_n \stackrel{P}{\to} W$, which we assume is fixed and bounded. We also know that $\sup_{\gamma \in \Gamma} \|\hat q(\gamma; X_n) - q_0(\gamma)\| \stackrel{P}{\to} 0$. Since $q_0(\gamma)$ is a continuous function on a compact set, we know that $\sup_{\gamma \in \Gamma} \|q_0(\gamma)\|$ is bounded. These facts imply that (3), (4) and (5) converge in probability to zero. Thus, we have uniform convergence in probability of $Q(\gamma; X_n)$ to $Q_0(\gamma)$.

Asymptotic Normality of the GMM Estimator

After establishing the consistency of the GMM estimator, we are now in a position to prove asymptotic normality. The proof will follow that given for the MLE. Before we embark on this proof, we need some preliminaries. First, we will take $\gamma_0$ to be in the interior of a compact set $\Gamma$. Since $\hat\gamma(X_n)$ is consistent for $\gamma_0$, the maximum of $Q(\gamma; X_n)$ will, with probability approaching one, be a local maximum. That is,
$$\frac{\partial Q(\hat\gamma(X_n); X_n)}{\partial \gamma} = 0$$

Since $Q(\gamma; X_n) = -\hat q(\gamma; X_n)' \hat W_n \hat q(\gamma; X_n)$, we know that
$$\frac{\partial Q(\gamma; X_n)}{\partial \gamma} = -2\, \frac{\partial \hat q(\gamma; X_n)'}{\partial \gamma}\, \hat W_n\, \hat q(\gamma; X_n) = -2\, \hat D(\gamma; X_n)'\, \hat W_n\, \hat q(\gamma; X_n)$$
where
$$\hat D(\gamma; X_n) = \frac{\partial \hat q(\gamma; X_n)}{\partial \gamma'} = \frac{1}{n}\sum_{i=1}^n \frac{\partial g(X_i; \gamma)}{\partial \gamma'}$$
So,
$$-2\, \hat D(\hat\gamma(X_n); X_n)'\, \hat W_n\, \hat q(\hat\gamma(X_n); X_n) = 0$$

Theorem 9.3: Suppose that $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta\}$. Assume that $\theta = (\gamma, \lambda)$, $\Theta = \Gamma \times \Lambda$, where $\Gamma \subseteq \mathbb{R}^k$ and $\Lambda$ is some appropriately defined space. In addition, suppose that the following 8 regularity conditions hold:
i. $\hat W_n \stackrel{P}{\to} W$;
ii. $\gamma_0$ is in the interior of $\Gamma$, which is assumed to be compact;
iii. $W$ is positive definite and $q_0(\gamma) = E_{\theta_0}[g(X; \gamma)] = 0$ only if $\gamma = \gamma_0$;
iv. $g(x; \gamma)$ is continuous in $\gamma \in \Gamma$, for all $x \in \mathcal{X}$;
v. $\|g(x; \gamma)\| \leq d(x)$ for all $\gamma \in \Gamma$ and $E_{\theta_0}[d(X)] < \infty$;

vi. $g(x; \gamma)$ is continuously differentiable in a neighborhood $N$ of $\gamma_0$;
vii. $\|\partial g(x; \gamma)/\partial \gamma'\| \leq f(x)$ for all $\gamma \in N$ and $E_{\theta_0}[f(X)] < \infty$;
viii. $D'WD$ is non-singular, where $D = E_{\theta_0}[\partial g(X; \gamma_0)/\partial \gamma']$.
Then
$$\sqrt{n}(\hat\gamma(X_n) - \gamma_0) \stackrel{D}{\to} N(0, (D'WD)^{-1} D'W \Omega W D (D'WD)^{-1})$$
where $\Omega = E_{\theta_0}[g(X; \gamma_0) g(X; \gamma_0)']$.

Proof: We know that, with probability approaching one, the GMM estimator $\hat\gamma(X_n)$ satisfies the equation
$$\hat D(\hat\gamma(X_n); X_n)' \hat W_n \hat q(\hat\gamma(X_n); X_n) = 0 \qquad (6)$$
Expanding $\hat q(\hat\gamma(X_n); X_n)$ about $\gamma_0$ yields
$$\hat q(\hat\gamma(X_n); X_n) = \hat q(\gamma_0; X_n) + D_n^*(X_n)(\hat\gamma(X_n) - \gamma_0) \qquad (7)$$
where $D_n^*(X_n)$ is a $p \times k$ random matrix whose $j$th row is the $j$th row of $\hat D(\gamma; X_n)$ evaluated at some intermediate value $\gamma_{jn}^*$ between $\hat\gamma(X_n)$ and $\gamma_0$. $\gamma_{jn}^*$ may be different from row to row, but it is still consistent for $\gamma_0$. By Conditions vi. and vii., we can invoke Lemma 8.3 to show
$$\sup_{\gamma \in N} \|\hat D(\gamma; X_n) - D_0(\gamma)\| \stackrel{P}{\to} 0$$

where $D_0(\gamma) = E_{\theta_0}[\partial g(X; \gamma)/\partial \gamma']$ and $D_0(\gamma_0) = D$. Since $\gamma_{jn}^*$ will be in $N$ with probability approaching one, we know that $D_n^*(X_n) \stackrel{P}{\to} D$. Plugging (7) into (6), we have that
$$\hat D(\hat\gamma(X_n); X_n)' \hat W_n \{\hat q(\gamma_0; X_n) + D_n^*(X_n)(\hat\gamma(X_n) - \gamma_0)\} = 0$$
or
$$\hat D(\hat\gamma(X_n); X_n)' \hat W_n \hat q(\gamma_0; X_n) + \hat D(\hat\gamma(X_n); X_n)' \hat W_n D_n^*(X_n)(\hat\gamma(X_n) - \gamma_0) = 0$$
This implies that
$$\sqrt{n}(\hat\gamma(X_n) - \gamma_0) = -\{\hat D(\hat\gamma(X_n); X_n)' \hat W_n D_n^*(X_n)\}^{-1} \hat D(\hat\gamma(X_n); X_n)' \hat W_n \sqrt{n}\, \hat q(\gamma_0; X_n)$$

Note that
$$\sqrt{n}\, \hat q(\gamma_0; X_n) \stackrel{D}{\to} N(0, \Omega)$$
$$\hat D(\hat\gamma(X_n); X_n)' \hat W_n D_n^*(X_n) \stackrel{P}{\to} D'WD$$
$$\{\hat D(\hat\gamma(X_n); X_n)' \hat W_n D_n^*(X_n)\}^{-1} \stackrel{P}{\to} (D'WD)^{-1}$$
$$\hat D(\hat\gamma(X_n); X_n)' \hat W_n \stackrel{P}{\to} D'W$$
Using Slutsky's theorem, these results imply that
$$\sqrt{n}(\hat\gamma(X_n) - \gamma_0) \stackrel{D}{\to} N(0, (D'WD)^{-1} D'W \Omega W D (D'WD)^{-1})$$

Estimating the Asymptotic Variance

Use $\hat W_n$ for $W$, $\hat D(\hat\gamma(X_n); X_n)$ for $D$, and $\frac{1}{n}\sum_{i=1}^n g(X_i; \hat\gamma(X_n)) g(X_i; \hat\gamma(X_n))'$ for $\Omega$.

Special Case: $p = k$. In the special case where the dimension of $g(x; \gamma)$ is equal to $k$ and $D$ is a full-rank square matrix, the asymptotic variance simplifies to $D^{-1} \Omega D'^{-1}$, or equivalently $(D' \Omega^{-1} D)^{-1}$.
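A sketch of these plug-in substitutions, assuming hypothetical callables g (returning a length-$p$ moment vector) and g_jac (returning its $p \times k$ Jacobian in $\gamma$):

```python
import numpy as np

def gmm_avar(g, g_jac, X, gamma_hat, W_hat):
    G = np.array([g(x, gamma_hat) for x in X])                  # n x p
    Omega_hat = G.T @ G / len(X)                                # (1/n) sum g g'
    D_hat = np.mean([g_jac(x, gamma_hat) for x in X], axis=0)   # p x k
    bread = np.linalg.inv(D_hat.T @ W_hat @ D_hat)
    meat = D_hat.T @ W_hat @ Omega_hat @ W_hat @ D_hat
    return bread @ meat @ bread  # divide by n to approximate Var(gamma_hat)
```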

Back to Example 9.2:
$$g(Y, X; \gamma) = A(X; \gamma)(Y - \mu(X; \gamma))$$
Here,
$$D = E_{\theta_0}\left[-A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'} + \frac{\partial A(X; \gamma_0)}{\partial \gamma'}(Y - \mu(X; \gamma_0))\right] = -E_{\theta_0}\left[A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'}\right]$$
$$\Omega = \mathrm{Var}_{\theta_0}[E_{\theta_0}[g(Y, X; \gamma_0) \mid X]] + E_{\theta_0}[\mathrm{Var}_{\theta_0}[g(Y, X; \gamma_0) \mid X]] = E_{\theta_0}[A(X; \gamma_0)\, \mathrm{Var}_{\theta_0}[Y \mid X]\, A(X; \gamma_0)']$$
So, the asymptotic variance of the GEE estimator is given by
$$E_{\theta_0}\left[A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'}\right]^{-1} E_{\theta_0}[A(X; \gamma_0)\, \mathrm{Var}_{\theta_0}[Y \mid X]\, A(X; \gamma_0)'] \left\{E_{\theta_0}\left[A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'}\right]'\right\}^{-1}$$

An estimator for the asymptotic variance can be obtained by replacing the expectations by their empirical counterparts. That is, we replace $E_{\theta_0}[A(X; \gamma_0)\, \partial \mu(X; \gamma_0)/\partial \gamma']$ by
$$\frac{1}{n}\sum_{i=1}^n A(X_i; \hat\gamma(X_n)) \frac{\partial \mu(X_i; \hat\gamma(X_n))}{\partial \gamma'}$$
and $E_{\theta_0}[A(X; \gamma_0)\, \mathrm{Var}_{\theta_0}[Y \mid X]\, A(X; \gamma_0)']$ by
$$\frac{1}{n}\sum_{i=1}^n A(X_i; \hat\gamma(X_n))(Y_i - \mu(X_i; \hat\gamma(X_n)))(Y_i - \mu(X_i; \hat\gamma(X_n)))' A(X_i; \hat\gamma(X_n))'$$
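A sketch of this empirical sandwich for the GEE case, with hypothetical callables A, mu, and mu_jac standing in for the working weight matrix, the mean function, and its $q \times k$ Jacobian:

```python
import numpy as np

def gee_avar(A, mu, mu_jac, Y, X, gamma_hat):
    B = np.mean([A(x, gamma_hat) @ mu_jac(x, gamma_hat) for x in X], axis=0)
    resid = [y - mu(x, gamma_hat) for y, x in zip(Y, X)]
    V = np.mean([A(x, gamma_hat) @ np.outer(r, r) @ A(x, gamma_hat).T
                 for x, r in zip(X, resid)], axis=0)
    Binv = np.linalg.inv(B)
    return Binv @ V @ Binv.T  # divide by n to approximate Var(gamma_hat)
```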

Why is this latter quantity a consistent estimator? We know that
$$\frac{1}{n}\sum_{i=1}^n A(X_i; \gamma_0)(Y_i - \mu(X_i; \gamma_0))(Y_i - \mu(X_i; \gamma_0))' A(X_i; \gamma_0)' \qquad (8)$$
converges to $E_{\theta_0}[A(X; \gamma_0)\, \mathrm{Var}_{\theta_0}[Y \mid X]\, A(X; \gamma_0)']$ by the WLLN. Under additional smoothness conditions, together with uniform bounding by an integrable function (to establish uniform convergence in probability), we can plug a consistent estimator of $\gamma_0$ into (8) without altering the resulting probability limit.

Efficiency and GMM Estimators

Let us first consider the problem where $p > k$. For a given moment function, we want to find the optimal choice of $W$. That is, what $p \times p$ matrix $W$ will minimize the asymptotic variance of $\sqrt{n}(\hat\gamma(X_n) - \gamma_0)$? Note that for $p = k$, the choice of $W$ is irrelevant. This is clear because the asymptotic variance does not involve $W$.

Theorem 9.4: The optimal choice of $W$ is to take $W = \Omega^{-1}$, where $\Omega$ is the covariance matrix of $g$.

Proof: Let $Z$ be a random vector with mean zero and covariance matrix $\Omega$. Note that $(D'WD)^{-1} D'W Z$ has covariance matrix
$$(D'WD)^{-1} D'W \Omega W D (D'WD)^{-1}$$
which corresponds to the asymptotic variance of the GMM estimator. Also, note that
$$(D'WD)^{-1} D'W Z = (D'\Omega^{-1}D)^{-1} D'\Omega^{-1} Z + \{(D'WD)^{-1} D'W - (D'\Omega^{-1}D)^{-1} D'\Omega^{-1}\} Z$$
Let $A_1 = (D'WD)^{-1} D'W$ and $A_0 = (D'\Omega^{-1}D)^{-1} D'\Omega^{-1}$. Then, we know that
$$A_1 Z = A_0 Z + (A_1 - A_0) Z$$
This implies that
$$\mathrm{Var}[A_1 Z] = \mathrm{Var}[A_0 Z] + \mathrm{Var}[(A_1 - A_0) Z] + A_0 \mathrm{Var}[Z](A_1 - A_0)' + (A_1 - A_0) \mathrm{Var}[Z] A_0' = \mathrm{Var}[A_0 Z] + \mathrm{Var}[(A_1 - A_0) Z]$$
The cross terms vanish because $A_0 \mathrm{Var}[Z](A_1 - A_0)' = (D'\Omega^{-1}D)^{-1} D'(A_1 - A_0)' = (D'\Omega^{-1}D)^{-1}[(A_1 - A_0)D]' = 0$, since $A_1 D = A_0 D = I$.

This implies that $\mathrm{Var}[A_1 Z] \geq \mathrm{Var}[A_0 Z]$, or
$$(D'WD)^{-1} D'W \Omega W D (D'WD)^{-1} \geq (D'\Omega^{-1}D)^{-1}$$
Remember that when we are comparing covariance matrices, $\geq$ means that the difference between the matrices is positive semi-definite. Say, for example, that $\hat\gamma_n(W)$ corresponds to a GMM estimator with weight matrix which converges in probability to $W$. Suppose that we are interested in estimating $h(\gamma)$, where $h(\cdot)$ maps from $\mathbb{R}^k$ to $\mathbb{R}^1$. By the multivariate delta method, we know that
$$\sqrt{n}(h(\hat\gamma_n(W)) - h(\gamma_0)) \stackrel{D}{\to} N\left(0, \frac{\partial h(\gamma_0)}{\partial \gamma}' \Sigma(W) \frac{\partial h(\gamma_0)}{\partial \gamma}\right)$$
where $\Sigma(W) = (D'WD)^{-1} D'W \Omega W D (D'WD)^{-1}$.

Since $\Sigma(W) \geq \Sigma(\Omega^{-1})$, we know that
$$\frac{\partial h(\gamma_0)}{\partial \gamma}' (\Sigma(W) - \Sigma(\Omega^{-1})) \frac{\partial h(\gamma_0)}{\partial \gamma} \geq 0$$
This implies that the asymptotic variance of any real-valued function of the parameters is minimized by choosing $\hat\gamma_n(\Omega^{-1})$.

For given $g$, the best we can do is to work with the objective function
$$Q(\gamma; X_n) = -\hat q(\gamma; X_n)' \hat\Omega^{-1} \hat q(\gamma; X_n)$$
where $\hat\Omega \stackrel{P}{\to} \Omega$. We already know that $\hat\Omega = \frac{1}{n}\sum_{i=1}^n g(X_i; \hat\gamma(X_n)) g(X_i; \hat\gamma(X_n))'$ is a consistent estimator of $\Omega$, where $\hat\gamma(X_n)$ is a consistent estimator of $\gamma_0$. This suggests the following procedure for estimating $\hat\gamma_n(\Omega^{-1})$:
1. Find a naive estimator of $\gamma_0$, say $\hat\gamma_n(I)$.
2. Compute $\hat\Omega$ using this naive estimator.
3. Compute $\hat\gamma_n(\hat\Omega^{-1})$, i.e., re-estimate using the weight matrix $\hat\Omega^{-1}$.
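A sketch of this three-step recipe, with g again a hypothetical moment function returning a length-$p$ vector per observation:

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(g, X, gamma_init):
    qbar = lambda gamma: np.mean([g(x, gamma) for x in X], axis=0)
    # Step 1: naive estimator with W = I.
    gamma_1 = minimize(lambda gm: qbar(gm) @ qbar(gm), gamma_init).x
    # Step 2: estimate Omega at the naive estimator.
    G = np.array([g(x, gamma_1) for x in X])
    Omega_inv = np.linalg.inv(G.T @ G / len(X))
    # Step 3: re-estimate with the estimated optimal weight.
    return minimize(lambda gm: qbar(gm) @ Omega_inv @ qbar(gm), gamma_1).x
```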

Global Efficiency of GMM Estimators

Suppose that there are no nuisance parameters. For given $g$, we know the best $W$ to use. But what is the best $g$? First, we will demonstrate that the smallest asymptotic variance for $\sqrt{n}(\hat\gamma(X_n) - \gamma_0)$ is $I^{-1}(\gamma_0)$, where $I(\gamma_0)$ is the Fisher information matrix. That is, the MLE achieves the lowest variance. Recall that the MLE can be viewed as a GMM estimator by letting $g(x; \gamma) = \psi(x; \gamma)$.

The moment function $g(x; \gamma)$ is assumed to have mean zero, i.e.,
$$E_\gamma[g(X; \gamma)] = \int g(x; \gamma) p(x; \gamma)\, d\mu(x) = 0 \quad \text{for all } \gamma \in \Gamma$$
Under suitable regularity conditions, we can interchange differentiation and integration. Therefore, we know that
$$\frac{\partial}{\partial \gamma'} \int g(x; \gamma) p(x; \gamma)\, d\mu(x) = \int \frac{\partial}{\partial \gamma'} \{g(x; \gamma) p(x; \gamma)\}\, d\mu(x) = 0$$
This implies that
$$\int \frac{\partial g(x; \gamma)}{\partial \gamma'} p(x; \gamma)\, d\mu(x) + \int g(x; \gamma) \frac{\partial p(x; \gamma)}{\partial \gamma'}\, d\mu(x) = 0$$
and
$$E_\gamma\left[\frac{\partial g(X; \gamma)}{\partial \gamma'}\right] = -E_\gamma[g(X; \gamma) \psi(X; \gamma)'] \equiv D(\gamma)$$

Consider the following vector:
$$g(X; \gamma_0) + D I^{-1}(\gamma_0) \psi(X; \gamma_0)$$
Element by element, we can show that this is the residual from the projection of $g(X; \gamma_0)$ onto the space spanned by the elements of the score vector. This random vector has covariance matrix
$$E_{\gamma_0}[(g(X; \gamma_0) + D I^{-1}(\gamma_0) \psi(X; \gamma_0))(g(X; \gamma_0) + D I^{-1}(\gamma_0) \psi(X; \gamma_0))']$$
which is equal to
$$\Omega - D I^{-1}(\gamma_0) D'$$
Since $\Omega - D I^{-1}(\gamma_0) D'$ is a covariance matrix, it must be positive semi-definite.

Note that
$$(D'WD)^{-1} D'W \Omega W D (D'WD)^{-1} - I^{-1}(\gamma_0) = (D'WD)^{-1} D'W \{\Omega - D I^{-1}(\gamma_0) D'\} W D (D'WD)^{-1}$$
This matrix must be positive semi-definite. Since the difference between the asymptotic variance of the GMM estimator and $I^{-1}(\gamma_0)$ is guaranteed to be positive semi-definite, we know that no GMM estimator can have smaller asymptotic variance than $I^{-1}(\gamma_0)$.

Efficiency Results for GEEs

For the class of GEE estimators, efficiency depends on the choice of the matrix $A(X; \gamma)$. Remember that the asymptotic variance of a GEE estimator is
$$E_{\theta_0}\left[A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'}\right]^{-1} E_{\theta_0}[A(X; \gamma_0)\, \mathrm{Var}_{\theta_0}[Y \mid X]\, A(X; \gamma_0)'] \left\{E_{\theta_0}\left[A(X; \gamma_0) \frac{\partial \mu(X; \gamma_0)}{\partial \gamma'}\right]'\right\}^{-1}$$
What choice of the $A$ matrix will minimize this asymptotic variance?

Theorem 9.5: The optimal choice of $A(X; \gamma)$ is $M(\gamma)'\, \mathrm{Var}_{\theta_0}[Y \mid X]^{-1}$, where $M(\gamma) = \partial \mu(X; \gamma)/\partial \gamma'$.

Proof: The proof is the asymptotic version of the Gauss-Markov theorem. For simplicity, we suppress notation which is in parentheses or subscripted. Let $H_A = \{E[AM]\}^{-1} A$. Then,
$$\mathrm{Var}[H_A(Y - \mu)] = E[\mathrm{Var}[H_A(Y - \mu) \mid X]] + \mathrm{Var}[E[H_A(Y - \mu) \mid X]] = E[H_A\, \mathrm{Var}[Y \mid X]\, H_A'] = \{E[AM]\}^{-1} E[A\, \mathrm{Var}[Y \mid X]\, A'] \{E[AM]'\}^{-1}$$
This is the asymptotic variance of the GEE estimator. The claim is that $A_{opt} = M'\, \mathrm{Var}[Y \mid X]^{-1}$, in which case
$$H_{opt} = \{E[M'\, \mathrm{Var}[Y \mid X]^{-1} M]\}^{-1} M'\, \mathrm{Var}[Y \mid X]^{-1}$$
and the asymptotic variance is equal to
$$\{E[M'\, \mathrm{Var}[Y \mid X]^{-1} M]\}^{-1}$$

Note that
$$\mathrm{Var}[H_A(Y - \mu)] = \mathrm{Var}[(H_A - H_{opt})(Y - \mu) + H_{opt}(Y - \mu)]$$
$$= \mathrm{Var}[(H_A - H_{opt})(Y - \mu)] + \mathrm{Var}[H_{opt}(Y - \mu)] + E[(H_A - H_{opt})(Y - \mu)(Y - \mu)' H_{opt}'] + E[H_{opt}(Y - \mu)(Y - \mu)'(H_A - H_{opt})']$$
$$= \mathrm{Var}[(H_A - H_{opt})(Y - \mu)] + \mathrm{Var}[H_{opt}(Y - \mu)]$$
The cross terms vanish: conditioning on $X$,
$$E[(H_A - H_{opt})(Y - \mu)(Y - \mu)' H_{opt}'] = E[(H_A - H_{opt})\, \mathrm{Var}[Y \mid X]\, \mathrm{Var}[Y \mid X]^{-1} M] \{E[M'\, \mathrm{Var}[Y \mid X]^{-1} M]\}^{-1} = E[(H_A - H_{opt}) M] \{E[M'\, \mathrm{Var}[Y \mid X]^{-1} M]\}^{-1} = 0$$
since $E[H_A M] = E[H_{opt} M] = I$. This implies that $\mathrm{Var}[H_A(Y - \mu)] - \mathrm{Var}[H_{opt}(Y - \mu)]$ is positive semi-definite, which gives the desired result.
