ML and REML Variance Component Estimation

Copyright © 2012 Dan Nettleton (Iowa State University), Statistics 611

Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma)$ for some positive definite, symmetric matrix $\Sigma$. Furthermore, suppose each element of $\Sigma$ is a known function of an unknown vector of parameters $\theta \in \Theta$.

For example, consider
$$\Sigma = \sum_{j=1}^{m} \sigma_j^2 Z_j Z_j' + \sigma_e^2 I,$$
where $Z_1, \ldots, Z_m$ are known matrices,
$$\theta = [\theta_1, \ldots, \theta_m, \theta_{m+1}]' = [\sigma_1^2, \ldots, \sigma_m^2, \sigma_e^2]',$$
and $\Theta = \{\theta : \theta_j > 0,\ j = 1, \ldots, m+1\}$.
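
To make this structure concrete, here is a minimal NumPy sketch with a hypothetical one-way random-effects design ($m = 1$, four groups of three observations; all values are illustrative, not from the slides):

```python
import numpy as np

def make_Sigma(theta, Z_list, n):
    """Sigma(theta) = sum_j sigma_j^2 Z_j Z_j' + sigma_e^2 I,
    with theta = (sigma_1^2, ..., sigma_m^2, sigma_e^2)."""
    *sig2, sig2_e = theta
    Sigma = sig2_e * np.eye(n)
    for s2, Z in zip(sig2, Z_list):
        Sigma += s2 * (Z @ Z.T)
    return Sigma

# Hypothetical m = 1 example: 4 groups of 3 observations each.
n = 12
groups = np.repeat(np.arange(4), 3)
Z1 = (groups[:, None] == np.arange(4)).astype(float)  # 12 x 4 indicator matrix
Sigma = make_Sigma([2.0, 1.0], [Z1], n)               # sigma_1^2 = 2, sigma_e^2 = 1
```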

The likelihood function is
$$L(\beta, \theta \mid y) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\,e^{-\frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)},$$
and the log likelihood function is
$$\ell(\beta, \theta \mid y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta).$$
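
As a computational sketch (not part of the slides), $\ell(\beta, \theta \mid y)$ can be evaluated stably with `slogdet` and `solve` rather than forming $\Sigma^{-1}$ explicitly:

```python
import numpy as np

def log_lik(beta, Sigma, y, X):
    """l(beta, theta | y) for y ~ N(X beta, Sigma); Sigma = Sigma(theta)."""
    n = len(y)
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(Sigma)            # log |Sigma|, numerically stable
    quad = resid @ np.linalg.solve(Sigma, resid)    # (y - Xb)' Sigma^{-1} (y - Xb)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
```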

Based on previous results, $(y - X\beta)'\Sigma^{-1}(y - X\beta)$ is minimized over $\beta \in \mathbb{R}^p$, for any fixed $\theta$, by
$$\hat\beta(\theta) \equiv (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y.$$

Thus, the profile log likelihood for $\theta$ is
$$\ell^*(\theta \mid y) = \sup_{\beta \in \mathbb{R}^p} \ell(\beta, \theta \mid y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\big(y - X\hat\beta(\theta)\big)'\Sigma^{-1}\big(y - X\hat\beta(\theta)\big).$$
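
A matching sketch of $\hat\beta(\theta)$ and $\ell^*(\theta \mid y)$, using a pseudoinverse for the generalized inverse $(X'\Sigma^{-1}X)^-$:

```python
import numpy as np

def beta_hat(Sigma, y, X):
    """beta_hat(theta) = (X' Sigma^{-1} X)^- X' Sigma^{-1} y (GLS for fixed theta)."""
    SinvX = np.linalg.solve(Sigma, X)                  # Sigma^{-1} X
    return np.linalg.pinv(X.T @ SinvX) @ (SinvX.T @ y)

def profile_log_lik(Sigma, y, X):
    """l*(theta | y): log likelihood with beta_hat(theta) plugged in for beta."""
    n = len(y)
    resid = y - X @ beta_hat(Sigma, y, X)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
```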

In the general case, numerical methods are used to find the maximizer of $\ell^*(\theta \mid y)$ over $\theta \in \Theta$. Let
$$\hat\theta_{MLE} = \arg\max\{\ell^*(\theta \mid y) : \theta \in \Theta\}.$$
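
One plausible implementation of the numerical search, assuming the `make_Sigma` and `profile_log_lik` sketches above (the optimizer, log-scale parametrization, and starting value are arbitrary choices, not prescribed by the slides):

```python
import numpy as np
from scipy.optimize import minimize

def theta_mle(y, X, Z_list, m):
    """Numerically maximize l*(theta | y); log parametrization keeps theta_j > 0."""
    def neg_profile(log_theta):
        Sigma = make_Sigma(np.exp(log_theta), Z_list, len(y))
        return -profile_log_lik(Sigma, y, X)
    res = minimize(neg_profile, x0=np.zeros(m + 1), method="Nelder-Mead")
    return np.exp(res.x)
```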

Let $\hat\Sigma_{MLE}$ be the matrix $\Sigma$ with $\hat\theta_{MLE}$ in place of $\theta$. The MLE of an estimable $C\beta$ is then given by
$$C\hat\beta_{MLE} = C(X'\hat\Sigma_{MLE}^{-1}X)^{-}X'\hat\Sigma_{MLE}^{-1}y.$$

We can write $\Sigma$ as $\sigma^2 V$, where $\sigma^2 > 0$, $V$ is PD and symmetric, $\sigma^2$ is a known function of $\theta$, and each entry of $V$ is a known function of $\theta$; e.g.,
$$\Sigma = \sum_{j=1}^{m}\sigma_j^2 Z_j Z_j' + \sigma_e^2 I = \sigma_e^2\left[\sum_{j=1}^{m}\frac{\sigma_j^2}{\sigma_e^2}\, Z_j Z_j' + I\right].$$

If we let $\hat\sigma^2_{MLE}$ and $\hat V_{MLE}$ denote $\sigma^2$ and $V$ with $\hat\theta_{MLE}$ in place of $\theta$, then
$$\hat\Sigma_{MLE} = \hat\sigma^2_{MLE}\hat V_{MLE} \quad\text{and}\quad \hat\Sigma_{MLE}^{-1} = \frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}.$$
It follows that
$$C\hat\beta_{MLE} = C(X'\hat\Sigma_{MLE}^{-1}X)^{-}X'\hat\Sigma_{MLE}^{-1}y = C\left(X'\frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}X\right)^{-}X'\frac{1}{\hat\sigma^2_{MLE}}\hat V_{MLE}^{-1}y = C(X'\hat V_{MLE}^{-1}X)^{-}X'\hat V_{MLE}^{-1}y.$$

Note that $C\hat\beta_{MLE} = C(X'\hat V_{MLE}^{-1}X)^{-}X'\hat V_{MLE}^{-1}y$ is the GLS estimator under the Aitken model with $\hat V_{MLE}$ in place of $V$.

We have seen by example that the MLE of the variance component vector $\theta$ can be biased. For example, in the case $\Sigma = \sigma^2 I$, where $\theta = [\sigma^2]$, the MLE of $\sigma^2$ is
$$\frac{(y - X\hat\beta)'(y - X\hat\beta)}{n},$$
with expectation $\frac{n-r}{n}\sigma^2$.

This MLE for $\sigma^2$ is often criticized for failing to account for the loss of degrees of freedom needed to estimate $\beta$:
$$E\left[\frac{(y - X\hat\beta)'(y - X\hat\beta)}{n}\right] = \frac{n-r}{n}\sigma^2 < \sigma^2 = E\left[\frac{(y - X\beta)'(y - X\beta)}{n}\right].$$

A familiar special case: Suppose $y_1, \ldots, y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$. Then
$$E\left[\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{n}\right] = \sigma^2, \quad\text{but}\quad E\left[\frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n}\right] = \frac{n-1}{n}\sigma^2.$$
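
A quick Monte Carlo illustration of this bias (all simulation settings are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 5.0, 4.0, 10, 200_000
y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mle = ((y - y.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)  # divisor n, mu estimated
print(mle.mean())   # approx (n - 1)/n * sigma2 = 3.6, biased below sigma2 = 4
```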

REML is an approach that produces unbiased estimators in these special cases and less biased estimators than ML in general. Depending on whom you ask, REML stands for REsidual Maximum Likelihood or REstricted Maximum Likelihood.

The REML method:

Step 1: Find $n - \text{rank}(X) = n - r$ linearly independent vectors $a_1, \ldots, a_{n-r}$ such that $a_i'X = 0'$ for all $i = 1, \ldots, n-r$.

Step 2: Find the maximum likelihood estimate of $\theta$ using $w_1 = a_1'y, \ldots, w_{n-r} = a_{n-r}'y$ as data.

Let
$$A = [a_1, \ldots, a_{n-r}] \quad\text{and}\quad w = \begin{bmatrix} w_1 \\ \vdots \\ w_{n-r} \end{bmatrix} = \begin{bmatrix} a_1'y \\ \vdots \\ a_{n-r}'y \end{bmatrix} = A'y.$$

If $a'X = 0'$, then $a'y$ is known as an error contrast. Thus, $w_1, \ldots, w_{n-r}$ comprise a set of $n - r$ error contrasts. Because $(I - P_X)X = X - P_XX = X - X = 0$, the elements of $(I - P_X)y = y - P_Xy = y - \hat y$ are each error contrasts.

Because rank(i P X ) = n rank(x) = n r, there exists a set of n r linearly independent rows of I P X that can be used in step 1 of the REML method to get a 1,..., a n r. If we do use a subset of rows of I P X to get a 1,..., a n r, the error contrasts w 1 = a 1 y,..., w n r = a n ry will be a subset of the elements of (I P X )y = y ŷ, the residual vector. This is why it makes sense to call the procedure Residual Maximum Likelihood. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 19 / 58

Note that
$$w = A'y = A'(X\beta + \varepsilon) = A'X\beta + A'\varepsilon = 0 + A'\varepsilon = A'\varepsilon.$$
Thus,
$$w = A'\varepsilon \sim N(A'0,\, A'\Sigma A) = N(0,\, A'\Sigma A),$$
and the distribution of $w$ depends on $\theta$ but not $\beta$.

The log likelihood function in this case is
$$\ell_w(\theta \mid w) = -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w.$$
An MLE for $\theta$, say $\hat\theta$, can be found in the general case using numerical methods; this is the REML estimate of $\theta$. Let
$$\hat\theta = \arg\max\{\ell_w(\theta \mid w) : \theta \in \Theta\}.$$
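
A sketch of $\ell_w$, assuming an error-contrast matrix `A` such as the one constructed above:

```python
import numpy as np

def reml_log_lik(Sigma, A, w):
    """l_w(theta | w) for error contrasts w = A'y; Sigma = Sigma(theta)."""
    k = len(w)                                  # k = n - r
    S = A.T @ Sigma @ A                         # Var(w) = A' Sigma A
    _, logdet = np.linalg.slogdet(S)
    quad = w @ np.linalg.solve(S, w)
    return -0.5 * k * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
```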

Once a REML estimate of $\theta$ (and thus of $\Sigma$) has been obtained, the BLUE of an estimable $C\beta$ if $\Sigma$ were known can be approximated by
$$C\hat\beta_{REML} = C(X'\hat\Sigma^{-1}X)^{-}X'\hat\Sigma^{-1}y,$$
where $\hat\Sigma$ is $\Sigma$ with $\hat\theta$ (the REML estimate of $\theta$) in place of $\theta$.

Suppose $A$ and $B$ are each $n \times (n-r)$ matrices satisfying
$$\text{rank}(A) = \text{rank}(B) = n - r \quad\text{and}\quad A'X = B'X = 0.$$
Let $w = A'y$ and $v = B'y$. Prove that $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ iff $\hat\theta$ maximizes $\ell_v(\theta \mid v)$ over $\theta \in \Theta$.

Proof: $A'X = 0 \Rightarrow X'A = 0 \Rightarrow$ each column of $A \in \mathcal{N}(X')$. Also, $\dim(\mathcal{N}(X')) = n - \text{rank}(X) = n - r$, so the $n - r$ linearly independent columns of $A$ form a basis for $\mathcal{N}(X')$. Likewise, $B'X = 0 \Rightarrow X'B = 0 \Rightarrow$ the columns of $B \in \mathcal{N}(X')$. Therefore, there exists an $(n-r) \times (n-r)$ matrix $M$ such that $AM = B$.

Note that $M$, an $(n-r) \times (n-r)$ matrix, is nonsingular:
$$n - r = \text{rank}(B) = \text{rank}(AM) \le \text{rank}(M) \le n - r \;\Rightarrow\; \text{rank}(M) = n - r.$$
Furthermore,
$$v'(B'\Sigma B)^{-1}v = y'B(M'A'\Sigma AM)^{-1}B'y = y'AMM^{-1}(A'\Sigma A)^{-1}(M')^{-1}M'A'y = y'A(A'\Sigma A)^{-1}A'y = w'(A'\Sigma A)^{-1}w.$$

Also,
$$|B'\Sigma B| = |M'A'\Sigma AM| = |M'|\,|A'\Sigma A|\,|M| = |M|^2\,|A'\Sigma A|.$$

Now note that
$$\begin{aligned}
\ell_v(\theta \mid v) &= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|B'\Sigma B| - \frac{1}{2}v'(B'\Sigma B)^{-1}v \\
&= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log\big(|M|^2\,|A'\Sigma A|\big) - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= -\frac{1}{2}\log\big(|M|^2\big) - \frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= -\frac{1}{2}\log\big(|M|^2\big) + \ell_w(\theta \mid w).
\end{aligned}$$
Because $M$ is free of $\theta$, the result follows. $\square$
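
The result can be spot-checked numerically: for $B = AM$ with $M$ nonsingular, $\ell_v(\theta \mid v) - \ell_w(\theta \mid w)$ should not change with $\theta$. A sketch on a toy design (only the identity being tested comes from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
U = np.linalg.svd(np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T)[0]
A = U[:, : n - r]                        # orthonormal error-contrast matrix
M = rng.normal(size=(n - r, n - r))      # nonsingular with probability 1
B = A @ M
y = rng.normal(size=n)
w, v = A.T @ y, B.T @ y

def ll(C, z, Sigma):                     # l up to the theta-free (2 pi) constant
    S = C.T @ Sigma @ C
    return -0.5 * np.linalg.slogdet(S)[1] - 0.5 * z @ np.linalg.solve(S, z)

for s2 in (0.5, 2.0):                    # difference is the same for both thetas
    print(ll(B, v, s2 * np.eye(n)) - ll(A, w, s2 * np.eye(n)))
```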

Consider the special case $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$. Prove that the REML estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{y'(I - P_X)y}{n - r}.$$
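
Before proving the claim, it can be checked numerically: maximize $\ell_w$ over $\sigma^2$ and compare with $y'(I - P_X)y/(n - r)$ (the design and data below are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
r = np.linalg.matrix_rank(X)
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=n)

A = np.linalg.svd(np.eye(n) - P_X)[0][:, : n - r]   # orthonormal error contrasts
w = A.T @ y

def neg_lw(sig2):        # -l_w for Sigma = sig2 * I, dropping constants (A'A = I)
    return 0.5 * (n - r) * np.log(sig2) + 0.5 * w @ w / sig2

opt = minimize_scalar(neg_lw, bounds=(0.01, 20.0), method="bounded")
print(opt.x, y @ (np.eye(n) - P_X) @ y / (n - r))   # the two should agree
```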

REML Theorem: Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma)$, with $\Sigma$ a PD, symmetric matrix whose entries are known functions of an unknown parameter vector $\theta \in \Theta$. Furthermore, suppose $\text{rank}(X) = r$ and $\tilde X$ is an $n \times r$ matrix consisting of any set of $r$ linearly independent columns of $X$.

REML Theorem (continued): Let $A$ be any $n \times (n-r)$ matrix satisfying $\text{rank}(A) = n - r$ and $A'X = 0$. Let $w = A'y \sim N(0, A'\Sigma A)$. Then $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ iff $\hat\theta$ maximizes, over $\theta \in \Theta$,
$$g(\theta) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}\big(y - X\hat\beta(\theta)\big)'\Sigma^{-1}\big(y - X\hat\beta(\theta)\big),$$
where $\hat\beta(\theta) = (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y$.

We have previously shown:

1. There exists an $n \times (n-r)$ matrix $Q$ such that $QQ' = I - P_X$, $Q'X = 0$, and $Q'Q = I_{n-r}$; and

2. Any $n \times (n-r)$ matrix $A$ of rank $n - r$ satisfying $A'X = 0$ leads to the same REML estimator $\hat\theta$.

Thus, without loss of generality, we may assume the matrix $A$ in the REML Theorem is $Q$.
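
A minimal sketch of one such $Q$ on a toy design, taken as an orthonormal basis for $\mathcal{C}(I - P_X) = \mathcal{N}(X')$:

```python
import numpy as np

n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
r = np.linalg.matrix_rank(X)
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T
Q = np.linalg.svd(np.eye(n) - P_X)[0][:, : n - r]   # eigenvectors with eigenvalue 1

print(np.allclose(Q @ Q.T, np.eye(n) - P_X),        # QQ' = I - P_X
      np.allclose(Q.T @ X, 0),                      # Q'X = 0
      np.allclose(Q.T @ Q, np.eye(n - r)))          # Q'Q = I
```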

We will now prove 3 lemmas and then use those lemmas to prove the REML Theorem.

Lemma R1: Let $G = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}$ and suppose $A$ is an $n \times (n-r)$ matrix satisfying $A'A = I_{n-r}$ and $AA' = I - P_X$. Then
$$|[A, G]|^2 = |\tilde X'\tilde X|^{-1}.$$

Proof of Lemma R1:
$$|[A, G]|^2 = |[A, G]'|\,|[A, G]| = |[A, G]'[A, G]| = \begin{vmatrix} A'A & A'G \\ G'A & G'G \end{vmatrix} = \begin{vmatrix} I & A'G \\ G'A & G'G \end{vmatrix} = |I|\,|G'G - G'AA'G| = |G'G - G'AA'G|$$
(by our result on the determinant of a partitioned matrix)

Using $AA' = I - P_X$ and $P_X = P_{\tilde X} = \tilde X(\tilde X'\tilde X)^{-1}\tilde X'$,
$$\begin{aligned}
&= |G'G - G'(I - P_X)G| = |G'P_XG| \\
&= \big|(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}\,\tilde X(\tilde X'\tilde X)^{-1}\tilde X'\,\Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\big| \\
&= \big|[(\tilde X'\Sigma^{-1}\tilde X)^{-1}(\tilde X'\Sigma^{-1}\tilde X)]\,(\tilde X'\tilde X)^{-1}\,[(\tilde X'\Sigma^{-1}\tilde X)(\tilde X'\Sigma^{-1}\tilde X)^{-1}]\big| \\
&= |(\tilde X'\tilde X)^{-1}| = |\tilde X'\tilde X|^{-1}. \;\square
\end{aligned}$$

Lemma R2: $|A'\Sigma A| = |\tilde X'\Sigma^{-1}\tilde X|\,|\Sigma|\,|\tilde X'\tilde X|^{-1}$, where $A$, $\tilde X$, and $\Sigma$ are as previously defined.

Proof of Lemma R2: Again define $G = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}$. First show that (1) $A'\Sigma G = 0$ and (2) $G'\Sigma G = (\tilde X'\Sigma^{-1}\tilde X)^{-1}$.

Next show that
$$|[A, G]|^2\,|\Sigma| = |A'\Sigma A|\,|\tilde X'\Sigma^{-1}\tilde X|^{-1}.$$

Now use Lemma R1 to complete the proof of Lemma R2.

Lemma R3: $A(A'\Sigma A)^{-1}A' = \Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}$, where $A$, $\Sigma$, and $\tilde X$ are as defined previously.

Proof of Lemma R3:
$$[A, G]^{-1}\Sigma^{-1}([A, G]')^{-1} = \big([A, G]'\Sigma[A, G]\big)^{-1} = \begin{bmatrix} A'\Sigma A & A'\Sigma G \\ G'\Sigma A & G'\Sigma G \end{bmatrix}^{-1} = \begin{bmatrix} A'\Sigma A & 0 \\ 0 & G'\Sigma G \end{bmatrix}^{-1} = \begin{bmatrix} A'\Sigma A & 0 \\ 0 & (\tilde X'\Sigma^{-1}\tilde X)^{-1} \end{bmatrix}^{-1} = \begin{bmatrix} (A'\Sigma A)^{-1} & 0 \\ 0 & \tilde X'\Sigma^{-1}\tilde X \end{bmatrix}.$$

Now multiplying on the left by $[A, G]$ and on the right by $[A, G]'$ yields
$$\Sigma^{-1} = [A, G]\begin{bmatrix} (A'\Sigma A)^{-1} & 0 \\ 0 & \tilde X'\Sigma^{-1}\tilde X \end{bmatrix}\begin{bmatrix} A' \\ G' \end{bmatrix} = A(A'\Sigma A)^{-1}A' + G\,\tilde X'\Sigma^{-1}\tilde X\,G'. \tag{3}$$

Now
$$G\,\tilde X'\Sigma^{-1}\tilde X\,G' = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\,\tilde X'\Sigma^{-1}\tilde X\,(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1} = \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}. \tag{4}$$
Combining (3) and (4) yields
$$A(A'\Sigma A)^{-1}A' = \Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}. \;\square$$
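
Lemmas R2 and R3 can be spot-checked numerically. In the sketch below, $X$ has full column rank, so $\tilde X = X$, and an orthonormal error-contrast matrix serves as $A$; the positive definite $\Sigma$ is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # full column rank
r = X.shape[1]
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T
A = np.linalg.svd(np.eye(n) - P_X)[0][:, : n - r]             # A'A = I, AA' = I - P_X

B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)                                   # an arbitrary PD Sigma
Si = np.linalg.inv(Sigma)
XSiX = X.T @ Si @ X

# Lemma R2: |A' Sigma A| = |X' Sigma^{-1} X| |Sigma| |X'X|^{-1}
print(np.isclose(np.linalg.det(A.T @ Sigma @ A),
                 np.linalg.det(XSiX) * np.linalg.det(Sigma) / np.linalg.det(X.T @ X)))

# Lemma R3: A (A' Sigma A)^{-1} A' = Sigma^{-1} - Sigma^{-1} X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}
print(np.allclose(A @ np.linalg.inv(A.T @ Sigma @ A) @ A.T,
                  Si - Si @ X @ np.linalg.inv(XSiX) @ X.T @ Si))
```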

Now use Lemmas R2 and R3 to prove the REML Theorem.

$$\begin{aligned}
\ell_w(\theta \mid w) &= -\frac{n-r}{2}\log(2\pi) - \frac{1}{2}\log|A'\Sigma A| - \frac{1}{2}w'(A'\Sigma A)^{-1}w \\
&= \text{constant}_1 - \frac{1}{2}\log\big(|\tilde X'\Sigma^{-1}\tilde X|\,|\Sigma|\,|\tilde X'\tilde X|^{-1}\big) - \frac{1}{2}y'A(A'\Sigma A)^{-1}A'y && \text{(by Lemma R2)} \\
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'\big(\Sigma^{-1} - \Sigma^{-1}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1}\big)y && \text{(by Lemma R3)}
\end{aligned}$$
(here $\text{constant}_2 = \text{constant}_1 + \frac{1}{2}\log|\tilde X'\tilde X|$ is free of $\theta$)

$$\begin{aligned}
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'\Sigma^{-1/2}\big(I - \Sigma^{-1/2}\tilde X(\tilde X'\Sigma^{-1}\tilde X)^{-1}\tilde X'\Sigma^{-1/2}\big)\Sigma^{-1/2}y \\
&= \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}y'\Sigma^{-1/2}\big(I - P_{\Sigma^{-1/2}\tilde X}\big)\Sigma^{-1/2}y.
\end{aligned}$$

Now $\mathcal{C}(\Sigma^{-1/2}\tilde X) = \mathcal{C}(\Sigma^{-1/2}X) \Rightarrow P_{\Sigma^{-1/2}\tilde X} = P_{\Sigma^{-1/2}X} \Rightarrow$
$$y'\Sigma^{-1/2}\big(I - P_{\Sigma^{-1/2}\tilde X}\big)\Sigma^{-1/2}y = y'\Sigma^{-1/2}\big(I - P_{\Sigma^{-1/2}X}\big)\Sigma^{-1/2}y = \big\|\big(I - P_{\Sigma^{-1/2}X}\big)\Sigma^{-1/2}y\big\|^2$$

$$\begin{aligned}
&= \big\|\Sigma^{-1/2}y - \Sigma^{-1/2}X(X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y\big\|^2 = \big\|\Sigma^{-1/2}y - \Sigma^{-1/2}X\hat\beta(\theta)\big\|^2 = \big\|\Sigma^{-1/2}\big(y - X\hat\beta(\theta)\big)\big\|^2 \\
&= \big(y - X\hat\beta(\theta)\big)'\Sigma^{-1/2}\Sigma^{-1/2}\big(y - X\hat\beta(\theta)\big) = \big(y - X\hat\beta(\theta)\big)'\Sigma^{-1}\big(y - X\hat\beta(\theta)\big).
\end{aligned}$$

Thus, we have
$$\ell_w(\theta \mid w) = \text{constant}_2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|\tilde X'\Sigma^{-1}\tilde X| - \frac{1}{2}\big(y - X\hat\beta(\theta)\big)'\Sigma^{-1}\big(y - X\hat\beta(\theta)\big) = \text{constant}_2 + g(\theta).$$
It follows that $\hat\theta$ maximizes $\ell_w(\theta \mid w)$ over $\theta \in \Theta$ iff $\hat\theta$ maximizes $g(\theta)$ over $\theta \in \Theta$. $\square$
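
Finally, the theorem itself can be spot-checked: $\ell_w(\theta \mid w) - g(\theta)$ should evaluate to the same constant at different values of $\theta$. A sketch on a toy random-intercept model (all settings are arbitrary; $X$ has full column rank, so $\tilde X = X$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # full column rank
r = X.shape[1]
Z1 = (np.repeat(np.arange(4), 3)[:, None] == np.arange(4)).astype(float)
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

A = np.linalg.svd(np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T)[0][:, : n - r]
w = A.T @ y

def lw_minus_g(theta):
    Sigma = theta[0] * Z1 @ Z1.T + theta[1] * np.eye(n)
    S = A.T @ Sigma @ A
    SinvX = np.linalg.solve(Sigma, X)
    resid = y - X @ (np.linalg.pinv(X.T @ SinvX) @ (SinvX.T @ y))  # y - X beta_hat(theta)
    lw = (-0.5 * (n - r) * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(S)[1]
          - 0.5 * w @ np.linalg.solve(S, w))
    g = (-0.5 * np.linalg.slogdet(Sigma)[1] - 0.5 * np.linalg.slogdet(X.T @ SinvX)[1]
         - 0.5 * resid @ np.linalg.solve(Sigma, resid))
    return lw - g

print(lw_minus_g([1.0, 1.0]), lw_minus_g([3.0, 0.5]))   # equal: the theta-free constant
```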