Questions and Answers on Maximum Likelihood

L. Magee, Fall 2008

1. Given:

- an observation-specific log likelihood function ℓ_i(θ) = ln f(y_i | x_i, θ)
- the log likelihood function ℓ(θ | y, X) = Σ_i ℓ_i(θ)
- a data set (x_i, y_i), i = 1, ..., n
- a value for the maximum likelihood estimator θ̂ of the parameter vector θ

briefly describe how you would compute

(a) the negative Hessian estimator of the variance of θ̂
(b) the outer product of gradient (OPG) estimator of the variance of θ̂
(c) a misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

2. The random variable y has probability density function

   f(y) = (1 − θ) + 2θy   for 0 < y < 1
        = 0               otherwise

for −1 < θ < 1. There are n observations y_i, i = 1, ..., n, drawn independently from this distribution.

(a) (i) Write the cumulative distribution function of y.
    (ii) Derive the expected value of y.
    (iii) Suggest a method of moments estimator for θ based on the sample mean ȳ.
(b) (i) Write the log likelihood function for θ.
    (ii) Write the first-order condition for the ML estimator of θ.

3. y_1, ..., y_n are independent draws from an exponential distribution. The probability density function of each y_i is f(y_i | θ) = θ^(−1) exp(−y_i/θ), where y_i > 0 and θ > 0. The exponential distribution has the property E(y_i) = θ.

(a) Derive
    (i) the observation-specific log likelihood function ℓ_i(θ)
    (ii) the log likelihood function ℓ(θ)
    (iii) the maximum likelihood (ML) estimator of θ, θ̂.

(b) Derive the following estimators of the variance of θ̂, showing their general formulas as part of your answer.
    (i) the negative Hessian variance estimator
    (ii) the Information matrix variance estimator
    (iii) the outer product of gradient (OPG) variance estimator
    (iv) the misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

4. Given observations on the scalar x_i, i = 1, ..., n, each y_i is independently drawn according to the conditional pdf

   f(y_i | x_i, θ) = (x_i θ)^(−1) exp(−y_i/(x_i θ))

where y_i > 0, x_i > 0, and θ > 0. θ is an unknown scalar parameter.

(a) Write the observation-specific log likelihood function ℓ_i(θ).
(b) Write the log likelihood function ℓ(θ) = Σ_i ℓ_i(θ).
(c) Derive the maximum likelihood (ML) estimator of θ.
(d) In this model, E(y_i | x_i, θ) = x_i θ. Using this fact, suggest another consistent estimator of θ that is different from the ML estimator in (c). No explanation is required.

5. (16 marks: 4 for each part) Let y_i, i = 1, ..., n, be independently-observed non-negative integers drawn from a Poisson distribution

   Prob(y_i | θ) = θ^(y_i) e^(−θ) / y_i!,   y_i = 0, 1, 2, ...

The Poisson distribution has the property E(y_i | θ) = θ. (Aside: ! is known as the factorial operator. y_i!, or y_i factorial, is defined as y_i! = 1 × 2 × ⋯ × (y_i − 1) × y_i. In the current question, this term serves as a normalizing constant, and has no effect on the derivations of the maximum likelihood estimator or its variance estimators, much like the 2π term in the denominator of the normal pdf.)

(a) Write the observation-specific log likelihood function ℓ_i(θ).
(b) Write the log likelihood function ℓ(θ) = Σ_i ℓ_i(θ).
(c) Derive θ̂, the maximum likelihood (ML) estimator of θ.
(d) Derive an estimator of the variance of θ̂ using any one of the four standard methods.
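Question 1 asks for recipes rather than closed-form answers. Before turning to the answers, a minimal numerical sketch (assuming NumPy is available; the finite-difference step, the exponential test model, and all parameter values are illustrative assumptions, not part of the question) shows how the three variance estimators can be computed from any observation-level log likelihood with a scalar parameter:

```python
import numpy as np

def variance_estimators(loglik_i, theta_hat, h=1e-5):
    """Given loglik_i(theta) -> vector of observation-level log likelihoods
    l_i(theta), return the three variance estimators of question 1 for a
    scalar theta, using central finite differences at theta = theta_hat."""
    # Observation-level gradients dl_i/dtheta, evaluated at theta-hat
    g = (loglik_i(theta_hat + h) - loglik_i(theta_hat - h)) / (2.0 * h)
    # Second derivative of the summed log likelihood at theta-hat
    l = lambda t: loglik_i(t).sum()
    hess = (l(theta_hat + h) - 2.0 * l(theta_hat) + l(theta_hat - h)) / h**2
    v_a = 1.0 / (-hess)                # (a) negative Hessian estimator
    v_b = 1.0 / np.sum(g * g)          # (b) OPG estimator
    v_c = v_a * np.sum(g * g) * v_a    # (c) sandwich (misspecification-consistent)
    return v_a, v_b, v_c

# Illustrative use: an exponential sample with theta-hat = ybar (see question 3)
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)
loglik_i = lambda t: -np.log(t) - y / t
print(variance_estimators(loglik_i, y.mean()))
```

For a vector θ the same recipe applies with a gradient vector per observation, a Hessian matrix, and matrix inverses in place of reciprocals.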

Answers

1. (a) the negative Hessian estimator: V̂_a = (−∂²ℓ/∂θ∂θ′)^(−1), evaluated at θ = θ̂

(b) the OPG estimator: V̂_b = (Σ_i (∂ℓ_i/∂θ)(∂ℓ_i/∂θ)′)^(−1), where the ∂ℓ_i/∂θ's are evaluated at θ = θ̂

(c) misspecification-consistent estimator: given the definitions in (a) and (b), it can be written as V̂_c = V̂_a V̂_b^(−1) V̂_a, or

   V̂_c = (−∂²ℓ/∂θ∂θ′)^(−1) (Σ_i (∂ℓ_i/∂θ)(∂ℓ_i/∂θ)′) (−∂²ℓ/∂θ∂θ′)^(−1)

2. (a) (i) The probability density function f(y) = 0 when y < 0 and f(y) = 0 when y > 1. Therefore when y < 0, the cdf is F(y) = ∫_{−∞}^{y} f(s) ds = 0, and when y > 1, F(y) = 1. When 0 < y < 1,

   F(y) = ∫_0^y ((1 − θ) + 2θs) ds = [(1 − θ)s + θs²]_{s=0}^{y} = (1 − θ)y + θy²,   0 < y < 1

(ii) E(y) = ∫_0^1 y f(y) dy = ∫_0^1 y((1 − θ) + 2θy) dy = [(1/2)(1 − θ)y² + (2/3)θy³]_{y=0}^{1} = (1/2)(1 − θ) + (2/3)θ = (1/2) + (1/6)θ

(iii) From (ii), E(y) = (1/2) + (1/6)θ, which gives the population moment condition E(y − ((1/2) + (1/6)θ)) = 0. The sample moment condition is

   (1/n) Σ_i (y_i − ((1/2) + (1/6)θ̂)) = 0

which can be written as ȳ − (1/2) − (1/6)θ̂ = 0, and the estimator is θ̂ = 6ȳ − 3.

(b) (i) ℓ(θ) = Σ_i ln(1 − θ + 2θy_i).

(ii) There is no closed-form solution. The first-order condition is

   ∂ℓ(θ)/∂θ = Σ_i (2y_i − 1)/(1 − θ + 2θy_i) = 0   at θ = θ̂
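The answers to 2(a)(iii) and 2(b)(ii) can be checked by simulation. The sketch below (assuming NumPy; the true θ, sample size, and seed are arbitrary choices for illustration) draws from f(y) = (1 − θ) + 2θy by inverting the cdf F(y) = (1 − θ)y + θy², then compares the method of moments estimate 6ȳ − 3 with a numerical solution of the ML first-order condition:

```python
import numpy as np

def simulate_y(theta, n, rng):
    """Draw from f(y) = (1 - theta) + 2*theta*y on (0, 1) by inverting
    the cdf F(y) = (1 - theta)*y + theta*y**2 at u ~ Uniform(0, 1)."""
    u = rng.uniform(size=n)
    if theta == 0.0:
        return u
    # Root in (0, 1) of theta*y**2 + (1 - theta)*y - u = 0
    b = 1.0 - theta
    return (-b + np.sqrt(b * b + 4.0 * theta * u)) / (2.0 * theta)

def mom_estimate(y):
    """Method of moments estimator from answer 2(a)(iii): 6*ybar - 3."""
    return 6.0 * y.mean() - 3.0

def ml_estimate(y, tol=1e-10):
    """Solve the first-order condition from answer 2(b)(ii),
    sum((2*y_i - 1)/(1 - theta + 2*theta*y_i)) = 0, by bisection.
    The score is decreasing in theta, so a sign change brackets theta-hat."""
    def score(theta):
        return np.sum((2.0 * y - 1.0) / (1.0 - theta + 2.0 * theta * y))
    lo, hi = -0.999999, 0.999999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
theta_true = 0.5
y = simulate_y(theta_true, 100_000, rng)
print(mom_estimate(y), ml_estimate(y))  # both should be close to 0.5
```

Bisection is valid here because the log likelihood is concave in θ (its second derivative is −Σ_i (2y_i − 1)²/(1 − θ + 2θy_i)² < 0), so the score crosses zero at most once.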

3. (a) (i) ℓ_i(θ) = ln f(y_i | θ) = −ln(θ) − y_i/θ

(ii) ℓ(θ) = Σ_i ℓ_i(θ) = −n ln(θ) − Σ_i y_i/θ

(iii) θ̂ is the value of θ that solves ∂ℓ/∂θ = 0:

   ∂ℓ/∂θ = −n/θ + Σ_i y_i/θ²,   so   −n/θ̂ + Σ_i y_i/θ̂² = 0   and   θ̂ = Σ_i y_i / n = ȳ

(b) (i) ∂²ℓ/∂θ² = n/θ² − 2 Σ_i y_i/θ³. Evaluating this at θ̂ = ȳ and subbing out Σ_i y_i = nȳ gives

   ∂²ℓ(θ̂)/∂θ² = n/θ̂² − 2nȳ/θ̂³ = −n/θ̂²

The negative Hessian variance estimator is V̂_1 = (−∂²ℓ(θ̂)/∂θ²)^(−1) = θ̂²/n.

(ii) The Information matrix is minus one times the expected value of the second derivative matrix derived in part (i). The exponential density assumption implies E(y_i) = θ, so

   −E(∂²ℓ/∂θ²) = −n/θ² + 2nθ/θ³ = n/θ²

The Information matrix variance estimator is the inverse of the Information matrix, evaluated at θ̂:

   V̂_2 = (n/θ̂²)^(−1) = θ̂²/n

(iii) Evaluate the gradient, or first derivative, of ℓ_i at θ̂:

   ∂ℓ_i(θ̂)/∂θ = −1/θ̂ + y_i/θ̂² = (y_i − θ̂)/θ̂² = (y_i − ȳ)/θ̂²

For notational convenience, use σ̂² = (1/n) Σ_i (y_i − ȳ)², even though there is no σ²

parameter in the model. Then the OPG is

   Σ_i (∂ℓ_i(θ̂)/∂θ)(∂ℓ_i(θ̂)/∂θ)′ = Σ_i (∂ℓ_i(θ̂)/∂θ)² = Σ_i (y_i − ȳ)²/θ̂⁴ = nσ̂²/θ̂⁴

The outer product of gradient (OPG) variance estimator is the inverse of this OPG:

   V̂_3 = θ̂⁴/(nσ̂²)

(Aside: V̂_3 has the odd feature that σ̂² appears in the denominator rather than the numerator. But it turns out that for the exponential distribution, Var(y_i) = θ². Since plim(σ̂²) = Var(y_i), then as n → ∞, σ̂² and θ̂² both converge to θ². So as n → ∞, V̂_3 becomes close to θ⁴/(nθ²) = θ²/n, the same as V̂_1 and V̂_2. This equivalence depends on the assumption that y_i has an exponential distribution.)

(iv) V̂_4 = (−∂²ℓ/∂θ²)^(−1) (Σ_i (∂ℓ_i(θ̂)/∂θ)(∂ℓ_i(θ̂)/∂θ)′) (−∂²ℓ/∂θ²)^(−1) = V̂_1 (V̂_3)^(−1) V̂_1 = (θ̂²/n)(nσ̂²/θ̂⁴)(θ̂²/n) = σ̂²/n

4. (a) ℓ_i(θ) = −ln(x_i θ) − y_i/(x_i θ)

(b) ℓ(θ) = Σ_i ℓ_i(θ) = −Σ_i ln(x_i θ) − Σ_i y_i/(x_i θ)

(c) ∂ℓ(θ)/∂θ = −n/θ + (1/θ²) Σ_i (y_i/x_i), which equals zero when −n/θ̂ + (1/θ̂²) Σ_i (y_i/x_i) = 0, so

   θ̂ = (1/n) Σ_i (y_i/x_i)

(d) Since E(y_i | x_i, θ) = x_i θ, then E m(y_i, x_i, θ) = 0 where m = y_i − x_i θ. This population moment condition leads to the sample moment condition (1/n) Σ_i (y_i − x_i θ̂) = 0. Solving for θ̂ gives θ̂ = Σ_i y_i / Σ_i x_i = ȳ/x̄. (Another choice of moment condition is E x_i(y_i − x_i θ) = 0, which leads to OLS: θ̂ = Σ_i x_i y_i / Σ_i x_i².)

5. (a) ℓ_i(θ) = ln Prob(y_i | θ) = y_i ln θ − θ − ln(y_i!)

(b) ℓ(θ) = Σ_i ℓ_i(θ) = (Σ_i y_i) ln θ − nθ − Σ_i ln(y_i!)

(c) θ̂ is the value of θ satisfying the first-order condition ∂ℓ/∂θ = 0:

   ∂ℓ/∂θ = (Σ_i y_i)/θ − n = 0   at θ = θ̂,   so   θ̂ = Σ_i y_i / n = ȳ

(d) The negative Hessian variance estimator is V̂(θ̂) = (−∂²ℓ/∂θ²)^(−1) evaluated at θ = θ̂, and ∂²ℓ/∂θ² = −(Σ_i y_i)/θ², therefore

   V̂(θ̂) = ((Σ_i y_i)/θ̂²)^(−1) = θ̂²/(nθ̂) = θ̂/n
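The Poisson results in 5(c) and 5(d) lend themselves to a quick Monte Carlo check: the estimator θ̂/n should, on average, match the true sampling variance θ/n of θ̂ = ȳ. The sketch below assumes NumPy, and the true θ, sample size, replication count, and seed are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, n, reps = 4.0, 1_000, 5_000

# theta-hat = ybar for each simulated Poisson sample (answer 5(c))
samples = rng.poisson(lam=theta_true, size=(reps, n))
theta_hats = samples.mean(axis=1)

# Negative Hessian estimate theta-hat / n (answer 5(d)), averaged over
# replications, versus the Monte Carlo sampling variance of theta-hat
# and its true value theta / n
avg_var_estimate = np.mean(theta_hats / n)
mc_variance = theta_hats.var()
print(avg_var_estimate, mc_variance, theta_true / n)  # all near 0.004
```

That the three numbers agree reflects the fact that for the Poisson model E(θ̂) = θ and Var(θ̂) = Var(y_i)/n = θ/n, so θ̂/n is unbiased for the variance of θ̂.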