Likelihoods

The distribution of a random variable $Y$ with a discrete sample space (e.g. a finite sample space or the integers) can be characterized by its probability mass function (pmf): $P(Y = y) = f(y)$. For example, suppose $Y$ has a geometric distribution on $1, 2, \ldots$ with parameter $p$. Then the pmf is

$$P(Y = y) = (1-p)^{y-1} p.$$

You can check that $\sum_{y=1}^{\infty} P(Y = y) = 1$ using properties of geometric series.
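
As a quick numerical check (a small sketch, not part of the derivation; the value of $p$ and the truncation point are arbitrary), we can sum the pmf over a long range of $y$ values in R:

## Geometric pmf on 1, 2, ... with parameter p.
p = 0.3
y = 1:1000
pmf = (1 - p)^(y - 1) * p

## The probabilities should sum to (essentially) 1.
sum(pmf)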

The distribution of a random variable $Y$ with a continuous sample space (e.g. $(0, 1)$ or $(-\infty, \infty)$) can be characterized by its probability density function (pdf) $f(y)$, which tells us the probability that $Y$ lies in an interval $(a, b)$:

$$P(a \le Y \le b) = \int_a^b f(y)\,dy.$$

For example, if $Y$ is normally distributed, then the density is

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-(y-\mu)^2/(2\sigma^2)\right),$$

where $\mu$ and $\sigma^2$ are parameters.
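
To see the interval probability in action (an illustrative sketch; the parameter and interval values are arbitrary), we can integrate the normal density numerically and compare with R's built-in pnorm:

## Check P(a <= Y <= b) by numerical integration.
mu = 1
sigma = 2
f = function(y) exp(-(y - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)

a = 0
b = 3
integrate(f, a, b)$value                   ## numerical integral of the density
pnorm(b, mu, sigma) - pnorm(a, mu, sigma)  ## the same probability via pnorm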

Suppose we observe independent data $Y_1, \ldots, Y_n$, where the probability density or mass function of $Y_j$ is $f_j$. The joint density of $Y_1, \ldots, Y_n$ is

$$\prod_{j=1}^n f_j(Y_j).$$

Suppose the $f_j$ are identical functions, and include a parameter $\theta$, so we can write $f_j(Y) = f(Y; \theta)$. Taking the logarithm of the joint density yields the log-likelihood function

$$\sum_j \log f(Y_j; \theta).$$

The log-likelihood function log f(y j ; θ). i is a function of θ, since we observe the Y j and can substitute their numerical values into the expression above. For example, suppose the Y j follow exponential distributions f j (y, λ) = λ 1 exp( y/λ). If we want to estimate λ from the data, one principle is to maximize the log-likelihood function. This estimate ˆλ is called the maximum likelihood estimate (MLE). 4

As an example, consider the exponential distribution. The log-likelihood is

$$L(\lambda) = -n \log \lambda - \sum_j Y_j/\lambda = -n \log \lambda - n\bar{Y}/\lambda.$$

To maximize this as a function of $\lambda$, we calculate its derivative

$$L'(\lambda) = -n/\lambda + n\bar{Y}/\lambda^2$$

and solve $L'(\lambda) = 0$, yielding the MLE

$$\hat\lambda = \sum_j Y_j/n = \bar{Y}.$$
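
As a numerical sanity check (a minimal sketch; the sample size and search interval are arbitrary choices), we can maximize this log-likelihood with R's optimize and confirm that the maximizer agrees with the sample mean:

## Simulate exponential data (rate 1, so the true mean lambda is 1).
Y = rexp(20)

## The exponential log-likelihood, with lambda parameterized as the mean.
loglik = function(lam) -length(Y) * log(lam) - sum(Y) / lam

## The numerical maximizer should agree with the closed-form MLE, mean(Y).
optimize(loglik, interval=c(0.01, 10), maximum=TRUE)$maximum
mean(Y)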

To be rigorous, we should check that this is a local maximum ($L''(\hat\lambda) < 0$) rather than a local minimum ($L''(\hat\lambda) > 0$):

$$L''(\lambda) = n/\lambda^2 - 2n\bar{Y}/\lambda^3,$$

so

$$L''(\hat\lambda) = n/\bar{Y}^2 - 2n/\bar{Y}^2 = -n/\bar{Y}^2 \le 0.$$

The inequality is strict as long as not all $Y_j$ are zero.

The curvature of $L(\theta)$ around $\hat\theta$ tells us how precisely we are able to estimate $\theta$ using $\hat\theta$. This program simulates exponential data and plots the log-likelihood function $L(\lambda)$. The region of $\lambda$ values for which $L(\lambda)$ is within one unit of the peak value is shown in red.

## Sample sizes.
N = c(5, 10, 20, 40, 80)

## A grid of lambda values.
G = seq(0.1, 5, by=0.05)

for (k in 1:length(N)) {

    ## The sample size.
    n = N[k]

    ## Replications.
    for (j in 1:10) {

        X = rexp(n)

        ## The exponential log-likelihood over the grid.
        L = -n*log(G) - sum(X)/G

        plot(G, L, type="l", xlab="Lambda", ylab="log-likelihood", main="")
        title(main=sprintf("Sample size %d", n))

        ## Highlight lambda values within one unit of the peak.
        ii = which(L >= (max(L) - 1))
        lines(G[ii], L[ii], col="red")

        ## Wait for user input before producing the next plot.
        scan()
    }
}

The curvature in $L(\theta)$ is measured by the second derivative $L''(\theta)$; the observed Fisher information is $-L''(\hat\theta)$. As a general principle, the sampling variance of the MLE $\hat\theta$ is approximately the inverse of the observed Fisher information:

$$\mathrm{var}\,\hat\theta \approx -1/L''(\hat\theta).$$

For the exponential example, we would get $\mathrm{var}\,\hat\lambda \approx \bar{Y}^2/n$. Since the mean of the exponential distribution is $\lambda$ and its variance is $\lambda^2$, we expect $\bar{Y}^2 \approx \hat\sigma^2_Y$, so this will give similar results to the usual $\mathrm{var}\,\bar{Y} = \sigma^2/n$ formula.
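
A small Monte Carlo check of this approximation (an illustrative sketch; the sample size and number of replications are arbitrary): with rate-1 exponential data the true mean is $\lambda = 1$, so the approximate variance $\lambda^2/n$ is $1/n$.

## Monte Carlo check of var(lambda-hat) ~ lambda^2/n.
n = 20
lamhat = replicate(5000, mean(rexp(n)))  ## the MLE is the sample mean

var(lamhat)  ## empirical sampling variance of the MLE
1/n          ## the Fisher-information approximation with lambda = 1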

As another example, suppose we observe data following a logistic distribution with an unknown mean $\mu$. The density is

$$f(y; \mu) = \frac{\exp(-(y-\mu))}{(1 + \exp(-(y-\mu)))^2}.$$

The log-likelihood from an independent sample $Y_1, \ldots, Y_n$ is

$$L(\mu) = -Y + n\mu - 2\sum_i \log\left(1 + \exp(-(Y_i - \mu))\right),$$

where $Y = \sum_i Y_i$.

The derivative of the log-likelihood is

$$L'(\mu) = n - 2\sum_i \frac{\exp(-(Y_i - \mu))}{1 + \exp(-(Y_i - \mu))}.$$
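
As a spot check (an optional aside; the evaluation points are arbitrary), the stated density matches R's built-in logistic density dlogis with scale 1:

## Compare the density formula against dlogis at a few points.
mu = 1
y = c(-2, 0, 1, 3.5)
exp(-(y - mu)) / (1 + exp(-(y - mu)))^2
dlogis(y, location=mu, scale=1)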

In the logistic example, it will not be possible to solve $L'(\mu) = 0$ symbolically. We can still use the computer to find the MLE numerically. One approach is Newton's method. Begin with a reasonable estimate of $\mu$, say $\mu_0 = \bar{Y}$. Then construct a linear approximation to $L'(\mu)$ at $\mu_0$:

$$L'(\mu) \approx L'(\mu_0) + (\mu - \mu_0) L''(\mu_0).$$

This linear expression can be equated to zero and solved for $\mu$, yielding

$$\mu_1 = \mu_0 - L'(\mu_0)/L''(\mu_0).$$

Then $\mu_1$ can be used to produce $\mu_2$, and so on. This process converges rapidly to a point that is usually a local maximum of $L$.

In the logistic example,

$$L''(\mu) = -2\sum_i \frac{\exp(-(Y_i - \mu))}{(1 + \exp(-(Y_i - \mu)))^2}.$$

The following program simulates samples from a logistic distribution, and uses Newton's method to calculate the MLE. The approximate standard errors $\sqrt{-1/L''(\hat\mu)}$ are also stored, and the proportion of the time that the true value is within two standard errors of the estimate (which should be 0.95) is recorded.

## The population mean.
mu = 1

## The sample size.
n = 10

## The number of simulation replications.
nrep = 1e3

Q = array(0, c(nrep, 3))

for (k in 1:nrep) {

    ## Generate the logistic data by inverting the logistic CDF.
    U = runif(n)
    Y = mu + log(U/(1-U))

    ## Starting value.
    m = mean(Y)

    ## Newton's method.
    while (TRUE) {

        V = exp(-(Y-m))

        ## The first derivative of the log-likelihood.
        D1 = n - 2*sum(V/(1+V))

        ## The second derivative of the log-likelihood.
        D2 = -2*sum(V/(1+V)^2)

        ## Check for convergence.
        if (abs(D1) < 1e-10) {
            break
        }

        ## Newton's step.
        m = m - D1/D2
    }

    ## The approximate standard error.
    SE = sqrt(-1/D2)

    Q[k,] = c(m, SE, 1*(abs(m - mu) < 2*SE))
}
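
To summarize the simulation (a small addition beyond the program above), the column means of Q give the average estimate, the average standard error, and the empirical coverage of the two-standard-error intervals:

## Average estimate, average SE, and coverage proportion.
apply(Q, 2, mean)  ## the third value should be close to 0.95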