MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright © 2012 (Iowa State University). Statistics 511, 1 / 30.

INFORMATION CRITERIA

Akaike's Information Criterion is given by AIC = -2 l(θ̂) + 2k, where l(θ̂) is the maximized log likelihood and k is the dimension of the model parameter space.

AIC = -2 l(θ̂) + 2k can be used to determine which of multiple models is best for a given dataset. Small values of AIC are preferred. The +2k portion of AIC can be viewed as a penalty for model complexity.

Schwarz's Bayesian Information Criterion is given by BIC = -2 l(θ̂) + k log(n). BIC is the same as AIC except that the penalty for model complexity is greater for BIC (when n ≥ 8) and grows with n.
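As a quick numerical sketch of these two criteria (the log likelihood values, parameter counts, and sample size below are hypothetical, not from the notes):

```python
import math

def aic(loglik, k):
    """Akaike's criterion: -2 * maximized log likelihood + 2 * (parameter count)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Schwarz's criterion: the complexity penalty k * log(n) grows with n."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical maximized log likelihoods for a reduced (k = 3) and a
# full (k = 5) model, both fit to the same n = 100 observations.
ll_reduced, ll_full, n = -152.0, -150.5, 100

print(aic(ll_reduced, 3), aic(ll_full, 5))      # 310.0 311.0
print(bic(ll_reduced, 3, n), bic(ll_full, 5, n))
# Smaller values are preferred; here the reduced model wins under both criteria.
```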

AIC and BIC can each be used to compare models even if they are not nested (i.e., even if one is not a special case of the other, as in the reduced vs. full model comparison discussed previously). However, if REML likelihoods are used, the compared models must have the same model for the response mean. Different models for the mean would yield different error contrasts and therefore different data for computation of the maximized REML likelihoods.

LARGE n THEORY FOR MLEs

Suppose θ is a k × 1 parameter vector, and let l(θ) denote the log likelihood function. Under regularity conditions discussed in, e.g., Shao, J. (2003), Mathematical Statistics, 2nd ed., Springer, New York, we have the following.

There is an estimator θ̂ that solves the likelihood equations ∂l(θ)/∂θ = 0 and is a (weakly) consistent estimator of θ, i.e., lim_{n→∞} Pr[||θ̂ - θ|| > ε] = 0 for any ε > 0. For sufficiently large n, θ̂ is approximately N(θ, I⁻¹(θ)), where

I(θ) = E[(∂l(θ)/∂θ)(∂l(θ)/∂θ)'] = -E[∂²l(θ)/∂θ∂θ'] = [-E{∂²l(θ)/∂θᵢ∂θⱼ}]_{i,j ∈ {1,...,k}}.

I(θ) is known as the Fisher information matrix. I(θ) can be approximated by the observed Fisher information matrix, which is given by Î(θ̂) = -∂²l(θ)/∂θ∂θ' evaluated at θ = θ̂. I(θ) and Î(θ̂) may depend on unknown nuisance parameters. In such cases, the nuisance parameters are replaced by consistent estimators.
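Because Î(θ̂) is just the negated second derivative of l at the MLE, it can be approximated numerically when an analytic Hessian is inconvenient. A minimal sketch for a scalar parameter, using made-up data (the finite-difference step and the known-σ² setup are illustrative assumptions, not from the notes):

```python
import math

def observed_info(loglik, theta_hat, h=1e-4):
    """Observed Fisher information for a scalar parameter: minus the second
    derivative of the log likelihood at the MLE, approximated here by a
    central finite difference (the step h is a tuning choice)."""
    return -(loglik(theta_hat + h) - 2.0 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2

# Hypothetical data: y_1, ..., y_n iid N(mu, sigma2) with sigma2 treated as
# known, so theta = mu and the exact information is n / sigma2.
y = [4.9, 5.3, 4.7, 5.1, 5.0]
sigma2 = 0.25

def loglik(mu):
    # Log likelihood up to an additive constant that does not involve mu.
    return -sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma2)

mu_hat = sum(y) / len(y)      # the MLE of mu is the sample mean
print(observed_info(loglik, mu_hat), len(y) / sigma2)   # both close to 20
```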

A Simple Example

Suppose y₁, ..., yₙ are i.i.d. N(µ, σ²). If we are interested in inference for µ, we can take θ = µ and treat σ² as a nuisance parameter. It is straightforward to show that ȳ is the unique solution to the likelihood equation. Furthermore, it is straightforward to show that I(θ) = Î(θ̂) = n/σ².

A Simple Example (continued)

Thus, for sufficiently large n, we have θ̂ = ȳ approximately N(θ = µ, I⁻¹(θ) = σ²/n) and ȳ approximately N(µ, s²/n), where s² = Σᵢ₌₁ⁿ (yᵢ - ȳ)² / (n - 1).

WALD TESTS AND CONFIDENCE INTERVALS

Suppose for large n that Σ̂^(-1/2)(θ̂ - θ) is approximately N(0, I) and (θ̂ - θ)' Σ̂⁻¹ (θ̂ - θ) is approximately χ²_k.

For example, suppose θ̂ is the MLE of θ and Î(θ̂) is the observed information matrix. Then, under regularity conditions, we have [Î(θ̂)]^(1/2)(θ̂ - θ) approximately N(0, I) and (θ̂ - θ)' Î(θ̂) (θ̂ - θ) approximately χ²_k for sufficiently large n.

An approximate 100(1 - α)% confidence interval for θᵢ is θ̂ᵢ ± z_{1-α/2} √(Σ̂ᵢᵢ), where z_{1-α/2} is the 1 - α/2 quantile of the N(0, 1) distribution and Σ̂ᵢᵢ is element (i, i) of Σ̂.

An approximate p-value for testing H₀: θ = θ₀ is Pr[χ²_k ≥ (θ̂ - θ₀)' Σ̂⁻¹ (θ̂ - θ₀)], where χ²_k is a central chi-square random variable with k degrees of freedom.
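A small sketch of the Wald test and interval in the scalar case (k = 1), where the chi-square statistic reduces to z², so P(χ²₁ ≥ w) = 2(1 - Φ(√w)) and the standard normal CDF suffices. The estimate and its variance below are hypothetical, and the z quantile is hard-coded for α = 0.05:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wald(theta_hat, theta0, var_hat):
    """Scalar Wald test of H0: theta = theta0 plus an approximate 95% interval.
    var_hat plays the role of Sigma_hat, the estimated variance of theta_hat."""
    se = math.sqrt(var_hat)
    w = (theta_hat - theta0) ** 2 / var_hat        # Wald chi-square(1) statistic
    p = 2.0 * (1.0 - norm_cdf(math.sqrt(w)))       # P(chi2_1 >= w)
    z = 1.959963984540054                          # z_{0.975}, hard-coded
    return w, p, (theta_hat - z * se, theta_hat + z * se)

# Hypothetical MLE 0.5 with estimated variance 0.04, testing H0: theta = 0.
w, p, ci = wald(0.5, 0.0, 0.04)
print(w, round(p, 4), ci)    # w = 6.25; p is roughly 0.012
```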

MULTIVARIATE DELTA METHOD

Suppose g is a function from IRᵏ to IRᵐ, i.e., g(θ) = [g₁(θ), g₂(θ), ..., gₘ(θ)]' for θ ∈ IRᵏ, for some functions g₁, ..., gₘ. Suppose g is differentiable with k × m derivative matrix D ≡ [∂gⱼ(θ)/∂θᵢ], whose (i, j) entry is ∂gⱼ(θ)/∂θᵢ for i = 1, ..., k and j = 1, ..., m; that is, column j of D is the gradient of gⱼ with respect to θ.

Now suppose θ̂ has mean θ and variance Σ. Then Taylor's Theorem implies g(θ̂) ≈ g(θ) + D'(θ̂ - θ), which implies E[g(θ̂)] ≈ g(θ) + D' E(θ̂ - θ) = g(θ) and Var[g(θ̂)] ≈ Var[g(θ) + D'(θ̂ - θ)] = D' Σ D.

If θ̂ is approximately N(θ, Σ), it follows that g(θ̂) is approximately N(g(θ), D' Σ D). In practice, we often need to estimate D by replacing θ in D with θ̂ to obtain D̂. Similarly, we often need to replace Σ with an estimate Σ̂.
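As an illustration of the D' Σ D variance approximation, consider the scalar function g(θ) = θ₁/θ₂ of a 2 × 1 parameter; the estimate and covariance matrix below are made up for the sketch:

```python
# Delta-method sketch: approximate Var[g(theta_hat)] for the scalar function
# g(theta) = theta_1 / theta_2 via the quadratic form D' Sigma D.
# theta_hat and Sigma are hypothetical estimates, not from the notes.

theta_hat = [2.0, 4.0]
Sigma = [[0.10, 0.02],
         [0.02, 0.05]]

# D holds the partial derivatives of g evaluated at theta_hat:
# d(t1/t2)/dt1 = 1/t2 and d(t1/t2)/dt2 = -t1/t2**2.
D = [1.0 / theta_hat[1], -theta_hat[0] / theta_hat[1] ** 2]

# Quadratic form D' Sigma D; g is scalar here (m = 1), so the result is a scalar.
var_g = sum(D[i] * Sigma[i][j] * D[j] for i in range(2) for j in range(2))
print(var_g)    # approximately 0.00578
```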

LIKELIHOOD RATIO BASED INFERENCE

Suppose we wish to test the null hypothesis that a reduced model provides an adequate fit to a dataset relative to a more general full model that includes the reduced model as a special case.

Define Λ = (Reduced Model Maximized Likelihood) / (Full Model Maximized Likelihood). Λ is known as the likelihood ratio. -2 log Λ is known as the likelihood ratio test statistic. Tests based on -2 log Λ are called likelihood ratio tests.

Under the regularity conditions in Shao (2003) mentioned previously, the likelihood ratio test statistic -2 log Λ is approximately distributed as central χ²_{k_f - k_r} under the null hypothesis, where k_f and k_r are the dimensions of the parameter space under the full and reduced models, respectively. This approximation can be reasonable if n is sufficiently large.
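A minimal sketch of carrying out such a test from the two maximized log likelihoods (the log likelihood values are hypothetical, and the χ²₁ critical value is hard-coded rather than computed):

```python
def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio test statistic -2 log Lambda, computed from the
    maximized log likelihoods of the reduced and full models."""
    return -2.0 * (loglik_reduced - loglik_full)

# Hypothetical maximized log likelihoods; the full model has one extra
# parameter, so the reference distribution is chi-square with k_f - k_r = 1 df.
stat = lrt_statistic(-152.0, -150.5)
chi2_1_crit = 3.8414588206941245       # 0.95 quantile of chi-square(1), hard-coded
print(stat, stat > chi2_1_crit)        # 3.0 False: fail to reject the reduced model
```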

Likelihood Ratio Tests and Confidence Regions for a Subvector of the Full Model Parameter Vector θ

Suppose θ is a k × 1 vector partitioned into a k₁ × 1 vector θ₁ and a k₂ × 1 vector θ₂, where k = k₁ + k₂ and θ = [θ₁', θ₂']'. Consider a test of H₀: θ₁ = θ₁₀.

Suppose θ̂ is the MLE of θ and, for any fixed value of θ₁, θ̂₂(θ₁) maximizes l([θ₁', θ₂']') over θ₂. Then 2[l(θ̂) - l([θ₁₀', θ̂₂(θ₁₀)']')] is approximately χ²_{k₁} under the null hypothesis, by our previous result, when n is sufficiently large.

Also, Pr{2[l(θ̂) - l([θ₁', θ̂₂(θ₁)']')] ≤ χ²_{k₁, 1-α}} ≈ 1 - α, which implies Pr{l([θ₁', θ̂₂(θ₁)']') ≥ l(θ̂) - (1/2) χ²_{k₁, 1-α}} ≈ 1 - α.

Thus, the set of values of θ₁ that, when l is maximized over θ₂, yield a maximized log likelihood within (1/2) χ²_{k₁, 1-α} of the log likelihood maximized over all of θ forms a 100(1 - α)% confidence region for θ₁. Such a confidence region is known as a profile likelihood confidence region because l([θ₁', θ̂₂(θ₁)']'), viewed as a function of θ₁, is the profile log likelihood for θ₁.
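A sketch of computing such a region by grid search, for θ₁ = µ with nuisance parameter θ₂ = σ² in the i.i.d. normal example (the data below are made up; for fixed µ, the maximizing σ² is the average squared deviation, which gives the profile log likelihood in closed form, and the χ²₁ quantile is hard-coded):

```python
import math

# Profile likelihood confidence region for theta_1 = mu with nuisance
# theta_2 = sigma^2, for y_1, ..., y_n iid N(mu, sigma^2).
y = [4.2, 5.1, 4.8, 5.6, 4.4, 5.0, 4.9, 5.3]
n = len(y)

def profile_loglik(mu):
    s2_mu = sum((yi - mu) ** 2 for yi in y) / n    # sigma^2 maximized for this mu
    return -0.5 * n * math.log(s2_mu) - 0.5 * n    # additive constants dropped

mu_hat = sum(y) / n                                 # maximizes over all of theta
cutoff = profile_loglik(mu_hat) - 0.5 * 3.8414588206941245   # l(theta_hat) - chi2_{1,0.95}/2

# Grid scan: the mu values whose profile log likelihood stays above the cutoff
# form an approximate 95% profile likelihood confidence region for mu.
grid = [4.0 + 0.001 * i for i in range(2001)]       # 4.000 to 6.000 in steps of 0.001
region = [mu for mu in grid if profile_loglik(mu) >= cutoff]
print(min(region), max(region))   # endpoints of the approximate interval
```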

A Warning

The normal and χ² approximations mentioned in these notes may be crude if sample sizes are not sufficiently large. The regularity conditions mentioned in these notes do not hold if the true parameter falls on the boundary of the parameter space. Thus, for example, testing H₀: σ²_u = 0 is not covered by the methods presented here.