Lecture 32: Asymptotic confidence sets and likelihoods

Asymptotic criterion

In some problems, especially nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence coefficient or confidence level $1-\alpha$. A common approach is to find a confidence set whose confidence coefficient or confidence level is nearly $1-\alpha$ when the sample size $n$ is large.

A confidence set $C(X)$ for $\theta$ has asymptotic confidence level $1-\alpha$ if $\liminf_n P(\theta \in C(X)) \geq 1-\alpha$ for any $P \in \mathcal{P}$ (Definition 2.14). If $\lim_n P(\theta \in C(X)) = 1-\alpha$ for any $P \in \mathcal{P}$, then $C(X)$ is a $1-\alpha$ asymptotically correct confidence set. Note that asymptotic correctness is not the same as having limiting confidence coefficient $1-\alpha$ (Definition 2.14).
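The following Monte Carlo sketch (my own illustration, not part of the lecture; the function name and parameter values are arbitrary) estimates the coverage probability of a standard asymptotically correct interval, the normal-approximation interval for a Bernoulli proportion. Coverage approaches $1-\alpha$ as $n$ grows, which is exactly what asymptotic correctness promises:

```python
# A minimal simulation sketch: estimate the coverage probability of the
# normal-approximation (Wald) interval for a Bernoulli p by Monte Carlo.
import numpy as np
from scipy import stats

def coverage(p=0.3, n=200, alpha=0.05, reps=5000, seed=0):
    rng = np.random.default_rng(seed)
    z = stats.norm.ppf(1 - alpha / 2)          # z_{1-alpha/2}
    y = rng.binomial(n, p, size=reps)          # reps draws of Y = sum_i X_i
    phat = y / n
    half = z * np.sqrt(phat * (1 - phat) / n)  # half-width of the interval
    return np.mean((phat - half <= p) & (p <= phat + half))

print(coverage())        # close to 0.95 for n = 200
print(coverage(n=20))    # typically below 0.95 for small n
```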

Asymptotically pivotal quantities

A known Borel function of $(X, \theta)$, $R_n(X, \theta)$, is said to be asymptotically pivotal iff the limiting distribution of $R_n(X, \theta)$ does not depend on $P$. Like a pivotal quantity in constructing confidence sets with a given confidence coefficient or confidence level (§7.1.1), an asymptotically pivotal quantity can be used in constructing asymptotically correct confidence sets.

Most asymptotically pivotal quantities are of the form $\hat{V}_n^{-1/2}(\hat{\theta}_n - \theta)$, where $\hat{\theta}_n$ is an estimator of $\theta$ that is asymptotically normal, i.e.,
$$V_n^{-1/2}(\hat{\theta}_n - \theta) \to_d N_k(0, I_k),$$
and $\hat{V}_n$ is a consistent estimator of the asymptotic covariance matrix $V_n$. The resulting $1-\alpha$ asymptotically correct confidence set is
$$C(X) = \{\theta : \|\hat{V}_n^{-1/2}(\hat{\theta}_n - \theta)\|^2 \leq \chi^2_{k,\alpha}\},$$
where $\chi^2_{k,\alpha}$ is the $(1-\alpha)$th quantile of the chi-square distribution $\chi^2_k$. If $\theta$ is real-valued ($k = 1$), then $C(X)$ is a confidence interval. When $k > 1$, $C(X)$ is an ellipsoid.
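A minimal sketch of how such a set is used in practice, assuming the user supplies $\hat{\theta}_n$ and $\hat{V}_n$ (the helper name `in_confidence_set` is hypothetical): membership of a candidate $\theta$ is decided by comparing the quadratic form with the chi-square quantile.

```python
# Membership test for C(X) = {t : ||V_hat^{-1/2}(theta_hat - t)||^2 <= chi2_{k,alpha}}.
import numpy as np
from scipy import stats

def in_confidence_set(theta, theta_hat, V_hat, alpha=0.05):
    """True iff theta lies in the asymptotic confidence ellipsoid."""
    diff = np.asarray(theta_hat) - np.asarray(theta)
    # quadratic form (theta_hat - t)' V_hat^{-1} (theta_hat - t)
    stat = diff @ np.linalg.solve(V_hat, diff)
    return stat <= stats.chi2.ppf(1 - alpha, df=len(diff))
```

For $k = 1$ this reduces to the familiar interval $\hat{\theta}_n \pm z_{1-\alpha/2}\hat{V}_n^{1/2}$.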

Example 7.20 (Functions of means)

Suppose that $X_1, \dots, X_n$ are i.i.d. random vectors having a c.d.f. $F$ on $\mathbb{R}^d$ and that the unknown parameter of interest is $\theta = g(\mu)$, where $\mu = E(X_1)$ and $g$ is a known differentiable function from $\mathbb{R}^d$ to $\mathbb{R}^k$, $k \leq d$. From the CLT and Theorem 1.12, $\hat{\theta}_n = g(\bar{X})$ satisfies
$$V_n^{-1/2}(\hat{\theta}_n - \theta) \to_d N_k(0, I_k), \qquad V_n = [\nabla g(\mu)]^\tau \operatorname{Var}(X_1)\, \nabla g(\mu)/n.$$
A consistent estimator of the asymptotic covariance matrix $V_n$ is $\hat{V}_n = [\nabla g(\bar{X})]^\tau S^2\, \nabla g(\bar{X})/n$, where $S^2$ is the sample covariance matrix. Thus,
$$C(X) = \{\theta : \|\hat{V}_n^{-1/2}(\hat{\theta}_n - \theta)\|^2 \leq \chi^2_{k,\alpha}\}$$
is a $1-\alpha$ asymptotically correct confidence set for $\theta$.
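A hedged sketch of Example 7.20 for one concrete choice of $g$ (mine, not the lecture's): $d = 2$, $k = 1$, $g(\mu) = \mu_1/\mu_2$, the ratio of two means. The gradient is in closed form and $S^2$ is the sample covariance matrix:

```python
# Delta-method confidence interval for g(mu) = mu_1 / mu_2 (toy choice of g).
import numpy as np
from scipy import stats

def ratio_of_means_ci(X, alpha=0.05):
    """CI for g(mu) = mu[0]/mu[1] from an (n, 2) sample X."""
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S2 = np.cov(X, rowvar=False)                             # sample covariance of X_1
    grad = np.array([1 / xbar[1], -xbar[0] / xbar[1] ** 2])  # grad g at xbar
    Vhat = grad @ S2 @ grad / n                              # estimated asymptotic variance
    theta_hat = xbar[0] / xbar[1]
    z = stats.norm.ppf(1 - alpha / 2)
    return theta_hat - z * np.sqrt(Vhat), theta_hat + z * np.sqrt(Vhat)

rng = np.random.default_rng(1)
X = rng.normal([2.0, 4.0], 1.0, size=(500, 2))
print(ratio_of_means_ci(X))   # should cover the true ratio 0.5 for most samples
```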

Example 7.22 (Linear models)

Consider the linear model $X = Z\beta + \varepsilon$, where $\varepsilon$ has i.i.d. components with mean 0 and variance $\sigma^2$. Assume that $Z$ is of full rank and that the conditions in Theorem 3.12 hold. It follows from Theorem 1.9(iii) and Theorem 3.12 that for the LSE $\hat{\beta}$,
$$V_n^{-1/2}(\hat{\beta} - \beta) \to_d N_p(0, I_p), \qquad V_n = \sigma^2 (Z^\tau Z)^{-1}.$$
A consistent estimator for $V_n$ is $\hat{V}_n = (n-p)^{-1}\,\mathrm{SSR}\,(Z^\tau Z)^{-1}$ (see §5.5.1). Thus, a $1-\alpha$ asymptotically correct confidence set for $\beta$ is
$$C(X) = \{\beta : (\hat{\beta} - \beta)^\tau (Z^\tau Z)(\hat{\beta} - \beta) \leq \chi^2_{p,\alpha}\,\mathrm{SSR}/(n-p)\}.$$
Note that this confidence set is different from the one in Example 7.9 derived under the normality assumption on $\varepsilon$.
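A sketch of Example 7.22 on simulated data (the toy data and helper name are my own): it computes the LSE, the residual sum of squares SSR, and a membership test for the ellipsoid $C(X)$:

```python
# Asymptotically correct confidence set for beta in the model X = Z beta + eps.
import numpy as np
from scipy import stats

def lse_confidence_set(X, Z, alpha=0.05):
    """Return the LSE and a function testing membership of beta in C(X)."""
    n, p = Z.shape
    beta_hat = np.linalg.lstsq(Z, X, rcond=None)[0]   # the LSE
    SSR = np.sum((X - Z @ beta_hat) ** 2)             # residual sum of squares
    bound = stats.chi2.ppf(1 - alpha, df=p) * SSR / (n - p)
    ZtZ = Z.T @ Z
    def contains(beta):
        d = beta_hat - beta
        return d @ ZtZ @ d <= bound
    return beta_hat, contains

rng = np.random.default_rng(2)
Z = np.column_stack([np.ones(100), rng.normal(size=100)])
X = Z @ np.array([1.0, 2.0]) + rng.normal(size=100)
beta_hat, contains = lse_confidence_set(X, Z)
print(beta_hat, contains(np.array([1.0, 2.0])))      # true beta is usually inside
```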

Discussions

The method of using asymptotically pivotal quantities can also be applied to parametric problems. Note that in a parametric problem where the unknown parameter $\theta$ is multivariate, a confidence set for $\theta$ with a given confidence coefficient may be difficult or impossible to obtain. Asymptotically correct confidence sets for $\theta$ can also be constructed by inverting acceptance regions of asymptotic tests for testing $H_0: \theta = \theta_0$ versus some $H_1$.

Asymptotic efficiency

Typically, in a given problem there exist many different asymptotically pivotal quantities that lead to different $1-\alpha$ asymptotically correct confidence sets for $\theta$. Intuitively, if two asymptotic confidence sets are constructed using two different estimators, $\hat{\theta}_{1n}$ and $\hat{\theta}_{2n}$, and if $\hat{\theta}_{1n}$ is asymptotically more efficient than $\hat{\theta}_{2n}$ (§4.5.1), then the confidence set based on $\hat{\theta}_{1n}$ should be better than the one based on $\hat{\theta}_{2n}$ in some sense.

Proposition 7.4

Let $C_j(X) = \{\theta : \|\hat{V}_{jn}^{-1/2}(\hat{\theta}_{jn} - \theta)\|^2 \leq \chi^2_{k,\alpha}\}$, $j = 1, 2$, be the confidence sets based on $\hat{\theta}_{jn}$ satisfying $V_{jn}^{-1/2}(\hat{\theta}_{jn} - \theta) \to_d N_k(0, I_k)$, where $\hat{V}_{jn}$ is consistent for $V_{jn}$, $j = 1, 2$. If $\operatorname{Det}(V_{1n}) < \operatorname{Det}(V_{2n})$ for sufficiently large $n$, where $\operatorname{Det}(A)$ is the determinant of $A$, then
$$P\big(\operatorname{vol}(C_1(X)) < \operatorname{vol}(C_2(X))\big) \to 1.$$

Proof

The result follows from the consistency of $\hat{V}_{jn}$ and the fact that the volume of the ellipsoid $C_j(X)$ is
$$\operatorname{vol}(C_j(X)) = \frac{\pi^{k/2}(\chi^2_{k,\alpha})^{k/2}\,[\operatorname{Det}(\hat{V}_{jn})]^{1/2}}{\Gamma(1 + k/2)}.$$
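The volume formula in the proof translates directly into code; the following sketch (the helper name is an assumption) can be used to compare two competing confidence ellipsoids numerically, as Proposition 7.4 suggests:

```python
# Volume of the ellipsoid C(X) from the proof of Proposition 7.4:
# vol = pi^{k/2} (chi2_{k,alpha})^{k/2} Det(V_hat)^{1/2} / Gamma(1 + k/2).
import numpy as np
from scipy import stats
from math import gamma, pi

def ellipsoid_volume(V_hat, alpha=0.05):
    k = V_hat.shape[0]
    c = stats.chi2.ppf(1 - alpha, df=k)
    return pi ** (k / 2) * c ** (k / 2) * np.sqrt(np.linalg.det(V_hat)) / gamma(1 + k / 2)
```

For $k = 1$ this gives $2 z_{1-\alpha/2}\hat{V}_n^{1/2}$, the length of the usual interval, as a quick sanity check.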

Asymptotic efficiency

If $\hat{\theta}_{1n}$ is asymptotically more efficient than $\hat{\theta}_{2n}$ (§4.5.1), then $\operatorname{Det}(V_{1n}) \leq \operatorname{Det}(V_{2n})$. Hence, Proposition 7.4 indicates that a more efficient estimator of $\theta$ results in a better confidence set in terms of volume. If $\hat{\theta}_n$ is asymptotically efficient (optimal in the sense of having the smallest asymptotic covariance matrix; see Definition 4.4), then the corresponding confidence set $C(X)$ is asymptotically optimal (in terms of volume) among the confidence sets of the same form as $C(X)$.

Parametric likelihoods

In parametric problems, it is shown in §4.5 that MLEs and RLEs are asymptotically efficient. Thus, we study more closely the asymptotic confidence sets based on MLEs and RLEs or, more generally, based on likelihoods. Consider the case where $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is a parametric family dominated by a σ-finite measure, where $\Theta \subset \mathbb{R}^k$. Write $\theta = (\vartheta, \varphi)$ and consider confidence sets for $\vartheta$, which has dimension $r$.

Parametric likelihoods

Let $\ell(\theta)$ be the likelihood function based on the observation $X = x$. The acceptance region of the LR test defined in §6.4.1 with $\Theta_0 = \{\theta : \vartheta = \vartheta_0\}$ is
$$A(\vartheta_0) = \{x : \ell(\vartheta_0, \hat{\varphi}_{\vartheta_0}) \geq e^{-c_\alpha/2}\, \ell(\hat{\theta})\},$$
where $\ell(\hat{\theta}) = \sup_{\theta \in \Theta} \ell(\theta)$, $\ell(\vartheta, \hat{\varphi}_\vartheta) = \sup_\varphi \ell(\vartheta, \varphi)$, and $c_\alpha$ is a constant related to the significance level $\alpha$. Under the conditions of Theorem 6.5, if $c_\alpha$ is chosen to be $\chi^2_{r,\alpha}$, the $(1-\alpha)$th quantile of the chi-square distribution $\chi^2_r$, then
$$C(X) = \{\vartheta : \ell(\vartheta, \hat{\varphi}_\vartheta) \geq e^{-c_\alpha/2}\, \ell(\hat{\theta})\}$$
is a $1-\alpha$ asymptotically correct confidence set. Note that this confidence set and the one given by $C(X) = \{\theta : \|\hat{V}_n^{-1/2}(\hat{\theta}_n - \theta)\|^2 \leq \chi^2_{k,\alpha}\}$ are generally different.
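A generic numerical sketch of inverting the LR test when $r = 1$ (the helper and its bracketing arguments are my assumptions; with nuisance parameters, `loglik` should be the profile log-likelihood $\log \ell(\vartheta, \hat{\varphi}_\vartheta)$):

```python
# Invert the LR test numerically:
# C(X) = {v : loglik(v) >= loglik(v_hat) - chi2_{1,alpha}/2}, found by bisection.
import numpy as np
from scipy import stats, optimize

def lr_interval(loglik, v_hat, lo, hi, alpha=0.05):
    """Endpoints of the LR confidence interval on a bracketing range [lo, hi]."""
    cut = loglik(v_hat) - stats.chi2.ppf(1 - alpha, df=1) / 2
    f = lambda v: loglik(v) - cut          # root where the likelihood hits the cutoff
    lower = optimize.brentq(f, lo, v_hat)  # assumes f(lo) < 0 < f(v_hat)
    upper = optimize.brentq(f, v_hat, hi)  # assumes f(hi) < 0
    return lower, upper
```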

Parametric likelihoods

In many cases $\ell(\vartheta, \hat{\varphi}_\vartheta)$ is a strictly unimodal function of $\vartheta$ that decreases toward 0 away from its maximizer; $C(X)$ based on LR tests is then a bounded set in $\mathbb{R}^r$, and in particular a bounded interval when $r = 1$.

In §6.4.2 we discussed two asymptotic tests closely related to the LR test: Wald's test and Rao's score test. When $\Theta_0 = \{\theta : \vartheta = \vartheta_0\}$, Wald's test has acceptance region
$$A(\vartheta_0) = \{x : (\hat{\vartheta} - \vartheta_0)^\tau \{C^\tau [I_n(\hat{\theta})]^{-1} C\}^{-1} (\hat{\vartheta} - \vartheta_0) \leq \chi^2_{r,\alpha}\},$$
where $\hat{\theta} = (\hat{\vartheta}, \hat{\varphi})$ is an MLE or RLE of $\theta = (\vartheta, \varphi)$, $I_n(\theta)$ is the Fisher information matrix based on $X$, $C^\tau = (I_r \;\; 0)$, and $0$ is an $r \times (k-r)$ matrix of zeros. By Theorem 4.17, the confidence set obtained by inverting $A(\vartheta)$ is
$$C(X) = \{\vartheta : \|\hat{V}_n^{-1/2}(\hat{\vartheta} - \vartheta)\|^2 \leq \chi^2_{r,\alpha}\} \quad \text{with } \hat{V}_n = C^\tau [I_n(\hat{\theta})]^{-1} C.$$
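A small sketch of the Wald construction for a sub-vector (names are mine): since $C^\tau = (I_r \;\; 0)$, the matrix $\hat{V}_n = C^\tau [I_n(\hat{\theta})]^{-1} C$ is just the leading $r \times r$ block of the inverse Fisher information.

```python
# Wald acceptance check for H_0: vartheta = theta0_sub (first r coordinates of theta).
import numpy as np
from scipy import stats

def wald_accepts(theta0_sub, theta_hat, I_n_hat, r, alpha=0.05):
    V_hat = np.linalg.inv(I_n_hat)[:r, :r]     # C' [I_n(theta_hat)]^{-1} C
    d = theta_hat[:r] - theta0_sub
    return d @ np.linalg.solve(V_hat, d) <= stats.chi2.ppf(1 - alpha, df=r)
```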

Parametric likelihoods

When $\Theta_0 = \{\theta : \vartheta = \vartheta_0\}$, Rao's score test has acceptance region
$$A(\vartheta_0) = \{x : [s_n(\vartheta_0, \hat{\varphi}_{\vartheta_0})]^\tau [I_n(\vartheta_0, \hat{\varphi}_{\vartheta_0})]^{-1} s_n(\vartheta_0, \hat{\varphi}_{\vartheta_0}) \leq \chi^2_{r,\alpha}\},$$
where $s_n(\theta) = \partial \log \ell(\theta)/\partial \theta$. The confidence set obtained by inverting $A(\vartheta)$ is also $1-\alpha$ asymptotically correct.

Example 7.23

Let $X_1, \dots, X_n$ be i.i.d. binary random variables with $p = P(X_i = 1)$. Since confidence sets for $p$ with a given confidence coefficient are usually randomized (§7.2.3), asymptotically correct confidence sets may be considered when $n$ is large. The likelihood ratio for testing $H_0: p = p_0$ versus $H_1: p \neq p_0$ is
$$\lambda(Y) = \frac{p_0^Y (1-p_0)^{n-Y}}{\hat{p}^Y (1-\hat{p})^{n-Y}},$$
where $Y = \sum_{i=1}^n X_i$ and $\hat{p} = Y/n$ is the MLE of $p$.

Example 7.23 (continued)

The confidence set based on LR tests is
$$C_1(X) = \{p : p^Y (1-p)^{n-Y} \geq e^{-c_\alpha/2}\, \hat{p}^Y (1-\hat{p})^{n-Y}\}.$$
When $0 < Y < n$, $p^Y(1-p)^{n-Y}$ is strictly unimodal on $(0,1)$ and vanishes at $p = 0$ and $p = 1$; hence $C_1(X) = [\underline{p}, \bar{p}]$ with $0 < \underline{p} < \bar{p} < 1$. When $Y = 0$, $(1-p)^n$ is strictly decreasing and, therefore, $C_1(X) = (0, \bar{p}]$ with $0 < \bar{p} < 1$. Similarly, when $Y = n$, $C_1(X) = [\underline{p}, 1)$ with $0 < \underline{p} < 1$.

The confidence set obtained by inverting acceptance regions of Wald's tests is simply
$$C_2(X) = \left[\hat{p} - z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n},\;\; \hat{p} + z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right],$$
since $I_n(p) = n/[p(1-p)]$ and $(\chi^2_{1,\alpha})^{1/2} = z_{1-\alpha/2}$, the $(1-\alpha/2)$th quantile of $N(0,1)$.

Example 7.23 (continued)

Note that
$$s_n(p) = \frac{Y}{p} - \frac{n-Y}{1-p} = \frac{Y - pn}{p(1-p)}$$
and
$$[s_n(p)]^2 [I_n(p)]^{-1} = \frac{(Y - pn)^2}{p^2(1-p)^2} \cdot \frac{p(1-p)}{n} = \frac{n(\hat{p} - p)^2}{p(1-p)}.$$
Hence, the confidence set obtained by inverting acceptance regions of Rao's score tests is
$$C_3(X) = \{p : n(\hat{p} - p)^2 \leq p(1-p)\chi^2_{1,\alpha}\}.$$
It can be shown (exercise) that $C_3(X) = [p_-, p_+]$ with
$$p_\pm = \frac{2Y + \chi^2_{1,\alpha} \pm \sqrt{\chi^2_{1,\alpha}\,[4n\hat{p}(1-\hat{p}) + \chi^2_{1,\alpha}]}}{2(n + \chi^2_{1,\alpha})}.$$
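To close the example, a sketch (the observed counts are arbitrary and the helper name is mine) computing all three intervals $C_1(X)$, $C_2(X)$, $C_3(X)$ of Example 7.23, the LR interval by root-finding and the other two in closed form:

```python
# The LR, Wald, and Rao score intervals for a Bernoulli p, given Y successes
# out of n (0 < Y < n assumed for the LR interval).
import numpy as np
from scipy import stats, optimize

def binom_intervals(Y, n, alpha=0.05):
    p_hat = Y / n
    z = stats.norm.ppf(1 - alpha / 2)
    c = stats.chi2.ppf(1 - alpha, df=1)          # chi2_{1,alpha} = z^2

    # C_2: Wald interval
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - half, p_hat + half)

    # C_3: Rao score interval, using the closed form for p_+-
    disc = np.sqrt(c * (4 * n * p_hat * (1 - p_hat) + c))
    score = ((2 * Y + c - disc) / (2 * (n + c)),
             (2 * Y + c + disc) / (2 * (n + c)))

    # C_1: LR interval, by root-finding on the log-likelihood
    loglik = lambda p: Y * np.log(p) + (n - Y) * np.log(1 - p)
    cut = loglik(p_hat) - c / 2
    lr = (optimize.brentq(lambda p: loglik(p) - cut, 1e-10, p_hat),
          optimize.brentq(lambda p: loglik(p) - cut, p_hat, 1 - 1e-10))
    return lr, wald, score

print(binom_intervals(Y=7, n=20))
```

The three intervals agree asymptotically but differ for finite $n$; in particular, the score interval is centered at a shrunken estimate $(Y + \chi^2_{1,\alpha}/2)/(n + \chi^2_{1,\alpha})$ rather than at $\hat{p}$.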