A Very Brief Summary of Statistical Inference, and Examples


Trinity Term 2009
Prof. Gesine Reinert

Our standard situation is that we have data x = x_1, x_2, ..., x_n, which we view as realisations of random variables X_1, X_2, ..., X_n with a distribution (model) f(x_1, x_2, ..., x_n; θ), where θ is unknown. In frequentist analysis, θ ∈ Θ is an unknown constant.

1. Likelihood and Sufficiency

(Fisherian) Likelihood approach: We define the likelihood of θ given the data as

L(θ) = L(θ, x) = f(x_1, x_2, ..., x_n; θ).

Often X_1, X_2, ..., X_n are independent, identically distributed (i.i.d.); then

L(θ, x) = ∏_{i=1}^n f(x_i, θ).

We summarize the information about θ by finding a minimal sufficient statistic t(x). From the Factorization Theorem: T = t(X) is sufficient for θ if and only if there exist functions g(t, θ) and h(x) such that, for all x and θ,

f(x, θ) = g(t(x), θ) h(x).

Moreover, T = t(X) is minimal sufficient when it holds that

f(x, θ)/f(y, θ) is constant in θ  ⟺  t(x) = t(y).

Example. Let X_1, ..., X_n be a random sample from the truncated exponential distribution, where

f_{X_i}(x_i) = e^{θ − x_i}, x_i > θ,

or, using indicator function notation, f_{X_i}(x_i) = e^{θ − x_i} I_{(θ,∞)}(x_i). Show that Y_1 = min_i X_i is sufficient for θ.

Let T = T(X_1, ..., X_n) = Y_1. We need the pdf f_T(t) of the smallest order statistic. First calculate the cdf of X_i: for x > θ,

F(x) = ∫_θ^x e^{θ − z} dz = e^θ (e^{−θ} − e^{−x}) = 1 − e^{θ − x}.

Now

P(T > t) = ∏_{i=1}^n (1 − F(t)) = (1 − F(t))^n.

Differentiating gives that f_T(t) equals

f_T(t) = n [1 − F(t)]^{n−1} f(t) = n e^{(θ − t)(n−1)} e^{θ − t} = n e^{n(θ − t)}, t > θ.

So the conditional density of X_1, ..., X_n given T = t is

e^{θ − x_1} e^{θ − x_2} ⋯ e^{θ − x_n} / (n e^{n(θ − t)}) = e^{−∑ x_i} / (n e^{−nt}), x_i ≥ t, i = 1, ..., n,

which does not depend on θ for each fixed t = min_i x_i. Since x_i ≥ t for i = 1, ..., n, neither the expression nor the range space depends on θ, so the first order statistic, X_(1) = min_i X_i, is a sufficient statistic for θ.

Alternatively, use the Factorization Theorem:

f_X(x) = ∏_{i=1}^n e^{θ − x_i} I_{(θ,∞)}(x_i) = I_{(θ,∞)}(x_(1)) ∏_{i=1}^n e^{θ − x_i} = e^{nθ} I_{(θ,∞)}(x_(1)) e^{−∑_{i=1}^n x_i}.

With

g(t, θ) = e^{nθ} I_{(θ,∞)}(t)  and  h(x) = e^{−∑_{i=1}^n x_i}

we see that X_(1) is sufficient; the ratio criterion above shows that it is in fact minimal sufficient.

2. Point Estimation

Estimate θ by a function t(x_1, ..., x_n) of the data; often by maximum likelihood,

ˆθ = arg max_θ L(θ),

or by the method of moments. Neither is unbiased in general, but the m.l.e. is asymptotically unbiased and asymptotically efficient; under some regularity assumptions,

ˆθ ≈ N(θ, I_n^{−1}(θ)),

where

I_n(θ) = E_θ [ (∂l(θ, X)/∂θ)^2 ]

is the Fisher information (matrix). Under more regularity,

I_n(θ) = −E_θ [ ∂²l(θ, X)/∂θ² ].

If x is a random sample, then I_n(θ) = n I_1(θ); often we abbreviate I_1(θ) = I(θ).
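The distribution of the minimum derived above is easy to sanity-check by simulation. The following sketch (assuming Python with NumPy; the values θ = 2, n = 5 and the point t = 2.3 are arbitrary choices for illustration, not from the notes) draws truncated exponential samples as θ + Exp(1) and compares the empirical P(T > t) with the closed form e^{n(θ − t)}:

```python
import numpy as np

rng = np.random.default_rng(0)

theta, n, reps = 2.0, 5, 100_000   # illustrative values

# X_i = theta + Exp(1) has density e^{theta - x} for x > theta
samples = theta + rng.exponential(1.0, size=(reps, n))
T = samples.min(axis=1)            # smallest order statistic

t = 2.3                            # any point t > theta
empirical = (T > t).mean()
theoretical = np.exp(n * (theta - t))   # P(T > t) = e^{n(theta - t)}

print(empirical, theoretical)
```

With 100,000 replications the two numbers agree to about two decimal places, consistent with the derivation.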

Note that if c is large, then by the CLT each observation X_j is approximately normal, because, in distribution, X_j = Y_1 + Y_2 + ⋯ + Y_k with Y_i, i = 1, ..., k, i.i.d. Gamma(c/k, β). Recall also that there is an iterative method to compute m.l.e.s, related to the Newton-Raphson method.

3. Hypothesis Testing

For a simple null hypothesis and a simple alternative, the Neyman-Pearson Lemma says that the most powerful tests are likelihood-ratio tests. These can sometimes be generalized to one-sided alternatives in such a way as to yield uniformly most powerful (UMP) tests.

Example. Suppose as above that X_1, ..., X_n is a random sample from the truncated exponential distribution, where f_{X_i}(x_i; θ) = e^{θ − x_i}, x_i > θ. Find a UMP test of size α for testing H_0: θ = θ_0 against H_1: θ > θ_0.

Let θ_1 > θ_0; then the likelihood ratio is

f(x, θ_1)/f(x, θ_0) = e^{n(θ_1 − θ_0)} I_{(θ_1,∞)}(x_(1)),

which increases with x_(1), so we reject H_0 if the smallest observation, T = X_(1), is large. Under H_0, we have calculated that the pdf of T is

f_T(y_1) = n e^{n(θ_0 − y_1)}, y_1 > θ_0.

So, for t > θ_0, under H_0,

P(T > t) = ∫_t^∞ n e^{n(θ_0 − s)} ds = e^{n(θ_0 − t)}.

For a test of size α, choose t such that e^{n(θ_0 − t)} = α, that is,

t = θ_0 − (ln α)/n.

The LR test rejects H_0 if X_(1) > θ_0 − (ln α)/n. The test is the same for all θ_1 > θ_0, so it is UMP.

For a general null hypothesis and a general alternative, H_0: θ ∈ Θ_0 versus H_1: θ ∈ Θ_1 = Θ \ Θ_0, the (generalized) LR test uses the likelihood ratio statistic

T = max_{θ ∈ Θ} L(θ; X) / max_{θ ∈ Θ_0} L(θ; X).
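The size calculation above can be checked numerically. This sketch (Python/NumPy, with illustrative values θ_0 = 1, n = 10, α = 0.05) computes the critical value t = θ_0 − (ln α)/n and verifies by Monte Carlo that the rule "reject when X_(1) > t" rejects a true H_0 with probability close to α:

```python
import numpy as np

rng = np.random.default_rng(1)

theta0, n, alpha, reps = 1.0, 10, 0.05, 200_000   # illustrative values

t_crit = theta0 - np.log(alpha) / n   # from e^{n(theta0 - t)} = alpha

# simulate under H0: X_i = theta0 + Exp(1)
samples = theta0 + rng.exponential(1.0, size=(reps, n))
rejections = (samples.min(axis=1) > t_crit).mean()

print(t_crit, rejections)   # rejection rate should be close to alpha
```

Note that ln α < 0, so the critical value lies above θ_0, as it must for a right-tailed rule.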

We reject H_0 for large values of T; we use the chi-square asymptotics

2 log T ≈ χ²_p, where p = dim Θ − dim Θ_0

(valid for nested models only). Alternatively, we could use a score test, which is based on the score function ∂l/∂θ, with asymptotics in terms of the normal distribution N(0, I_n(θ)). Or we could use a Wald test, which is based on the asymptotic normality of the m.l.e.: often ˆθ ≈ N(θ, I_n^{−1}(θ)) if θ is the true parameter. Recall: in a random sample, I_n(θ) = n I_1(θ). An important example is Pearson's chi-square test, which we derived as a score test, and which is asymptotically equivalent to the generalized likelihood ratio test.

Example. Let X_1, ..., X_n be a random sample from a geometric distribution with parameter p; P(X_i = k) = p(1 − p)^k, k = 0, 1, 2, .... Then

L(p) = p^n (1 − p)^{∑ x_i}

and

l(p) = n ln p + n x̄ ln(1 − p),

so that

∂l/∂p = n (1/p − x̄/(1 − p))

and

∂²l/∂p² = −n (1/p² + x̄/(1 − p)²).

We know that E X_1 = (1 − p)/p. Calculating the information gives

I(p) = n / (p²(1 − p)).

Suppose n = 20, x̄ = 3, H_0: p = p_0 = 0.15 and H_1: p ≠ 0.15. Then ∂l/∂p(0.15) = 62.7 and I(0.15) = 1045.8. The score test statistic is then Z = 62.7/√1045.8 = 1.9388. Compare to 1.96 for a test at level α = 0.05: do not reject H_0.

Example continued. For a generalized LRT the test statistic is based on

2 log[L(ˆp)/L(p_0)] ≈ χ²_1,

and the test statistic is thus

2(l(ˆp) − l(p_0)) = 2n(ln ˆp − ln p_0 + x̄(ln(1 − ˆp) − ln(1 − p_0))).

We calculate that the m.l.e. is

ˆp = 1/(1 + x̄).
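The arithmetic in the geometric example can be reproduced directly. A short Python sketch using the numbers from the example (n = 20, x̄ = 3, p_0 = 0.15) computes both the score statistic and, via the m.l.e. ˆp = 1/(1 + x̄), the generalized LRT statistic:

```python
import math

n, xbar, p0 = 20, 3.0, 0.15

# score test at p0
score = n * (1 / p0 - xbar / (1 - p0))   # dl/dp at p0, about 62.7
info = n / (p0**2 * (1 - p0))            # Fisher information I(p0), about 1045.8
Z = score / math.sqrt(info)              # about 1.94 < 1.96: do not reject

# generalized LRT
def loglik(p):
    return n * math.log(p) + n * xbar * math.log(1 - p)

phat = 1 / (1 + xbar)                    # m.l.e., here 0.25
lrt = 2 * (loglik(phat) - loglik(p0))    # about 5.4 > 3.84: reject

print(Z, lrt)
```

Note that the two tests disagree at the 5 percent level here: the score statistic falls just inside the acceptance region while the LRT statistic does not; the tests are only asymptotically equivalent.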

Suppose again n = 20, x̄ = 3, H_0: p = 0.15 and H_1: p ≠ 0.15. The m.l.e. is ˆp = 0.25, and l(0.25) = −44.98, l(0.15) = −47.69. Calculate χ² = 2(47.69 − 44.98) = 5.4 and compare to the chi-square distribution with 1 degree of freedom: the 5 percent critical value is 3.84, so reject H_0.

4. Confidence Regions

If we can find a pivot, a function t(X, θ) of a sufficient statistic whose distribution does not depend on θ, then we can find confidence regions in a straightforward manner. Otherwise we may have to resort to approximate confidence regions, for example using the approximate normality of the m.l.e. Recall that confidence intervals are equivalent to hypothesis tests with a simple null hypothesis and one- or two-sided alternatives.

Not to forget about: Profile likelihood

Often θ = (ψ, λ), where ψ contains the parameters of interest; then we may base our inference on the profile likelihood for ψ,

L_P(ψ) = L(ψ, ˆλ_ψ).

Again we can use a (generalized) LRT or score test. If ψ is scalar, with H_0: ψ = ψ_0, H_1^+: ψ > ψ_0, H_1^−: ψ < ψ_0, use the test statistic

T = ∂l(ψ_0, ˆλ_0; X)/∂ψ,

where ˆλ_0 is the m.l.e. for λ when H_0 is true. Large positive values of T indicate H_1^+; large negative values indicate H_1^−; and

T ≈ l_ψ − I_{ψ,λ} I_{λ,λ}^{−1} l_λ ~ N(0, 1/I^{ψ,ψ}),

where I^{ψ,ψ} = (I_{ψ,ψ} − I_{ψ,λ}² I_{λ,λ}^{−1})^{−1} is the top left element of I^{−1}. We estimate the parameters by substituting the null hypothesis values; the practical standardized form of T is then

Z = T/√Var(T) ≈ l_ψ(ψ, ˆλ_ψ) [I^{ψ,ψ}(ψ, ˆλ_ψ)]^{1/2},

which is approximately standard normal.
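As an illustration of the pivot idea (this particular pivot is not worked in the notes, but it follows from the pdf of the minimum derived earlier): in the truncated exponential example, n(X_(1) − θ) has a standard exponential distribution whatever θ is, so it is a pivot, and [X_(1) + (ln α)/n, X_(1)] is an exact (1 − α) confidence interval for θ. A quick coverage check in Python (illustrative values θ = 1, n = 10, α = 0.05):

```python
import numpy as np

rng = np.random.default_rng(2)

theta, n, alpha, reps = 1.0, 10, 0.05, 200_000   # illustrative values

samples = theta + rng.exponential(1.0, size=(reps, n))
m = samples.min(axis=1)                 # X_(1) for each replication

# interval [X_(1) + ln(alpha)/n, X_(1)]; note ln(alpha) < 0
lower = m + np.log(alpha) / n
covered = ((lower <= theta) & (theta <= m)).mean()

print(covered)   # should be close to 1 - alpha = 0.95
```

The upper endpoint is X_(1) itself because X_(1) > θ with probability 1; the coverage comes entirely from the lower endpoint.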

Bias and variance approximations: the delta method

The delta method is useful if we cannot calculate the mean and variance of a statistic directly. Suppose T = g(S), where E S = β and Var S = V. A Taylor expansion gives

T = g(S) ≈ g(β) + (S − β) g′(β).

Taking the mean and variance of the right-hand side:

E T ≈ g(β),  Var T ≈ [g′(β)]² V.

This also works for vectors S, β, with T still a scalar. If (g′(β))_i = ∂g/∂β_i, and g″(β) is the matrix of second derivatives, then

Var T ≈ [g′(β)]^T V g′(β)  and  E T ≈ g(β) + (1/2) trace[g″(β) V].

Exponential family

For distributions in the exponential family, many calculations have been standardized; see the lecture notes.
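The scalar delta method is easy to verify numerically. A sketch (the choices g(s) = 1/s, S ~ N(2, 0.05²) are illustrative) compares the Monte Carlo mean and variance of T = g(S) with the approximations g(β) and [g′(β)]² V:

```python
import numpy as np

rng = np.random.default_rng(3)

beta, sd, reps = 2.0, 0.05, 400_000   # illustrative: S ~ N(beta, sd^2)
V = sd**2

S = rng.normal(beta, sd, size=reps)
T = 1.0 / S                           # T = g(S) with g(s) = 1/s

# delta-method approximations: g'(beta) = -1/beta^2
mean_delta = 1 / beta                 # g(beta)
var_delta = (1 / beta**2) ** 2 * V    # [g'(beta)]^2 * V

print(T.mean(), mean_delta, T.var(), var_delta)
```

The approximation is good here because sd/β is small; the second-order term (1/2) trace[g″(β)V] = V/β³ gives the leading bias correction to the mean if more accuracy is needed.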