Modern Likelihood-Frequentist Inference. Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine

Shortly before 1980, important developments in the frequentist theory of inference were in the air. Strictly speaking, these were new asymptotic methods, but they opened the way to what has been called the Neo-Fisherian theory of inference: a complement to the Neyman-Pearson theory, emphasizing likelihood and conditioning for the reduction of data for inference, rather than a direct focus on optimality, e.g. UMP tests.


A few years after that, this pathbreaking paper led the way to remarkable further development of MODERN LIKELIHOOD ASYMPTOTICS. That paper was difficult, so Dawn Peters and I had some success interpreting it in an invited RSS discussion paper.

HIGHLY ABRIDGED REFERENCE LIST

Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed likelihood ratio. Biometrika 73, 307-322.
Barndorff-Nielsen, O. E. and Cox, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London.
Brazzale, A. R., Davison, A. C. and Reid, N. (2007). Applied Asymptotics: Case Studies in Small-Sample Statistics. Cambridge Univ. Press, Cambridge.
Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65, 457-482.
Pace, L. and Salvan, A. (1997). Principles of Statistical Inference from a Neo-Fisherian Perspective. World Scientific Publishing Co., Singapore.

Pierce, D. A. and Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. J. Roy. Statist. Soc. B 54, 701-737.
Pierce, D. A. and Bellio, R. (2006). Effects of the reference set on frequentist inferences. Biometrika 93, 425-438.
Pierce, D. A. and Bellio, R. (in prep.). Modern likelihood-frequentist inference. (With a more complete reference list.)
Reid, N. (1996). Likelihood and higher-order approximations to tail areas: A review and annotated bibliography. Canad. J. Statist. 24, 141-166.
Reid, N. (2003). The 2000 Wald Lectures: Asymptotics and the theory of inference. Ann. Statist. 31, 1695-1731.
Severini, T. A. (2000). Likelihood Methods in Statistics. Oxford Univ. Press, Oxford.

Skovgaard, I. M. (1985). A second-order investigation of asymptotic ancillarity. Ann. Statist. 13, 534-551.
Skovgaard, I. M. (1996). An explicit large-deviation approximation to one-parameter tests. Bernoulli 2, 145-165.
Skovgaard, I. M. (2001). Likelihood asymptotics. Scand. J. Statist. 28, 3-32.

THIS LIST CONTAINS SEVERAL MAJOR BOOKS:
Inference and Asymptotics (1994), Barndorff-Nielsen & Cox
Principles of Statistical Inference from a Neo-Fisherian Perspective (1997), Pace & Salvan
Likelihood Methods in Statistics (2000), Severini

Salvan (Univ Padua) and Pace & Bellio (Univ Udine) made it possible for me to visit 2-4 months/year from 2000 to 2014 to study Likelihood Asymptotics. In 2012 they arranged a Fellowship for me at Padua, work under which led to the paper in progress discussed today. This is based on the idea that the future of Likelihood Asymptotics will depend on: (a) development of generic computational tools, and (b) concise and transparent exposition amenable to statistical theory courses.

The starting point is a simple and accurate likelihood ratio approximation to the distribution of the (multidimensional) maximum likelihood estimator. The next step is to transform and marginalize from this to the distribution of the signed LR statistic (the square root of the usual $\chi^2_1$ statistic), requiring only a Jacobian and a Laplace approximation to the integration. The result is expressed as an adjustment to the first-order N(0,1) distribution of the signed LR: if that approximation is poor but not terrible, this "mops up" most of the error (Rob Kass). This is not hard to fathom, and is accessible to a graduate-level theory course, if one need not be distracted by arcane details.

A central concept in what follows involves observed and expected (Fisher) information. The observed information is defined as minus the second derivative of the loglikelihood at its maximum,
$$\hat{\jmath} = -\,\partial^2 l(\theta; y)/\partial\theta^2\big|_{\theta=\hat\theta}\,.$$
The expected information (the more usual Fisher information) is defined as
$$i(\theta) = E\{-\,\partial^2 l(\theta; Y)/\partial\theta^2\}\,,$$
and we will write $\hat{\imath} = i(\hat\theta)$.
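To make these definitions concrete, here is a minimal R sketch (our own illustration, not from the talk), using the numDeriv package for the numerical Hessian. The example is an exponential sample with rate theta, where $l(\theta) = n\log\theta - \theta\sum y_i$, so that $j(\hat\theta) = n/\hat\theta^2 = i(\hat\theta)$, anticipating the full-rank exponential family case of the next slide.

## Observed vs expected information for an exponential sample (rate theta)
library(numDeriv)

set.seed(1)
y <- rexp(50, rate = 2)
loglik <- function(theta, y) sum(dexp(y, rate = theta, log = TRUE))

theta.hat <- 1 / mean(y)                      # MLE of the rate
j.hat <- -hessian(loglik, theta.hat, y = y)   # observed information, numerically
i.hat <- length(y) / theta.hat^2              # expected information at the MLE

c(j.hat = j.hat, i.hat = i.hat)               # agree up to numerical error here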

The MLE is sufficient if and only if $\hat\imath = \hat\jmath$, and under regularity this occurs only for exponential families without restriction on the parameter (the full-rank case). Inferentially, it is unwise, and not really necessary, to use the average (expected) information. With the methods indicated here, it is feasible to condition on an ancillary statistic such as $a = \hat\jmath/\hat\imath$ (meaning actually $\hat\imath^{-1}\hat\jmath$). This is a key part of what is called Neo-Fisherian Inference.

Remarks on ancillary conditioning: Neo-Fisherian Inference. To Fisher, optimality of inference involved sufficiency, more strongly than in the Neyman-Pearson theory. But generally the MLE is not a sufficient statistic. Thus to Fisher, and many others, the resolution of that was conditioning on something like $a = \hat\jmath/\hat\imath$ to render the MLE sufficient beyond first order.

Indeed, Skovgaard (1985) showed that in general $(\hat\theta, a)$ is sufficient to $O(1/n)$, and conditioning on $a = \hat\jmath/\hat\imath$ (among other choices) leads in that order to: (a) no loss of information, and (b) the MLE being sufficient.

The LR approximation to the distribution of the MLE (usually, if less usefully, called the $p^*$ formula, or the "magic formula", as by Efron in his Fisher Lecture) is then
$$pr^*(\hat\theta \mid a;\, \theta) = \frac{|j(\hat\theta)|^{1/2}}{(2\pi)^{p/2}}\,\frac{pr(y;\theta)}{pr(y;\hat\theta)} = pr(\hat\theta \mid a;\, \theta)\,\{1 + O(n^{-1})\}\,.$$

Though this took some years to emerge, in retrospect it becomes fairly simple:
$$p(\hat\theta \mid a; \theta) = \frac{p(\hat\theta \mid a; \theta)}{p(\hat\theta \mid a; \hat\theta)}\; p(\hat\theta \mid a; \hat\theta) = \frac{p(y \mid a; \theta)}{p(y \mid a; \hat\theta)}\; p(\hat\theta \mid a; \hat\theta) \doteq \frac{p(y; \theta)}{p(y; \hat\theta)}\; p(\hat\theta \mid a; \hat\theta)\,,$$
the second equality by sufficiency of $(\hat\theta, a)$, and the final step since the ratio $p(a;\hat\theta)/p(a;\theta)$ is nearly unity, $a$ being ancillary. Then, with an Edgeworth expansion for the final term,
$$p(\hat\theta \mid a; \hat\theta) = \frac{|j(\hat\theta)|^{1/2}}{(2\pi)^{p/2}}\,\{1 + O(n^{-1})\}\,,$$
yielding $p^*(\hat\theta \mid a; \theta)$, this having relative error $O(1/n)$ for all $\hat\theta = \theta + O_p(n^{-1/2})$.
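As a quick check on the quality of this approximation, here is a small self-contained R illustration (ours, not from the talk). For an exponential sample the MLE of the rate has an exact gamma-based density, there is no ancillary, and $p^*$ reduces to the saddlepoint approximation, whose only error is Stirling's approximation to the gamma-function constant.

## p* vs exact density of the MLE for an exponential sample, theta = rate
n <- 10; theta <- 1.5

pstar <- function(t, theta, n) {
  jhat <- n / t^2                                  # observed information at t
  lr   <- (theta / t)^n * exp(-n * theta / t + n)  # L(theta) / L(theta.hat = t)
  sqrt(jhat / (2 * pi)) * lr
}

## exact: theta.hat = n / sum(y), with sum(y) ~ Gamma(n, rate = theta)
exact <- function(t, theta, n) dgamma(n / t, shape = n, rate = theta) * n / t^2

t <- seq(0.5, 4, by = 0.5)
round(cbind(t, pstar = pstar(t, theta, n), exact = exact(t, theta, n),
            ratio = pstar(t, theta, n) / exact(t, theta, n)), 4)
## the ratio is constant in t (about 1.008 here): the error is only in the
## normalizing constant, and is removed entirely by renormalization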

The Jacobian and marginalization to be applied to $pr^*(\hat\theta \mid a; \theta)$ involve rather arcane sample space derivatives, entering through two quantities $C$ and $u$ built from terms such as $\partial\{l_p(\hat\psi;\, \hat\theta, a) - l_p(\psi;\, \hat\theta, a)\}/\partial\hat\theta$ and blocks of $j(\hat\theta)$; approximations* to these are taken care of by the software we provide.

The result is an inferential quantity that is standard normal to second order,
$$r^* = r + \frac{1}{r}\log(C) + \frac{1}{r}\log(u/r) = r + \mathrm{NP} + \mathrm{INF}\,,$$
modifying the usual first-order standard normal LR quantity
$$r = \mathrm{sign}(\hat\psi - \psi)\,\sqrt{2\,\{l(\hat\theta;\, y) - l(\tilde\theta;\, y)\}}\,,$$
where $\tilde\theta$ is the constrained MLE under the hypothesized $\psi$.
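In code, the adjustment itself is a one-liner once $C$ and $u$ are available (in practice they are supplied by the software); a minimal sketch:

## r* = r + NP + INF: NP is the nuisance-parameter adjustment, INF the
## information adjustment; C and u are the sample-space-derivative quantities above
rstar.adjust <- function(r, C, u) {
  NP  <- log(C) / r
  INF <- log(u / r) / r
  c(rstar = r + NP + INF, NP = NP, INF = INF)
}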

It was almost prohibitively difficult to differentiate the likelihood with respect to MLEs while holding fixed an ancillary statistic. The approximations* to sample space derivatives referred to came in a breakthrough by Skovgaard, making the theory practical. Skovgaard's approximation uses projections involving covariances of likelihood quantities, computed without holding fixed an ancillary. Our software uses simulation for these covariances, NOT involving model fitting in the simulation trials.
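The flavor of that simulation step, in a stripped-down sketch of our own (illustrative of the mechanics only, not of the exact covariances entering Skovgaard's formula): datasets are generated at the fitted parameter value, and likelihood quantities are merely evaluated there, never refitted.

## Monte Carlo covariance of likelihood quantities at theta.hat:
## each trial generates data and evaluates the score -- no model fitting
library(numDeriv)

lik.cov <- function(theta.hat, datagen, loglik, nsim = 500) {
  scores <- matrix(NA, nsim, length(theta.hat))
  for (b in seq_len(nsim)) {
    y.sim <- datagen(theta.hat)            # simulate at the fitted value
    scores[b, ] <- grad(function(th) loglik(th, y.sim), theta.hat)
  }
  cov(scores)                              # estimates i(theta.hat), for instance
}

## exponential example again:
## lik.cov(2, datagen = function(th) rexp(50, rate = th),
##         loglik  = function(th, y) sum(dexp(y, rate = th, log = TRUE)))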

To use the generic software, the user specifies an R function for computing the likelihood. The choices made render the routines fairly generally applicable. Since higher-order inference depends on more than the likelihood function, one defines the extra-likelihood aspects of the model by providing another R function that generates a dataset. The interest parameter is defined by one further R function. We illustrate this with a Weibull example, with interest parameter the survival function at a given time and covariate value, as sketched below.
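A rough sketch of what those three user-supplied functions look like for the Weibull example (our own illustrative code: it assumes a data object carrying a design matrix X and response y, the time t0 and covariate x0 are invented values, and the final call is shown only schematically, following approximately the interface of the likelihoodAsy R package).

library(likelihoodAsy)   # R package accompanying this work

## 1. loglikelihood: Weibull regression, theta = (beta0, beta1, log shape)
loglik.wbl <- function(theta, data) {
  gamma <- exp(theta[3])                    # Weibull shape
  scale <- exp(data$X %*% theta[1:2])       # log-linear model for the scale
  sum(dweibull(data$y, shape = gamma, scale = scale, log = TRUE))
}

## 2. dataset generator: supplies the extra-likelihood aspects of the model
gendat.wbl <- function(theta, data) {
  gamma  <- exp(theta[3])
  scale  <- exp(data$X %*% theta[1:2])
  data$y <- rweibull(nrow(data$X), shape = gamma, scale = scale)
  data
}

## 3. interest parameter: survival probability at (hypothetical) t0 and x0
psi.wbl <- function(theta) {
  t0 <- 10; x0 <- c(1, 0.5)
  gamma <- exp(theta[3])
  exp(-(t0 / exp(sum(x0 * theta[1:2])))^gamma)   # S(t0 | x0)
}

## higher-order test of psi = psi0 (call shown schematically):
## rs <- rstar(data = mydata, thetainit = c(0, 0, 0), floglik = loglik.wbl,
##             datagen = gendat.wbl, fpsi = psi.wbl, psival = 0.9)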


For testing this at the Wald-based 95% lower confidence limit, the results are:

r = 1.66 (P = 0.048)
r* = 1.95 (P = 0.025)
Wald = 2.10 (P = 0.018)

This is typical for settings with few nuisance parameters; when there are several, the adjustment can be much larger.