Principles of Statistical Inference


Principles of Statistical Inference Nancy Reid and David Cox August 30, 2013

Introduction

Statistics needs a healthy interplay between theory and applications, where "theory" means foundations, rather than the theoretical analysis of specific techniques. "Foundations?" suggests a solid base, on which rests a large structure; it must be continually tested against new applications. The notion that this can be captured in a simple framework, much less a set of mathematical axioms, seems dangerously naive (Fisher 1956). Foundations in statistics depend on, and must be tested and revised in the light of, experience, and assessed by relevance to the very wide variety of contexts in which statistical considerations arise.

Principles of Statistical Inference, Reid & Cox, WSC 2013

... introduction

A formal theory of inference is just a small part of the challenges of statistical work. Equally important are aspects of:
- study design
- types of measurement
- formulation of research questions
- connecting these questions to statistical models
- ...

Outline
1. the role of probability
2. some classical principles: sufficiency, ancillarity, likelihood
3. some less classical compromises: pivotals, strong matching, asymptotics, bootstrap
4. some thoughts on the future

Role of probability

Probability is central to most formulations of statistical issues, but not all; algorithmic approaches are popular, especially in machine learning (Breiman 1999). The theory of probability has been liberated from discussions of its meaning via Kolmogorov's axioms, except possibly the modification needed for quantum mechanics, and notions of upper and lower probability. Statisticians do not have this luxury! We continue to be engaged by the distinction between:
- probability representing physical haphazard variability (Jeffreys' "chances")
- probability encapsulating, directly or indirectly, aspects of the uncertainty of knowledge

Probability and empirical variability

Four related, but different, approaches:
1. the data are regarded as a random sample from a hypothetical infinite population; frequencies within this are probabilities; some aspect of these represents the target of inference
2. the data form part of a long, real or somewhat hypothetical, process of repetition under constant conditions; limiting frequencies in this repetition are the probabilities of interest; some aspect...
3. either 1., 2. or both, plus an explicit, idealized description of the physical, biological, ... processes that generated the data
4. either 1., 2. or both, used only to describe the randomization in experimental design or in sampling an existing population, leading to the so-called design approach to analysis

... empirical variability

Probabilities represent features of the real world, of course in somewhat idealized form, and, given suitable data, are subject to empirical test and improvement. Conclusions of statistical analysis are to be expressed in terms of interpretable parameters describing such a probabilistic representation of the system under study, with enhanced understanding of the data-generating process, as in epidemics, for example.

Probability as uncertain knowledge

Probability as measuring strength of belief in some uncertain proposition is sharply different; how do we address this?
1. consider that probability measures rational, supposedly impersonal, degree of belief, given relevant information (Jeffreys 1939, 1961)
2. consider that probability measures a particular person's degree of belief, subject typically to some constraints of self-consistency (F.P. Ramsey 1926, de Finetti 1937, Savage 1956); this seems intimately linked with personal decision making
3. avoid the need for a different version of probability by appeal to a notion of calibration

... uncertain knowledge

We may avoid the need for a different version of probability by appeal to a notion of calibration, as assessed by the behaviour of a procedure under hypothetical repetition, as with other measuring devices. Within this scheme of repetition, probability is defined as a hypothetical frequency. The precise specification of the assessment process may require some notion of conditioning (Good 1949). The formal accept-reject paradigm of Neyman-Pearson theory would be an instance of decision analysis, and as such outside the immediate discussion.

A brief assessment

1. even if a frequency view of probability is not used directly as a basis for inference, it is unacceptable if a procedure using probability in another sense is poorly calibrated: such a procedure, used repeatedly, gives misleading conclusions (Bayesian Analysis 1(3) 2006; Wasserman, BA 3(3) 2008)
2. standard accounts of probability assume a total ordering of probabilities: can we regard a probability, p = 0.3, say, found from careful investigation of a real-world effect as equivalent to the same p derived from personal judgment, based on scant or no direct evidence?
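The calibration requirement in point 1 can be checked directly by simulation. Below is a minimal sketch, assuming a hypothetical exponential model and a first-order normal approximation to the signed likelihood root (none of the data, constants, or helper names are from the slides): under repeated sampling at the true parameter, a well-calibrated p-value function is approximately uniform, so the nominal central 90% region should cover close to 90% of the time.

```python
import math
import random
from statistics import NormalDist

random.seed(42)

# Repeated-sampling calibration check for a first-order p-value function,
# under an assumed Exponential(rate theta) model (all constants illustrative).
Phi = NormalDist().cdf
n, theta, reps = 8, 1.3, 4000

def p_value(sample_sum):
    # signed root of the likelihood ratio, approximately N(0, 1) under theta
    th_hat = n / sample_sum
    w = 2.0 * ((n * math.log(th_hat) - n) - (n * math.log(theta) - theta * sample_sum))
    r = math.copysign(math.sqrt(max(w, 0.0)), theta - th_hat)
    return Phi(r)

ps = [p_value(sum(random.expovariate(theta) for _ in range(n))) for _ in range(reps)]
coverage = sum(0.05 < p < 0.95 for p in ps) / reps  # nominal 0.90
```

A poorly calibrated procedure would show up here as a coverage well away from the nominal level, or a visibly non-uniform distribution of p-values.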

... assessment

3. a great attraction of Bayesian arguments is that all calculations are by the rules of probability theory; however, personalistic approaches merge seamlessly what may be highly personal assessments with evidence from data, possibly collected with great care; this is surely unacceptable for the careful discussion of the meaning of data and the presentation of those conclusions in the scientific literature, even if essential for personal decision making; this is in no way to deny the role of personal judgement and experience in interpreting data; it is the merging that may be unacceptable

... assessment

4. another attraction of Bayesian arguments, in principle at least, is the assimilation of external evidence; most applications of objective approaches use some form of reference prior representing vague knowledge; this is increasingly questionable as the dimension of the parameter space increases; important questions concerning higher-order matching as the number of parameters increases seem open
5. a view that does not accommodate some form of model checking, even if very informally, is inadequate

Classical principles

- sufficiency: if f(y; θ) = f_1(s; θ) f_2(t | s), inference about θ should be based on s
- ancillarity: if f(y; θ) = f_1(s | t; θ) f_2(t), inference about θ should be based on s, given t; relevant subsets?
- likelihood: inference should be based on the likelihood function, with the likelihood regarded as an equivalence class of functions of θ

Arguments for direct use of the likelihood function are contested, but inference constructed from the likelihood function seems to be widely accepted.
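The sufficiency factorization can be verified numerically in a toy model. The sketch below (hypothetical Poisson data and helper names of my own, not from the slides) checks that the conditional distribution of the sample given the sufficient statistic s = Σ yᵢ is free of θ, and equals the multinomial(s; 1/n, ..., 1/n) distribution:

```python
import math

# Toy check of f(y; theta) = f1(s; theta) f2(y | s) for i.i.d. Poisson(theta)
# data, where s = sum(y) is sufficient and s ~ Poisson(n * theta).
def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def joint_pmf(ys, theta):
    p = 1.0
    for y in ys:
        p *= poisson_pmf(y, theta)
    return p

ys = [2, 0, 3, 1]          # illustrative data
n, s = len(ys), sum(ys)

# f2(y | s) = f(y; theta) / f1(s; theta), evaluated at several theta values
conditionals = [joint_pmf(ys, th) / poisson_pmf(s, n * th) for th in (0.5, 1.7, 4.0)]

# the conditional should be multinomial(s; 1/n, ..., 1/n), free of theta
multinom = (math.factorial(s) // math.prod(math.factorial(y) for y in ys)) * (1 / n) ** s
```

The ratio is the same for every θ, which is exactly the sense in which t (here, the data pattern given s) carries no information about θ.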

... classical principles

Sufficiency and ancillarity become more difficult for inference about parameters of interest in the presence of nuisance parameters. Asymptotic theory provides a means to incorporate these notions in an approximate sense, but the details continue to be somewhat cumbersome. Bayesian methods, being based on the observed data, can avoid this consideration, at the expense of specification of prior probabilities, which is much more difficult in the nuisance parameter setting.

... classical principles

A pivotal quantity is a function of the data y and the parameter of interest ψ with a known distribution. Inversion of a pivotal quantity, using its known distribution, gives a p-value function of ψ, or a confidence distribution for ψ, i.e. a set of nested confidence regions at any desired confidence level. The standardized maximum likelihood estimator is an example of such a pivotal quantity:

(θ̂ - θ)/σ̂_θ ∼ N(0, I),   2{ℓ(θ̂) - ℓ(θ)} ∼ χ²_d,   σ̂²_θ = {-ℓ″(θ̂)}⁻¹

with both distributional statements holding approximately.
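Inverting an approximate pivot into a p-value function and confidence interval can be sketched concretely. The following assumes a hypothetical exponential-rate model, using the signed root of the likelihood ratio as an approximate N(0, 1) pivot (data, tolerances, and helper names are illustrative, not from the slides):

```python
import math
from statistics import NormalDist

# p-value function and confidence interval by inverting an approximate pivot,
# for an assumed Exponential(rate theta) sample.
ys = [0.8, 1.9, 0.4, 2.3, 1.1, 0.6, 1.5, 0.9]
n, s = len(ys), sum(ys)
theta_hat = n / s  # MLE of the rate

def loglik(theta):
    return n * math.log(theta) - theta * s

def r(theta):
    # signed root of the likelihood ratio statistic, approximately N(0, 1)
    w = 2.0 * (loglik(theta_hat) - loglik(theta))
    return math.copysign(math.sqrt(max(w, 0.0)), theta_hat - theta)

def pvalue(theta):
    # the p-value function of theta, obtained by inverting the pivot
    return NormalDist().cdf(r(theta))

def solve_r(target, lo, hi, tol=1e-10):
    # bisection for r(theta) = target; r is decreasing in theta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if r(mid) > target else (lo, mid)
    return 0.5 * (lo + hi)

z = NormalDist().inv_cdf(0.975)
ci = (solve_r(z, 1e-9, theta_hat), solve_r(-z, theta_hat, 50.0))  # approx. 95% interval
```

Evaluating `pvalue` on a grid of θ values traces out the confidence distribution; the nested confidence regions of the slide are exactly its level sets.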

Some insight from asymptotic theory

We can construct pivotal quantities to a higher order of approximation; for example,

2{ℓ(θ̂) - ℓ(θ)}/{1 + B(θ)/n} ∼ χ²_d, with error O(n⁻²)   (Bartlett 1937)

[figure: panels (a) and (c); detail not recoverable from the transcript]
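A Bartlett-type mean adjustment can be illustrated by Monte Carlo: estimate the small-sample inflation factor 1 + B(θ)/n of the likelihood ratio statistic w, then rescale w so its mean matches the χ²_d mean. A sketch under an assumed exponential-rate model with deliberately small n (all constants and the seed are illustrative):

```python
import math
import random

random.seed(2013)

# Monte Carlo estimate of the mean inflation of w = 2{l(theta_hat) - l(theta)}
# under an assumed Exponential(rate theta) model; d = 1 here.
n, theta, reps = 5, 1.0, 20000

def w_stat():
    s = sum(random.expovariate(theta) for _ in range(n))
    theta_hat = n / s
    # l(th) = n log th - th * s, and l(theta_hat) = n log theta_hat - n
    return 2.0 * ((n * math.log(theta_hat) - n) - (n * math.log(theta) - theta * s))

ws = [w_stat() for _ in range(reps)]
mean_w = sum(ws) / reps              # estimates 1 + B(theta)/n (slightly above 1)
adjusted = [w / mean_w for w in ws]  # Bartlett-adjusted statistic, mean exactly 1
```

In practice B(θ) is obtained analytically or by simulation as here; dividing w by the estimated mean restores the χ² approximation to a higher order.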

... asymptotic theory

We can do better if the parameter of interest ψ is scalar, with vector nuisance parameter λ: write θ = (ψ, λ) and ℓ_p(ψ) = ℓ(ψ, λ̂_ψ) for the profile log-likelihood. Then

r(ψ) = ±[2{ℓ_p(ψ̂) - ℓ_p(ψ)}]^{1/2} ∼ N(0, 1), with error O(n^{-1/2})

r*(ψ) = r(ψ) + {1/r(ψ)} log{Q(ψ)/r(ψ)} ∼ N(0, 1), with error O(n^{-3/2}), where Q(ψ) = r(ψ) + o_p(1)

This is a large-deviation result, and leads to very accurate inferences (Brazzale et al. 2008).
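For a scalar parameter with no nuisance component, r and r* can be computed in closed form in a full exponential family, with Q the Wald pivot in the canonical parameterization. The sketch below assumes a hypothetical exponential-rate sample and compares both approximations with the exact gamma significance function (data, grid, and tolerances are illustrative, not from the slides):

```python
import math
from statistics import NormalDist

# r versus r* for an assumed Exponential(rate theta) model; the canonical
# parameter is eta = -theta, so the Wald pivot is taken on that scale.
ys = [0.8, 1.9, 0.4, 2.3, 1.1, 0.6, 1.5, 0.9]
n, s = len(ys), sum(ys)
theta_hat = n / s
Phi = NormalDist().cdf

def loglik(th):
    return n * math.log(th) - th * s

def r_and_q(th):
    w = 2.0 * (loglik(theta_hat) - loglik(th))
    r = math.copysign(math.sqrt(max(w, 0.0)), th - theta_hat)  # sign of eta_hat - eta
    q = (th - theta_hat) * math.sqrt(n) / theta_hat            # Wald pivot, canonical scale
    return r, q

def p_r(th):
    return Phi(r_and_q(th)[0])

def p_rstar(th):
    r, q = r_and_q(th)
    return Phi(r + math.log(q / r) / r)

def p_exact(th):
    # P(S <= s_obs) for S ~ Gamma(n, th), via the gamma-Poisson identity
    return 1.0 - sum(math.exp(-th * s) * (th * s) ** k / math.factorial(k)
                     for k in range(n))

grid = [0.4, 0.6, 1.2, 1.6]
err_r = max(abs(p_r(th) - p_exact(th)) for th in grid)
err_rstar = max(abs(p_rstar(th) - p_exact(th)) for th in grid)
```

Even at n = 8 the r* p-values are close to exact to several decimal places, while the first-order r p-values are off in the second or third decimal, which is the practical content of the O(n^{-1/2}) versus O(n^{-3/2}) error rates.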


... asymptotic theory

This leads to insight about the point of departure between Bayesian and frequentist methods (Pierce & Peters 1994; Fraser et al. 2010), because the corresponding Bayesian pivot is

r*_B(ψ) = r(ψ) + {1/r(ψ)} log{Q_B^π(ψ)/r(ψ)}

- any prior π(θ) for which Q(ψ) = Q_B^π(ψ) ensures Bayesian inference is calibrated for ψ
- a prior which is calibrated for ψ is not likely to be calibrated for other components of θ
- adjustments for high-dimensional nuisance parameters are the most important ingredient; these adjustments are built into Q(ψ)
- the calibration can be replicated by bootstrap sampling under (ψ, λ̂_ψ) (DiCiccio & Young 2008; Fraser & Rousseau 2008)
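The bootstrap replication just mentioned can be sketched for a scalar parameter, where sampling under (ψ, λ̂_ψ) reduces to a parametric bootstrap under the hypothesized value itself. The model, data, and seed below are illustrative, not from the slides; the bootstrap p-value is compared with the exact significance level and the first-order normal approximation:

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist

random.seed(7)

# Parametric bootstrap of the signed likelihood root r under an assumed
# Exponential(rate theta) model; no nuisance parameters, so resampling
# "under (psi, lambda_hat_psi)" is resampling under theta0 itself.
ys = [0.8, 1.9, 0.4, 2.3, 1.1, 0.6, 1.5, 0.9]
n = len(ys)
s_obs = sum(ys)

def signed_root(theta, s):
    th_hat = n / s
    w = 2.0 * ((n * math.log(th_hat) - n) - (n * math.log(theta) - theta * s))
    return math.copysign(math.sqrt(max(w, 0.0)), theta - th_hat)

theta0 = 1.6                       # hypothesized value of the interest parameter
r_obs = signed_root(theta0, s_obs)

B = 20000
r_boot = sorted(signed_root(theta0, sum(random.expovariate(theta0) for _ in range(n)))
                for _ in range(B))
p_boot = bisect_left(r_boot, r_obs) / B   # bootstrap estimate of P(r <= r_obs)
p_first = NormalDist().cdf(r_obs)         # first-order N(0,1) approximation

# exact significance from the pivot 2*theta0*S ~ chi-squared_{2n}
p_exact = 1.0 - sum(math.exp(-theta0 * s_obs) * (theta0 * s_obs) ** k / math.factorial(k)
                    for k in range(n))
```

Simulating the distribution of r, rather than trusting its N(0, 1) approximation, is what recovers the higher-order accuracy of r* here without any analytic adjustment.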

Foundations and applications

Foundations must be continually tested against (new) applications. In spite of the likelihood principle, a great deal of applied work considers the distribution of quantities based on the likelihood function (maximum likelihood estimator, likelihood ratio statistic, etc.). In spite of theorems on coherence and exchangeability, a great deal of applied work with Bayesian methods uses what are hoped to be non-influential priors; the question is whether or not there really are non-influential priors. Non-informative priors are the perpetual motion machine of statistics (Wasserman 2012).

... foundations and applications

Are we ready for Big Data? Will statistical principles be helpful? Are the classical principles enough?

"Inferential giants": assessment of sampling bias, inference about tails, resampling inference, change point detection, reproducibility of analyses, causal inference for observational data, efficient inference for temporal streams (National Academies Press, http://www.nap.edu/catalog.php?record_id=18374)


Conclusion

From our abstract: statistical theory serves to provide a systematic way of approaching new problems, and to give a common language for summarizing results. Ideally this foundation and common language ensure that the statistical aspects of one study, or of several studies on closely related phenomena, can, in broad terms, be readily understood by the non-specialist. However, the continuing distinction between Bayesians and frequentists is a source of ambiguity and potential confusion; this does not belie the utility of methods that combine probabilities using Bayes' theorem. Math Stat, the course students love to hate, is more important than ever!