Confidence distributions in statistical inference


Confidence distributions in statistical inference

Sergei I. Bityukov, Institute for High Energy Physics, Protvino, Russia
Nikolai V. Krasnikov, Institute for Nuclear Research RAS, Moscow, Russia

Plan
- Motivation
- A bit of history
- Confidence distributions (CD and acd)
- Examples and inferential information contained in a CD
- Inference: a brief summary
- Combination of CDs
- Conclusion
- References

Motivation (I) Common sense

If we have a procedure that establishes a one-to-one correspondence between the observed value of a random variable and the confidence interval of any level of significance, then we can reconstruct the confidence density of the parameter, and hence the confidence distribution, in a unique way. The confidence distribution is interpreted here in the same way as a confidence interval.

By the duality between testing and confidence interval estimation, the cumulative confidence distribution function for a parameter $\mu$ evaluated at $\mu_0$, $F(\mu_0)$, is the p-value of testing $H_0\colon \mu \le \mu_0$ against its one-sided alternative. It is thus a compact format for representing the information about $\mu$ contained in the data, given the model.

Before the data have been observed, the confidence distribution is a stochastic element with quantile intervals $(F^{-1}(\alpha_1), F^{-1}(1-\alpha_2))$ that cover the unknown parameter with probability $1-\alpha_1-\alpha_2$. After the data have been observed, the realized confidence distribution is not a distribution of probabilities in the frequentist sense, but of confidence attached to interval statements concerning $\mu$. Note that the confidence distribution here is a frequentist concept and does not rely on a prior distribution.

Motivation (II) Construction by R.A. Fisher

Example of the construction by R.A. Fisher (B. Efron (1978)). Consider a random variable $x$ with parameter $\mu$:

$$x \sim N(\mu, 1). \qquad (1)$$

The probability density function here is

$$\varphi(x \mid \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2}}. \qquad (2)$$

We can write

$$x = \mu + \epsilon, \qquad (3)$$

where $\epsilon \sim N(0,1)$ and $\mu$ is a constant. Suppose we have observed $\hat{x}$, a realization of $x$. It is an unbiased estimator of the parameter $\mu$, so

$$\mu = \hat{x} - \epsilon. \qquad (4)$$

Since $(-\epsilon) \sim N(0,1)$, because of the symmetry of the bell-shaped curve about its central point,

$$\mu = \hat{x} - \epsilon \sim N(\hat{x}, 1). \qquad (5)$$

This means that we construct the confidence density of the parameter

$$\varphi(\mu \mid \hat{x}) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(\hat{x}-\mu)^2}{2}} \qquad (6)$$

in a unique way for each value of $\hat{x}$.
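As a numerical illustration of Fisher's construction (a minimal sketch, not from the original talk; the observed value xhat = 1.3 and the use of scipy are our choices):

from scipy import stats

xhat = 1.3  # hypothetical observed value of x ~ N(mu, 1)

# Fisher's construction: the confidence density of mu is N(xhat, 1),
# i.e. Eq. (6) with the roles of x and mu exchanged.
conf_density = stats.norm(loc=xhat, scale=1.0)

# Any central confidence interval follows from its quantiles;
# a 68.27% interval reproduces the familiar xhat +/- 1:
lo, hi = conf_density.ppf(0.15865), conf_density.ppf(0.84135)
print(lo, hi)  # approximately (xhat - 1, xhat + 1)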

Motivation (III) The presence of an invariant

The construction above is a direct consequence of the identity

$$\int_{-\infty}^{\hat{x}-\alpha_1} \varphi(x \mid \hat{x})\,dx + \int_{\hat{x}-\alpha_1}^{\hat{x}+\alpha_2} \varphi(\mu \mid \hat{x})\,d\mu + \int_{\hat{x}+\alpha_2}^{+\infty} \varphi(x \mid \hat{x})\,dx = 1, \qquad (7)$$

where $\hat{x}$ is the observed value of the random variable $x$, and $\hat{x}-\alpha_1$ and $\hat{x}+\alpha_2$ are the bounds of the confidence interval for the location parameter $\mu$.

The presence of identities of this type (Eq. 7) is a property of statistically self-dual distributions (S. Bityukov, V. Taperechkina, V. Smirnova (2004); S. Bityukov, N. Krasnikov (2005)):

normal and normal:
$$\varphi(x \mid \mu, \sigma) = \varphi(\mu \mid x, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \sigma = \mathrm{const};$$

Cauchy and Cauchy:
$$f(x \mid \mu, b) = f(\mu \mid x, b) = \frac{b}{\pi\,(b^2+(x-\mu)^2)}, \quad b = \mathrm{const};$$

Laplace and Laplace:
$$f(x \mid \mu, b) = f(\mu \mid x, b) = \frac{1}{2b}\, e^{-\frac{|x-\mu|}{b}}, \quad b = \mathrm{const}.$$

These identities allow one to reconstruct the corresponding confidence densities in a unique way.
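Identity (7) is easy to verify numerically; a sketch for the normal case (the values of xhat, a1, a2 are arbitrary):

from scipy import stats

xhat, a1, a2 = 0.7, 1.0, 1.5  # hypothetical observation and interval offsets

phi = stats.norm(loc=xhat, scale=1.0)  # phi(. | xhat); self-dual in x and mu

# The three terms of Eq. (7): left tail in x, central confidence mass in mu,
# right tail in x. For a self-dual density all three use the same function phi.
left = phi.cdf(xhat - a1)
middle = phi.cdf(xhat + a2) - phi.cdf(xhat - a1)
right = 1.0 - phi.cdf(xhat + a2)
print(left + middle + right)  # 1.0 up to floating-point error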

Motivation (IV) The invariant in the case of asymmetric distributions

In the case of the Poisson and Gamma distributions we can also exchange the parameter and the random variable, conserving the same formula for the distribution of probabilities:

$$f(i \mid \mu) = f(\mu \mid i) = \frac{\mu^i e^{-\mu}}{i!}.$$

In this case (here the estimator of the parameter is biased) we can use another identity as an invariant for the reconstruction of the confidence density (S. Bityukov, N. Krasnikov, V. Taperechkina (2000); S. Bityukov, N. Krasnikov (2002)) in a unique way (any other reconstruction is inconsistent with the identity and, correspondingly, breaks the conservation of probability):

$$\sum_{i=\hat{x}+1}^{\infty} \frac{\mu_1^i e^{-\mu_1}}{i!} + \int_{\mu_1}^{\mu_2} \frac{\mu^{\hat{x}} e^{-\mu}}{\hat{x}!}\,d\mu + \sum_{i=0}^{\hat{x}} \frac{\mu_2^i e^{-\mu_2}}{i!} = 1 \qquad (8)$$

for any real $\mu_1 \ge 0$ and $\mu_2 \ge 0$ and non-negative integer $\hat{x}$, i.e.

$$\sum_{i=\hat{x}+1}^{\infty} f(i \mid \mu_1) + \int_{\mu_1}^{\mu_2} f(\mu \mid \hat{x})\,d\mu + \sum_{i=0}^{\hat{x}} f(i \mid \mu_2) = 1, \qquad (9)$$

where $f(i \mid \mu) = f(\mu \mid i) = \frac{\mu^i e^{-\mu}}{i!}$. Here $f(\mu \mid i)$ is the pdf of the Gamma distribution $\Gamma_{1,i+1}$, and $\hat{x}$ is usually the number of observed events.
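Identity (8) can be checked in the same spirit, using the standard relation between the Poisson tail and the Gamma cumulative distribution function (a sketch; the values below are arbitrary):

from scipy import stats

xhat = 5             # hypothetical number of observed events
mu1, mu2 = 2.0, 9.0  # arbitrary non-negative test values

# The three terms of Eq. (8): Poisson upper tail at mu1, Gamma(shape = xhat+1)
# confidence mass between mu1 and mu2, Poisson lower tail at mu2.
upper_tail = 1.0 - stats.poisson.cdf(xhat, mu1)
gamma_mass = stats.gamma.cdf(mu2, a=xhat + 1) - stats.gamma.cdf(mu1, a=xhat + 1)
lower_tail = stats.poisson.cdf(xhat, mu2)
print(upper_tail + gamma_mass + lower_tail)  # 1.0 up to floating-point error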

A bit of history

Point estimators, confidence intervals and p-values have long been fundamental tools for frequentist statisticians. Confidence distributions (CDs), which can be viewed as distribution estimators, are often convenient devices for constructing all of the above statistical procedures and more. The basic notion of CDs traces back to the fiducial distribution of Fisher (1930); however, it can be viewed as a purely frequentist concept. Indeed, as pointed out in Schweder, Hjort (2002), the CD concept is a Neymanian interpretation of Fisher's fiducial distribution [Neyman (1941)]. Its development has proceeded from Fisher (1930) through recent contributions, to name a few, of Efron (1993; 1998), Fraser (1991; 1996), Lehmann (1993), Singh, Xie, Strawderman (2001; 2005; 2007), Schweder, Hjort (2002) and others.

In the works (Bityukov, Krasnikov (2002); Bityukov, Taperechkina, Smirnova (2004)) an approach for reconstructing confidence distribution densities by using the corresponding identities is developed. Recently (Bickel (2006)), a method for incorporating expert knowledge into the frequentist approach by combining generalized confidence distributions was proposed. So this methodology is actively progressing.

Confidence distributions (I)

Suppose $X_1, X_2, \ldots, X_n$ are $n$ independent random draws from a population $F$ and $\mathcal{X}$ is the sample space corresponding to the data set $X_n = (X_1, X_2, \ldots, X_n)^T$. Let $\theta$ be a parameter of interest associated with $F$ ($F$ may contain other nuisance parameters), and let $\Theta$ be the parameter space.

Definition 1 (Singh, Xie, Strawderman (2005)): A function $H_n(\cdot) = H_n(X_n, \cdot)$ on $\mathcal{X} \times \Theta \to [0,1]$ is called a confidence distribution (CD) for a parameter $\theta$ if (i) for each given $X_n \in \mathcal{X}$, $H_n(\cdot)$ is a continuous cumulative distribution function; (ii) at the true parameter value $\theta = \theta_0$, $H_n(\theta_0) = H_n(X_n, \theta_0)$, as a function of the sample $X_n$, has the uniform distribution $U(0,1)$.

The function $H_n(\cdot)$ is called an asymptotic confidence distribution (acd) if requirement (ii) above is replaced by (ii)': at $\theta = \theta_0$, $H_n(X_n, \theta_0) \xrightarrow{W} U(0,1)$ as $n \to +\infty$, and the continuity requirement on $H_n(\cdot)$ is dropped.

We call, when it exists, $h_n(\theta) = H_n'(\theta)$ a confidence density or CD density.

Item (i) basically requires the function $H_n(\cdot)$ to be a distribution function for each given sample. Item (ii) basically states that the function $H_n(\cdot)$ contains the right amount of information about the true $\theta_0$.
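Requirement (ii) can be probed by simulation. A minimal sketch for the normal-mean CD of Example 1 below (the true mean, sample size and number of replications are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 20, 100_000  # true mean, sample size, replications

# For each simulated sample, evaluate the CD for the mean at theta0:
# H_n(theta0) = F_{t_{n-1}}((theta0 - xbar) / (s / sqrt(n))).
x = rng.normal(theta0, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
u = stats.t.cdf((theta0 - xbar) / (s / np.sqrt(n)), df=n - 1)

# If H_n is a valid CD, u must be uniform on (0, 1).
print(np.quantile(u, [0.25, 0.5, 0.75]))  # close to [0.25, 0.5, 0.75]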

Confidence distributions (II)

It follows from the definition of a CD that if $\theta < \theta_0$, then $H_n(\theta) \le_{sto} 1 - H_n(\theta)$, and if $\theta > \theta_0$, then $1 - H_n(\theta) \le_{sto} H_n(\theta)$. Here $\le_{sto}$ is a stochastic comparison between two random variables; i.e., for two random variables $Y_1$ and $Y_2$, $Y_1 \le_{sto} Y_2$ if $P(Y_1 \le t) \ge P(Y_2 \le t)$ for all $t$.

Thus a CD works, in a sense, like a compass needle. It points towards $\theta_0$, when placed at $\theta \ne \theta_0$, by assigning more mass stochastically to the side (left or right) of $\theta$ that contains $\theta_0$. When placed at $\theta_0$ itself, $H_n(\theta) = H_n(\theta_0)$ has the uniform $U[0,1]$ distribution and thus is noninformative in direction.

For every $\alpha$ in $(0,1)$, let $(-\infty, \xi_n(\alpha)]$ be a $100\alpha\%$ lower-side confidence interval, where $\xi_n(\alpha) = \xi_n(X_n, \alpha)$ is continuous and increasing in $\alpha$ for each sample $X_n$. Then $H_n(\cdot) = \xi_n^{-1}(\cdot)$ is a CD in the usual Fisherian sense. In this case,

$$\{X_n : H_n(\theta) \le \alpha\} = \{X_n : \theta \le \xi_n(\alpha)\} \qquad (10)$$

for any $\alpha$ in $(0,1)$ and $\theta$ in $\Theta \subseteq \mathbb{R}$. Thus, at $\theta = \theta_0$, $Pr\{H_n(\theta_0) \le \alpha\} = \alpha$ and $H_n(\theta_0)$ is $U(0,1)$ distributed. Definition 1 is very convenient for the purpose of verifying whether a particular function is a CD or an acd.

Confidence distributions (III)

There is another definition of a confidence distribution. Consider data $X$ randomly drawn from a distribution parametrized in part by $\theta$, the scalar parameter with true value $\theta_0$. Define a distribution generator as a function $T$ that maps $X$ to a distribution of $\theta$.

Definition 2 (Bickel (2006)): If $F_{T(X)}(\theta)$, the corresponding cumulative distribution function, is uniform $U(0,1)$ when $\theta = \theta_0$, then $T(X)$ is called a confidence distribution for $\theta$.

It follows that, for continuous data, the $(1-\alpha_1-\alpha_2)100\%$ confidence interval $(F_{T(X)}^{-1}(\alpha_1), F_{T(X)}^{-1}(1-\alpha_2))$ has exact coverage in the sense that it includes the true value of the parameter with relative frequency $(1-\alpha_1-\alpha_2)$ over an infinite number of realizations of $X$ and, if random, $T$. Here $\alpha_1 + \alpha_2 \le 1$, $\alpha_1 \ge 0$, $\alpha_2 \ge 0$.

Examples and inferential information contained in a CD (I)

Example 1. Normal mean and variance (Singh, Xie, Strawderman (2005)): Suppose $X_1, X_2, \ldots, X_n$ is a sample from $N(\mu, \sigma^2)$, with both $\mu$ and $\sigma^2$ unknown. A CD for $\mu$ is

$$H_n(y) = F_{t_{n-1}}\!\left(\frac{y - \bar{X}}{s_n/\sqrt{n}}\right),$$

where $\bar{X}$ and $s_n^2$ are, respectively, the sample mean and variance, and $F_{t_{n-1}}(\cdot)$ is the cumulative distribution function of the Student $t_{n-1}$-distribution. A CD for $\sigma^2$ is

$$H_n(y) = 1 - F_{\chi^2_{n-1}}\!\left(\frac{(n-1)s_n^2}{y}\right) \quad \text{for } y \ge 0,$$

where $F_{\chi^2_{n-1}}(\cdot)$ is the cumulative distribution function of the $\chi^2_{n-1}$-distribution.

Example 2. p-value function (Singh, Xie, Strawderman (2005)): For any given $\tilde{\theta}$, let $p_n(\tilde{\theta}) = p_n(X_n, \tilde{\theta})$ be a p-value for the one-sided test $K_0\colon \theta \le \tilde{\theta}$ versus $K_1\colon \theta > \tilde{\theta}$. Assume that the p-value is available for all $\tilde{\theta}$. The function $p_n(\cdot)$ is called a p-value function. Typically, at the true value $\theta = \theta_0$, $p_n(\theta_0)$ as a function of $X_n$ is exactly (or asymptotically) $U(0,1)$-distributed. Also, $H_n(\cdot) = p_n(\cdot)$ for every fixed sample is almost always a cumulative distribution function. Thus, usually $p_n(\cdot)$ satisfies the requirements for a CD (or acd).
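A sketch of Example 1 in code (the sample is simulated; all numbers are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=25)  # hypothetical sample
n, xbar, s2 = x.size, x.mean(), x.var(ddof=1)

# CD for mu: H_n(y) = F_{t_{n-1}}((y - xbar) / (s / sqrt(n))).
H_mu = lambda y: stats.t.cdf((y - xbar) / np.sqrt(s2 / n), df=n - 1)

# CD for sigma^2: H_n(y) = 1 - F_{chi2_{n-1}}((n - 1) * s^2 / y), y >= 0.
H_sigma2 = lambda y: 1.0 - stats.chi2.cdf((n - 1) * s2 / y, df=n - 1)

# Quantiles of H_mu give the usual t intervals, e.g. a central 95% one:
lo = xbar + stats.t.ppf(0.025, n - 1) * np.sqrt(s2 / n)
hi = xbar + stats.t.ppf(0.975, n - 1) * np.sqrt(s2 / n)
print(H_mu(xbar), (lo, hi))  # H_mu(xbar) = 0.5 by construction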

Examples and inferential information contained in a CD (II)

Example 3. Likelihood functions (Singh, Xie, Strawderman (2005)): There is a connection between the concept of an acd and various types of likelihood functions. In an exponential family, both the profile likelihood and the implied likelihood (Efron (1993)) are acd densities after a normalization. The work (Singh, Xie, Strawderman (2001)) provided a formal proof, under some specific conditions, that $e^{\ell_n^*(\theta)}$ is proportional to an acd density for the parameter $\theta$, where $\ell_n^*(\theta) = \ell_n(\theta) - \ell_n(\hat{\theta})$, $\ell_n(\theta)$ is the log-profile likelihood function, and $\hat{\theta}$ is the maximum likelihood estimator of $\theta$.

A CD contains a wealth of information, somewhat comparable to, but different from, a Bayesian posterior distribution. A CD (or acd) derived from a likelihood function can also be interpreted as an objective Bayesian posterior.
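A sketch of this normalization for a normal model with known unit variance (illustrative only; in this simple case the resulting acd density is just the $N(\bar{X}, 1/n)$ curve):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.5, 1.0, size=50)     # hypothetical sample, sigma = 1 known
theta = np.linspace(-1.0, 2.0, 2001)  # grid over the parameter

# Log-likelihood of the mean, shifted so its maximum (at the MLE xbar) is 0.
loglik = np.array([stats.norm.logpdf(x, th, 1.0).sum() for th in theta])
lstar = loglik - loglik.max()

# Normalizing exp(l*) over the grid gives the acd density h_n(theta).
h = np.exp(lstar)
h /= h.sum() * (theta[1] - theta[0])

# It matches the N(xbar, 1/n) density up to discretization error:
print(np.abs(h - stats.norm.pdf(theta, x.mean(), 1 / np.sqrt(x.size))).max())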

Inference: a brief summary (Singh, Xie, Strawderman (2005))

Confidence interval. From the definition, it is evident that the intervals $(-\infty, H_n^{-1}(1-\alpha)]$, $[H_n^{-1}(\alpha), +\infty)$ and $(H_n^{-1}(\alpha/2), H_n^{-1}(1-\alpha/2))$ provide $100(1-\alpha)\%$-level confidence intervals of different kinds for $\theta$, for any $\alpha \in (0,1)$. The same is true for an acd, where the confidence level is achieved in the limit.

Point estimation. Natural choices of point estimators of the parameter $\theta$, given $H_n(\theta)$, include the median $M_n = H_n^{-1}(1/2)$, the mean $\bar{\theta} = \int t\,dH_n(t)$, and the maximum point of the CD density $\hat{\theta} = \arg\max_\theta h_n(\theta)$, where $h_n(\theta) = H_n'(\theta)$.

Hypothesis testing. From a CD, one can obtain p-values for various hypothesis-testing problems. The work (Fraser (1991)) developed some results on this topic through p-value functions. The natural line of thinking is to measure the support that $H_n(\cdot)$ lends to a null hypothesis $K_0\colon \theta \in C$. Two types of support are possible (see the sketch after this list):

1. Strong support: $p_s(C) = \int_C dH_n(\theta)$.
2. Weak support: $p_w(C) = \sup_{\theta \in C} 2\min(H_n(\theta), 1 - H_n(\theta))$.

If $K_0$ is of the type $(-\infty, \theta_0]$ or $[\theta_0, +\infty)$ or a union of finitely many intervals, the strong support $p_s(C)$ leads to the classical p-values. If $K_0$ is a singleton, that is, $K_0\colon \theta = \theta_0$, then the weak support $p_w(C)$ leads to the classical p-values.
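These recipes are mechanical once a CD is in hand; a sketch using the normal-mean CD of Example 1 (the sample and the null value theta0 = 9 are hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=25)  # hypothetical sample
n, xbar = x.size, x.mean()
se = x.std(ddof=1) / np.sqrt(n)

H = lambda th: stats.t.cdf((th - xbar) / se, df=n - 1)  # the CD H_n
H_inv = lambda a: xbar + se * stats.t.ppf(a, df=n - 1)  # its quantile function

# Confidence interval: a central 95% interval from CD quantiles.
ci = (H_inv(0.025), H_inv(0.975))

# Point estimation: the CD median (for this symmetric CD it equals xbar).
median = H_inv(0.5)

# Hypothesis testing: strong support for K0: theta <= theta0,
# weak support for the singleton K0: theta = theta0.
theta0 = 9.0
p_strong = H(theta0)
p_weak = 2 * min(H(theta0), 1 - H(theta0))
print(ci, median, p_strong, p_weak)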

Combination of CDs (Singh, Xie, Strawderman (2005))

The notion of a CD (or acd) is attractive for the purpose of combining information. The main reasons are that there is a wealth of information on $\theta$ inside a CD, the concept of a CD is quite broad, and CDs are relatively easy to construct and interpret.

Let $H_1(y), \ldots, H_L(y)$ be $L$ independent CDs with the same true parameter value $\theta_0$. Suppose $g_c(u_1, \ldots, u_L)$ is any continuous function from $[0,1]^L$ to $\mathbb{R}$ that is monotonic in each coordinate. A general way of combining, depending on $g_c(u_1, \ldots, u_L)$, can be described as follows. Define

$$\tilde{H}_c(u_1, \ldots, u_L) = G_c(g_c(u_1, \ldots, u_L)),$$

where $G_c(\cdot)$ is the continuous cumulative distribution function of $g_c(U_1, \ldots, U_L)$, and $U_1, \ldots, U_L$ are independent $U(0,1)$-distributed random variables. Denote $H_c(y) = \tilde{H}_c(H_1(y), \ldots, H_L(y))$. It is easy to verify that $H_c(y)$ is a CD function for the parameter $\theta$; $H_c(y)$ is a combined CD.

Let $F_0(\cdot)$ be any continuous cumulative distribution function and $F_0^{-1}(\cdot)$ be its inverse function. A convenient special case of the function $g_c$ is $g_c(u_1, \ldots, u_L) = F_0^{-1}(u_1) + \ldots + F_0^{-1}(u_L)$. In this case, $G_c(\cdot) = F_0 * \ldots * F_0(\cdot)$, where $*$ stands for convolution. Just like the p-value combination approach, this general CD combination recipe is simple and easy to implement. For example, let $F_0(t) = \Phi(t)$ be the cumulative distribution function of the standard normal. In this case

$$H_{NM}(y) = \Phi\!\left(\frac{1}{\sqrt{L}}\left[\Phi^{-1}(H_1(y)) + \ldots + \Phi^{-1}(H_L(y))\right]\right).$$
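A sketch of the normal combining recipe for two hypothetical measurements of the same parameter, each summarized by a normal CD (all numbers are illustrative):

import numpy as np
from scipy import stats

# Two independent measurements of theta, each giving a normal CD
# H_l(y) = Phi((y - m_l) / s_l).
m = np.array([1.0, 1.6])  # point estimates
s = np.array([0.5, 0.8])  # their standard errors

y = np.linspace(-1.0, 4.0, 2001)
H = stats.norm.cdf((y[:, None] - m) / s)  # columns H_1(y), H_2(y)

# H_NM(y) = Phi((Phi^{-1}(H_1(y)) + ... + Phi^{-1}(H_L(y))) / sqrt(L)).
L = H.shape[1]
H_NM = stats.norm.cdf(stats.norm.ppf(H).sum(axis=1) / np.sqrt(L))

# The median of the combined CD is a natural combined point estimate; for
# normal CDs it solves sum_l (y - m_l)/s_l = 0, i.e. a (1/s_l)-weighted mean.
med = y[np.searchsorted(H_NM, 0.5)]
print(med, np.sum(m / s) / np.sum(1 / s))  # both approximately 1.23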

Conclusion

The notion of a confidence distribution, an entirely frequentist concept, is in essence a Neymanian interpretation of Fisher's fiducial distribution. It contains information related to every kind of frequentist inference. The confidence distribution is a direct generalization of the confidence interval and is a useful format for presenting statistical inference.

The following quotation from Efron (1998) on Fisher's contribution of the fiducial distribution seems quite relevant in the context of CDs: "... but here is a safe prediction for the 21st century: statisticians will be asked to solve bigger and more complicated problems. I believe there is a good chance that objective Bayes methods will be developed for such problems, and that something like fiducial inference will play an important role in this development. Maybe Fisher's biggest blunder will become a big hit in the 21st century!"

We are grateful to Bob Cousins, Vladimir Gavrilov, Vassili Kachanov, Louis Lyons, Victor Matveev and Nikolai Tyurin for their interest in and support of this work. We thank Prof. Kesar Singh and Prof. Minge Xie for helpful and constructive comments. We also thank Yuri Gouz, Alexandre Nikitenko and Claudia Wulz for very useful discussions.

References

D.R. Bickel (2006), Incorporating expert knowledge into frequentist inference by combining generalized confidence distributions, e-print: math/0602377, 2006.

S.I. Bityukov, N.V. Krasnikov, V.A. Taperechkina (2000), Confidence intervals for Poisson distribution parameter, Preprint IFVE 2000-61, Protvino, 2000; also e-print: hep-ex/0108020, 2001.

S.I. Bityukov (2002), Signal significance in the presence of systematic and statistical uncertainties, JHEP 09 (2002) 060; e-print: hep-ph/0207130; S.I. Bityukov, N.V. Krasnikov, NIM A502 (2003) 795-798.

S.I. Bityukov, V.A. Taperechkina, V.V. Smirnova (2004), Statistically dual distributions and estimation of the parameters, e-print: math.ST/0411462, 2004.

S.I. Bityukov, N.V. Krasnikov (2005), Statistically dual distributions and conjugate families, in Proceedings of MaxEnt'05, August 2005, San Jose, CA, USA.

B. Efron (1978), Controversies in the foundations of statistics, The American Mathematical Monthly 85(4) (1978) 231-246.

B. Efron (1993), Bayes and likelihood calculations from confidence intervals, Biometrika 80 (1993) 3-26. MR1225211.

B. Efron (1998), R.A. Fisher in the 21st century, Stat. Sci. 13 (1998) 95-122.

R.A. Fisher (1930), Inverse probability, Proc. of the Cambridge Philosophical Society 26 (1930) 528-535.

D.A.S. Fraser (1991), Statistical inference: likelihood to significance, J. Amer. Statist. Assoc. 86 (1991) 258-265. MR1137116.

D.A.S. Fraser (1996), Comments on "Pivotal inference and the fiducial argument" by G.A. Barnard, Internat. Statist. Rev. 64 (1996) 231-235.

E.L. Lehmann (1993), The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two?, J. Amer. Statist. Assoc. 88 (1993) 1242-1249. MR1245356.

J. Neyman (1941), Fiducial argument and the theory of confidence intervals, Biometrika 32 (1941) 128-150. MR5582.

T. Schweder, N.L. Hjort (2002), Confidence and likelihood, Scand. J. Statist. 29 (2002) 309-332.

K. Singh, M. Xie, W. Strawderman (2001), Confidence distributions - concept, theory and applications, Technical report, Dept. Statistics, Rutgers Univ., revised 2004.

K. Singh, M. Xie, W.E. Strawderman (2005), Combining information from independent sources through confidence distributions, The Annals of Statistics 33 (2005) 159-183.

K. Singh, M. Xie, W. Strawderman (2007), Confidence distributions (CD) - distribution estimator of a parameter, Yehuda Vardi Volume, IMS-LNMS (to appear).