Charles Geyer, University of Minnesota. Joint work with Glen Meeden, University of Minnesota.


Fuzzy Confidence Intervals and P-values

Charles Geyer, University of Minnesota
joint work with
Glen Meeden, University of Minnesota

http://www.stat.umn.edu/geyer/fuzz

Ordinary Confidence Intervals

OK for continuous data, but a really bad idea for discrete data. Why?

Coverage probability:

$$\gamma(\theta) = \operatorname{pr}_\theta\{ l(X) < \theta < u(X) \} = \sum_{x \in S} I_{(l(x), u(x))}(\theta) \, f_\theta(x)$$

As $\theta$ moves across the boundary of a possible confidence interval $(l(x), u(x))$, the coverage probability jumps by $f_\theta(x)$.

Ideally, $\gamma$ is a constant function equal to the nominal confidence coefficient. But that's not possible.
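The jump described above can be checked numerically. The following is a minimal sketch, assuming binomial data with n = 10 and using the usual Wald interval purely as an illustrative crisp interval (the slide itself does not name one). The function names `binom_pmf`, `wald_interval`, and `coverage` are hypothetical helpers introduced here.

```python
from math import comb, sqrt

def binom_pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def wald_interval(x, n, z=1.96):
    """Illustrative crisp interval (l(x), u(x)); an assumption, not the slide's choice."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(theta, n, interval=wald_interval):
    """gamma(theta): sum of f_theta(x) over all x whose interval contains theta."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo < theta < hi:
            total += binom_pmf(x, n, theta)
    return total

n = 10
# Upper endpoint of the interval for x = 2. Moving theta across it removes
# the x = 2 term from the sum, so coverage drops by about f_theta(2).
_, u2 = wald_interval(2, n)
eps = 1e-9
before = coverage(u2 - eps, n)
after = coverage(u2 + eps, n)
jump = before - after
print(jump, binom_pmf(2, n, u2))
```

The two printed numbers should agree closely, since no other interval endpoint lies within `eps` of `u2` in this configuration.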

Binomial Example

Binomial data, sample size $n = 10$, confidence interval calculated by R function prop.test.

[Figure: coverage probability (vertical axis, 0.94 to 1.00) versus $\theta$ (horizontal axis, 0 to 1).]

Recent Literature

Agresti and Coull (Amer. Statist., 1998). Approximate is better than exact for interval estimation of binomial proportions.
Brown, Cai, and DasGupta (Statist. Sci., 2001). Interval estimation for a binomial proportion (with discussion).
Casella (Statist. Sci., 2001). Comment on Brown, et al.

All recommend different intervals. All recommended intervals are bad, just slightly less bad than other possibilities.

Ordinary confidence intervals for discrete data are irreparably bad.

Randomized Tests

A randomized test is defined by its critical function $\phi(x, \alpha, \theta)$, where
  $x$ is the observed data,
  $\alpha$ is the significance level,
  $H_0 : \theta = \theta_0$ is the null hypothesis.

The decision is randomized: reject $H_0$ with probability $\phi(x, \alpha, \theta_0)$.

Since probabilities are between zero and one, so is $\phi(x, \alpha, \theta)$.

Classical uniformly most powerful (UMP) and UMP unbiased (UMPU) tests are randomized when data are discrete.

Randomized Test Example

Observe $X \sim \text{Binomial}(20, \theta)$. Test

$H_0 : \theta = 0.5$
$H_1 : \theta < 0.5$

Distribution of $X$ under $H_0$:

x    f(x)     F(x)
4    0.0046   0.0059
5    0.0148   0.0207
6    0.0370   0.0577
7    0.0739   0.1316

A nonrandomized test can have $\alpha = 0.0207$ or $\alpha = 0.0577$, but nothing in between.

The randomized test that rejects with probability one when $X \le 5$ and with probability 0.7928 when $X = 6$ has $\alpha = 0.05$:

$$\Pr(X \le 5) + 0.7928 \cdot \Pr(X = 6) = 0.0207 + 0.7928 \times 0.0370 = 0.0500$$
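The rejection probability 0.7928 at the boundary point can be recovered from the null distribution. This is a small sketch of that arithmetic using only the standard library; the variable names (`cum`, `boundary`, `gamma`) are chosen here for illustration.

```python
from math import comb

n, theta0, alpha = 20, 0.5, 0.05

def pmf(x):
    """Binomial(20, 0.5) probability mass at x."""
    return comb(n, x) * theta0**x * (1 - theta0)**(n - x)

# Find the largest c with Pr(X <= c) <= alpha: reject with probability one there.
cum = 0.0
c = -1
while cum + pmf(c + 1) <= alpha:
    c += 1
    cum += pmf(c)

# At the next point, reject with just enough probability to use up alpha.
boundary = c + 1
gamma = (alpha - cum) / pmf(boundary)
size = cum + gamma * pmf(boundary)

print(c, boundary, round(gamma, 4), round(size, 4))
```

Here `c` is the largest value rejected with certainty and `gamma` is the rejection probability at `boundary`, matching the values 5, 6, and 0.7928 on the slide.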

Fuzzy Sets

Indicator function $I_A$ of ordinary set $A$.

[Figure: $I_A(x)$ (vertical axis, 0.0 to 1.0) versus $x$.]

Membership function $I_B$ of fuzzy set $B$.

[Figure: $I_B(x)$ (vertical axis, 0.0 to 1.0) versus $x$.]

The membership function $I_B(x)$ indicates the degree to which $x$ is to be considered to be in set $B$.

Ordinary sets are a special case of fuzzy sets called crisp sets.

Fuzzy Tests and Confidence Intervals

For fixed $\alpha$ and $\theta_0$, $x \mapsto \phi(x, \alpha, \theta_0)$ is the fuzzy decision function for the size $\alpha$ test of $H_0 : \theta = \theta_0$.

For fixed $x$ and $\alpha$, $\theta \mapsto 1 - \phi(x, \alpha, \theta)$ is (the membership function of) the fuzzy confidence interval with coverage $1 - \alpha$.

For fixed $x$ and $\theta_0$, $\alpha \mapsto \phi(x, \alpha, \theta_0)$ is (the cumulative distribution function of) the fuzzy P-value for the test of $H_0 : \theta = \theta_0$.

Binomial Example

Sample size $n = 10$, fuzzy confidence interval associated with UMPU test, confidence level $1 - \alpha = 0.95$. Data $x = 0$ (solid curve), $x = 4$ (dashed curve), and $x = 9$ (dotted curve).

[Figure: $1 - \phi(x, \alpha, \theta)$ (vertical axis, 0 to 1) versus $\theta$ (horizontal axis, 0 to 1).]

Binomial Example

Sample size $n = 10$, fuzzy confidence interval associated with UMPU test, confidence level $1 - \alpha = 0.95$. Data $x = 0$.

[Figure: $1 - \phi(x, \alpha, \theta)$ (vertical axis, 0 to 1) versus $\theta$ (horizontal axis, 0 to 1).]

Binomial Example

Sample size $n = 10$, fuzzy confidence interval associated with UMPU test, confidence level $1 - \alpha = 0.95$. Data $x = 4$.

[Figure: $1 - \phi(x, \alpha, \theta)$ (vertical axis, 0 to 1) versus $\theta$ (horizontal axis, 0 to 1).]

Binomial Example

Sample size $n = 10$, fuzzy confidence interval associated with UMPU test, confidence level $1 - \alpha = 0.95$. Data $x = 9$.

[Figure: $1 - \phi(x, \alpha, \theta)$ (vertical axis, 0 to 1) versus $\theta$ (horizontal axis, 0 to 1).]

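Exactness

UMP and UMPU tests are exact:

$$E_\theta\{\phi(X, \alpha, \theta)\} = \alpha, \quad \text{for all } \alpha \text{ and } \theta$$

Fuzzy confidence intervals inherit exactness:

$$E_\theta\{1 - \phi(X, \alpha, \theta)\} = 1 - \alpha, \quad \text{for all } \alpha \text{ and } \theta$$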
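Exactness can be verified directly for a simple case. The sketch below assumes binomial data and the lower-tailed UMP test (the one-sided test from the randomized test example), not the two-sided UMPU test used in the plots. The helper names `pmf`, `phi_lower`, and `expected_phi` are hypothetical.

```python
from math import comb

def pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def phi_lower(x, alpha, theta, n):
    """Critical function of the lower-tailed UMP test of H0: theta.

    Rejects with probability 1 below the cutoff C, with probability gamma
    at C, and never above C, where P(X < C) + gamma * P(X = C) = alpha.
    Clamping the ratio to [0, 1] covers all three cases at once.
    """
    below = sum(pmf(k, n, theta) for k in range(x))  # P(X < x)
    at = pmf(x, n, theta)                            # P(X = x)
    return min(1.0, max(0.0, (alpha - below) / at))

def expected_phi(alpha, theta, n):
    """E_theta{phi(X, alpha, theta)}: the size of the test at theta."""
    return sum(phi_lower(x, alpha, theta, n) * pmf(x, n, theta)
               for x in range(n + 1))

n = 10
sizes = {theta: expected_phi(0.05, theta, n) for theta in (0.1, 0.3, 0.5, 0.9)}
for theta, size in sizes.items():
    print(theta, size)
```

Each printed size should equal 0.05 up to floating-point error, so the expectation of 1 - phi is 0.95 at every theta tried.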

Fuzzy P-values

For P-values to even be definable, a test must have nested fuzzy critical regions:

$$\alpha_1 \le \alpha_2 \text{ implies } \phi(x, \alpha_1, \theta) \le \phi(x, \alpha_2, \theta).$$

For any such test for discrete data and for any $x$ and $\theta$,

$$\alpha \mapsto \phi(x, \alpha, \theta) \qquad (*)$$

is a continuous non-decreasing function that maps $[0, 1]$ onto $[0, 1]$.

So $(*)$ is the cumulative distribution function of a continuous random variable $P$, which we call the fuzzy P-value of the test.

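Exactness Again

UMP and UMPU tests are exact:

$$E_\theta\{\phi(X, \alpha, \theta)\} = \alpha, \quad \text{for all } \alpha \text{ and } \theta$$

By definition of the fuzzy P-value,

$$\operatorname{pr}\{P \le \alpha \mid X\} = \phi(X, \alpha, \theta_0)$$

Hence fuzzy P-values also inherit exactness:

$$\operatorname{pr}_{\theta_0}\{P \le \alpha\} = E_{\theta_0}\{\operatorname{pr}\{P \le \alpha \mid X\}\} = E_{\theta_0}\{\phi(X, \alpha, \theta_0)\} = \alpha$$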

Probability Density Function

The cumulative distribution function of the fuzzy P-value is

$$\alpha \mapsto \phi(x, \alpha, \theta_0) \qquad (*)$$

The probability density function is

$$\alpha \mapsto \frac{\partial}{\partial \alpha} \phi(x, \alpha, \theta_0) \qquad (**)$$

For UMP tests, fuzzy P-values are uniformly distributed on an interval.

For UMPU tests, $(*)$ is piecewise linear and $(**)$ is piecewise constant (a step function).
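For the UMP case the interval can be written down explicitly. The following sketch assumes the lower-tailed binomial test from the randomized test example, for which the fuzzy P-value is uniform on the interval from Pr(X < x) to Pr(X <= x) under the null. The function names are hypothetical.

```python
from math import comb

def pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def fuzzy_pvalue_support(x, theta0, n):
    """Endpoints (Pr(X < x), Pr(X <= x)) under H0 for the lower-tailed test."""
    below = sum(pmf(k, n, theta0) for k in range(x))
    return below, below + pmf(x, n, theta0)

def fuzzy_pvalue_cdf(alpha, x, theta0, n):
    """alpha -> phi(x, alpha, theta0): linear on the support, 0 below, 1 above."""
    a, b = fuzzy_pvalue_support(x, theta0, n)
    return min(1.0, max(0.0, (alpha - a) / (b - a)))

def fuzzy_pvalue_density(alpha, x, theta0, n):
    """Derivative of the CDF: constant on the support, zero elsewhere."""
    a, b = fuzzy_pvalue_support(x, theta0, n)
    return 1.0 / (b - a) if a < alpha < b else 0.0

# Data x = 6 in the Binomial(20, 0.5) example.
a, b = fuzzy_pvalue_support(6, 0.5, 20)
cdf_at_05 = fuzzy_pvalue_cdf(0.05, 6, 0.5, 20)
print(round(a, 4), round(b, 4), round(cdf_at_05, 4))
```

For x = 6 the support is approximately (0.0207, 0.0577), and the CDF evaluated at 0.05 reproduces the rejection probability 0.7928 from the randomized test example.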

Binomial Example

Sample size $n = 10$, fuzzy P-value associated with UMPU test, null hypothesis $\theta_0 = 0.7$, data $x = 10$.

[Figure: curve for the fuzzy P-value, derived from $\phi(x, \alpha, \theta)$, plotted against $\alpha$; tick labels run from 0.00 to 0.06.]

Fuzzy and Randomized Concepts

Decisions

The fuzzy decision reports $\phi(x, \alpha, \theta_0)$.

The randomized decision generates a Uniform(0, 1) random variate $U$, and reports "reject $H_0$" if $U < \phi(x, \alpha, \theta_0)$ and "accept $H_0$" otherwise.

P-values

The fuzzy P-value is the theoretical random variable having the cumulative distribution function $\alpha \mapsto \phi(x, \alpha, \theta_0)$.

The randomized P-value is a realization (simulated value) of this random variable.
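A short simulation makes the randomized versions concrete. This sketch again assumes the lower-tailed binomial UMP test with n = 20 and null value 0.5. The seed, the number of trials, and the helper names are arbitrary choices made here for illustration.

```python
import random
from math import comb

def pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def phi_lower(x, alpha, theta, n):
    """Critical function of the lower-tailed UMP test (clamped to [0, 1])."""
    below = sum(pmf(k, n, theta) for k in range(x))
    return min(1.0, max(0.0, (alpha - below) / pmf(x, n, theta)))

def randomized_decision(x, alpha, theta0, n, rng):
    """Draw U ~ Uniform(0, 1) and reject H0 when U < phi(x, alpha, theta0)."""
    return rng.random() < phi_lower(x, alpha, theta0, n)

def randomized_pvalue(x, theta0, n, rng):
    """One realization of the fuzzy P-value: uniform on (Pr(X < x), Pr(X <= x))."""
    below = sum(pmf(k, n, theta0) for k in range(x))
    return below + rng.random() * pmf(x, n, theta0)

rng = random.Random(42)  # fixed seed so the run is reproducible
n, theta0, alpha = 20, 0.5, 0.05
trials = 20000
rejections = 0
for _ in range(trials):
    # Simulate X under H0 as a sum of n Bernoulli(theta0) draws.
    x = sum(rng.random() < theta0 for i in range(n))
    rejections += randomized_decision(x, alpha, theta0, n, rng)
rate = rejections / trials
p = randomized_pvalue(6, theta0, n, rng)
print(rate, p)
```

The observed rejection rate should be close to 0.05, reflecting exactness, and the simulated P-value for x = 6 falls between roughly 0.0207 and 0.0577.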

Fuzzy and Randomized Concepts (Continued)

$\gamma$-cuts

If $I_B$ is the membership function of a fuzzy set $B$, the $\gamma$-cut of $B$ is the crisp set

$${}^{\gamma}I_B = \{ x : I_B(x) \ge \gamma \}.$$

Confidence Intervals

The fuzzy confidence interval is the fuzzy set $B$ with membership function

$$I_B(\theta) = 1 - \phi(x, \alpha, \theta).$$

The randomized confidence interval is the crisp set ${}^{U}I_B$, where $U$ is a Uniform(0, 1) random variate.
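The cut construction can be sketched on a grid of parameter values. This example assumes the lower-tailed binomial UMP test, so the resulting fuzzy set is a one-sided (upper-bound) interval, unlike the two-sided UMPU intervals plotted earlier. The grid resolution, the seed, and the function names are illustrative choices.

```python
import random
from math import comb

def pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def phi_lower(x, alpha, theta, n):
    """Critical function of the lower-tailed UMP test (clamped to [0, 1])."""
    below = sum(pmf(k, n, theta) for k in range(x))
    return min(1.0, max(0.0, (alpha - below) / pmf(x, n, theta)))

def membership(theta, x, alpha, n):
    """I_B(theta) = 1 - phi(x, alpha, theta) for the fuzzy confidence interval."""
    return 1.0 - phi_lower(x, alpha, theta, n)

def gamma_cut(x, alpha, n, gamma, grid):
    """Grid approximation of the crisp set {theta : I_B(theta) >= gamma}."""
    return [t for t in grid if membership(t, x, alpha, n) >= gamma]

grid = [i / 1000 for i in range(1, 1000)]  # interior points of (0, 1)
x, alpha, n = 4, 0.05, 10
wide = gamma_cut(x, alpha, n, 0.1, grid)
narrow = gamma_cut(x, alpha, n, 0.9, grid)

# Randomized confidence interval: the U-cut for a single uniform draw.
u = random.Random(0).random()
randomized = gamma_cut(x, alpha, n, u, grid)
print(max(narrow), max(wide), max(randomized))
```

Cuts at higher levels are nested inside cuts at lower levels, so the randomized interval is always one member of a nested family indexed by the draw `u`.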

Situations with UMP and UMPU Tests

Binomial
Poisson
Negative Binomial
Two Binomials: $p_1 (1 - p_2) / [p_2 (1 - p_1)]$
Two Poissons: $\mu_1 / \mu_2$
Two Negative Binomials: $(1 - p_2) / (1 - p_1)$
McNemar: $p_{12} / (p_{12} + p_{21})$
Fisher: $p_{11} p_{22} / (p_{12} p_{21})$

Summary

Fuzzy decisions, confidence intervals, and P-values based on UMP and UMPU tests are the Right Thing (exact and uniformly most powerful).

Crisp confidence intervals are the Wrong Thing (for discrete data).

UMP and UMPU for any exponential family with single parameter of interest.

Fuzzy outside classical UMP and UMPU?

Fuzzy or Randomized?

Appendix: UMP

For a one-parameter exponential family having canonical statistic $T(X)$ and canonical parameter $\theta$, there exists a UMP test with hypotheses

$$H_0 = \{ \vartheta : \vartheta \le \theta \}$$
$$H_1 = \{ \vartheta : \vartheta > \theta \}$$

significance level $\alpha$, and critical function

$$\phi(x, \alpha, \theta) = \begin{cases} 1, & T(x) > C \\ \gamma, & T(x) = C \\ 0, & T(x) < C \end{cases}$$

where $\gamma$ and $C$ are determined by $E_\theta\{\phi(X, \alpha, \theta)\} = \alpha$.

The analogous lower-tailed test is the same except that all inequalities are reversed.

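Appendix: UMPU

For a one-parameter exponential family having canonical statistic $T(X)$ and canonical parameter $\theta$, there exists a UMPU test with hypotheses

$$H_0 = \{ \vartheta : \vartheta = \theta \}$$
$$H_1 = \{ \vartheta : \vartheta \ne \theta \}$$

significance level $\alpha$, and critical function

$$\phi(x, \alpha, \theta) = \begin{cases} 1, & T(x) < C_1 \\ \gamma_1, & T(x) = C_1 \\ 0, & C_1 < T(x) < C_2 \\ \gamma_2, & T(x) = C_2 \\ 1, & C_2 < T(x) \end{cases}$$

where $C_1 \le C_2$ and $\gamma_1$, $\gamma_2$, $C_1$, and $C_2$ are determined by

$$E_\theta\{\phi(X, \alpha, \theta)\} = \alpha$$
$$E_\theta\{T(X) \phi(X, \alpha, \theta)\} = \alpha E_\theta\{T(X)\}$$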

Appendix: More Bad Crisp Intervals I

Performance of usual (Wald) interval

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

for Binomial(30, $p$). Dotted line is nominal level (0.95).

[Figure: "Nominal 95% Wald Interval, n = 30"; coverage probability (vertical axis, 0.0 to 0.8) versus $p$ (horizontal axis, 0 to 1).]

Appendix: More Bad Crisp Intervals II

Performance of score interval

$$\left\{ p : |\hat{p} - p| < 1.96 \sqrt{\frac{p(1 - p)}{n}} \right\}$$

for Binomial(30, $p$). Dotted line is nominal level (0.95).

[Figure: "Nominal 95% Score Interval, n = 30"; coverage probability (vertical axis, 0.85 to 1.00) versus $p$ (horizontal axis, 0 to 1).]

Appendix: More Bad Crisp Intervals III

Performance of likelihood interval

$$\left\{ p : 2 \left[ l_n(\hat{p}) - l_n(p) \right] < 1.96^2 \right\}$$

where $l_n$ is the log likelihood, for Binomial(30, $p$). Dotted line is nominal level (0.95).

[Figure: "Nominal 95% Likelihood Interval, n = 30"; coverage probability (vertical axis, 0.85 to 1.00) versus $p$ (horizontal axis, 0 to 1).]

Appendix: More Bad Crisp Intervals IV

Performance of variance stabilized interval

$$g^{-1}\left( g(\hat{p}) \pm \frac{1.96}{\sqrt{n}} \right)$$

where $g(p) = 2 \sin^{-1}(\sqrt{p})$, for Binomial(30, $p$). Dotted line is nominal level (0.95).

[Figure: "Nominal 95% Variance Stabilized Interval, n = 30"; coverage probability (vertical axis, 0.6 to 1.0) versus $p$ (horizontal axis, 0 to 1).]

Appendix: More Bad Crisp Intervals V

Performance of Clopper-Pearson exact interval for Binomial(30, $p$). Dotted line is nominal level (0.95).

[Figure: "Nominal 95% Clopper-Pearson Interval, n = 30"; coverage probability (vertical axis, 0.95 to 1.00) versus $p$ (horizontal axis, 0 to 1).]

Appendix: An Early Randomized Interval

Performance of Blyth-Hutchinson randomized interval for Binomial(30, $p$). Dotted line is nominal level (0.95).

Would be exact except for rounding error: the randomization variate $U$ is rounded to one significant figure, and the interval endpoints are rounded to two significant figures.

[Figure: "95% Blyth-Hutchinson Randomized Interval, n = 10"; coverage probability (vertical axis, 0.93 to 0.97) versus $p$ (horizontal axis, 0 to 1).]

Appendix: Old Literature

Blyth and Hutchinson (Biometrika, 1960). Table of Neyman-shortest unbiased confidence intervals for the binomial parameter.
Lehmann and Scheffé (Sankhyā, 1950, 1955). Completeness, similar regions, and unbiased estimation.
Eudey (Berkeley Tech. Rept., 1949). On the treatment of discontinuous random variables.
Wald (Econometrica, 1947). Foundations of a General Theory of Sequential Decision Functions.
von Neumann and Morgenstern (1944). Theory of Games and Economic Behavior.