Good Confidence Intervals for Categorical Data Analyses

Alan Agresti
Department of Statistics, University of Florida
(visiting Statistics Department, Harvard University)

LSHTM, July 22, 2011

Outline of my talk

- Score-test-based confidence interval (CI) as an alternative to Wald and likelihood-ratio-test-based large-sample intervals
- Relation of score-test-based inference to the Pearson chi-squared statistic, and its application to a variety of categorical data analyses
- Pseudo-score CI for model parameters based on a generalized Pearson statistic comparing models
- Good small-sample "exact" pseudo-score CIs
- An adjusted Wald pseudo-score method that performs well in simple cases (e.g., proportions and their difference) by approximating the score CI

Inverting tests to obtain CIs

For a parameter β, consider CIs based on inverting standard tests of H_0: β = β_0 (the 95% CI is the set of β_0 values for which the P-value > 0.05).

The most common approach inverts one of three standard asymptotic chi-squared tests: likelihood-ratio (Wilks 1938), Wald (1943), score (Rao 1948).

For log likelihood L(β), denote the maximum likelihood (ML) estimate by β̂, the score by

  u(β) = ∂L(β)/∂β

and the information by

  ι(β) = −E[∂²L(β)/∂β²]

Wald, likelihood-ratio, score large-sample inference

Wald test: [(β̂ − β_0)/SE]² = (β̂ − β_0)² ι(β̂); e.g., the 95% Wald CI is β̂ ± 1.96(SE)

Likelihood-ratio (LR) statistic: −2[L(β_0) − L(β̂)]

Rao's score test statistic:

  [u(β_0)]² / ι(β_0) = [∂L(β)/∂β]² / (−E[∂²L(β)/∂β²]),

where the partial derivatives are evaluated at β_0. (For canonical GLMs, this is a standardized sufficient statistic.)

The three methods are asymptotically equivalent under H_0.

In practice, Wald inference is popular because of its simplicity and the ease of forming it from software output.

Examples of score-test-based inference

- Pearson chi-squared test of independence in a two-way contingency table
- McNemar test for binary matched pairs
- Mantel-Haenszel test of conditional independence for stratified 2 × 2 tables
- Cochran-Armitage trend test for several ordered binomials

Y ~ binomial(n, π), π̂ = y/n. The test of H_0: π = π_0 uses

  z = (π̂ − π_0) / √(π_0(1 − π_0)/n)

with a N(0,1) null distribution (or z² with a χ²_1 null distribution). Inverting the two-sided test gives the Wilson CI for π (1927).

(The Wald 95% CI is π̂ ± 1.96 √(π̂(1 − π̂)/n).)
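As a concrete illustration, here is a minimal R sketch (my own, not code from the talk; the function name wilson_ci is mine) of the 95% Wilson score CI in its closed form, which is exactly what inverting the z-test above produces:

```r
## Minimal sketch (not from the talk): 95% Wilson score CI for a binomial
## proportion, obtained by inverting the score test of H0: pi = pi0.
wilson_ci <- function(y, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  pihat <- y / n
  mid  <- (pihat + z^2 / (2 * n)) / (1 + z^2 / n)                        # midpoint
  half <- z * sqrt(pihat * (1 - pihat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = mid - half, upper = mid + half)
}

wilson_ci(5, 15)
## compare: prop.test(5, 15, correct = FALSE)$conf.int (the Wilson interval)
```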

Wald CI for binomial parameter (n = 15)

[Figure: coverage probability of the 95% Wald interval as a function of the binomial parameter π, for n = 15.]

Score CI for binomial parameter (n = 15)

[Figure: coverage probability of the 95% score interval as a function of the binomial parameter π, for n = 15.]

Examples of score inference: Not so well known

Difference of proportions for independent binomial samples

Consider H_0: π_1 − π_2 = β_0. The score statistic is the square of

  z = [(π̂_1 − π̂_2) − β_0] / √{ [π̂_1(β_0)(1 − π̂_1(β_0))/n_1] + [π̂_2(β_0)(1 − π̂_2(β_0))/n_2] },

where π̂_1 and π̂_2 are the sample proportions and π̂_1(β_0) and π̂_2(β_0) are the ML estimates under the constraint π_1 − π_2 = β_0.

(When β_0 = 0, z² = the Pearson chi-squared statistic for the 2 × 2 table.)

The score CI for π_1 − π_2 inverts this test (Mee 1984, Miettinen and Nurminen 1985).
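A rough sketch of how such an interval can be computed numerically (my own illustration with its own implementation choices; the function name score_ci_diff, the grid, and the use of optimize() are mine, not from the talk): profile out the nuisance parameter under the constraint, then keep the β_0 values that the score test does not reject.

```r
## Sketch: score CI for beta = pi1 - pi2, inverting the test of H0: pi1 - pi2 = b0.
## The constrained ML estimates are found numerically with optimize().
score_ci_diff <- function(y1, n1, y2, n2, conf = 0.95) {
  crit <- qchisq(conf, df = 1)
  z2 <- function(b0) {
    negll <- function(p2) {                    # profile out pi2 with pi1 = pi2 + b0
      p1 <- p2 + b0
      -(dbinom(y1, n1, p1, log = TRUE) + dbinom(y2, n2, p2, log = TRUE))
    }
    lo <- max(1e-8, -b0 + 1e-8); hi <- min(1 - 1e-8, 1 - b0 - 1e-8)
    p2c <- optimize(negll, c(lo, hi))$minimum
    p1c <- p2c + b0
    ((y1 / n1 - y2 / n2) - b0)^2 / (p1c * (1 - p1c) / n1 + p2c * (1 - p2c) / n2)
  }
  grid <- seq(-0.999, 0.999, by = 0.001)
  range(grid[sapply(grid, z2) <= crit])        # accepted values of b0
}

score_ci_diff(0, 10, 0, 20)   # the first "no successes" example later in the talk
```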

Examples of score inference: Not well known (2)

Score CI for the odds ratio in a 2 × 2 table {n_ij} (Cornfield 1956): For a given β_0, let {μ̂_ij(β_0)} be the fitted values having the same row and column margins as {n_ij} and odds ratio

  μ̂_11(β_0) μ̂_22(β_0) / [μ̂_12(β_0) μ̂_21(β_0)] = β_0.

The 95% CI is the set of β_0 satisfying

  X²(β_0) = Σ_ij [n_ij − μ̂_ij(β_0)]² / μ̂_ij(β_0) ≤ 1.96².

Likewise, score CIs apply to the relative risk, logistic regression parameters, and a generic measure of association (Lang 2008), but they are not found in standard software. (Some R functions: www.stat.ufl.edu/~aa/cda/software.html)
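A compact sketch of this Cornfield-type construction (my own illustration, not the talk's code; it assumes positive row and column totals, and the function name score_ci_or and the grid inversion are mine): for each candidate odds ratio, solve for the fitted count μ̂_11 with the observed margins, then keep the values whose X²(β_0) is below the cutoff.

```r
## Sketch: score (Cornfield-type) CI for the odds ratio in a 2x2 table of counts,
## inverting X^2(beta0) <= qchisq(0.95, 1). Assumes positive row/column totals.
score_ci_or <- function(tab, conf = 0.95) {
  n <- sum(tab); r1 <- sum(tab[1, ]); c1 <- sum(tab[, 1])
  crit <- qchisq(conf, df = 1)
  x2 <- function(theta) {
    ## fitted mu11 with the observed margins and odds ratio theta
    f <- function(x) x * (n - r1 - c1 + x) - theta * (r1 - x) * (c1 - x)
    mu11 <- uniroot(f, c(max(0, r1 + c1 - n) + 1e-9, min(r1, c1) - 1e-9))$root
    mu <- matrix(c(mu11, r1 - mu11, c1 - mu11, n - r1 - c1 + mu11), 2, byrow = TRUE)
    sum((tab - mu)^2 / mu)
  }
  theta_grid <- exp(seq(-8, 8, length.out = 4001))   # grid on the log-odds-ratio scale
  range(theta_grid[sapply(theta_grid, x2) <= crit])
}

score_ci_or(matrix(c(15, 5, 7, 13), 2, byrow = TRUE))
```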

Aside: 2 × 2 tables with no successes (meta-analyses)

For significance tests (e.g., Cochran-Mantel-Haenszel and small-sample exact), there is no information about whether there is an association; the data make no contribution to the tests.

For estimation, there is no information about the odds ratio or relative risk, but there is about π_1 − π_2 (i.e., an impact on practical, not statistical, significance).

               Response                Response
  Group   Success   Failure      Success   Failure
    1        0         10           0         100
    2        0         20           0         200

  Score CIs for π_1 − π_2:  (−0.16, 0.28)  and  (−0.02, 0.04)
  Wald CIs for π_1 − π_2:   (0.0, 0.0)     and  (0.0, 0.0)

Note: It is not necessary to add constants to the empty cells.

Relation of score statistic to Pearson chi-squared

For counts {n_i} from a multinomial model, when testing goodness of fit against the ML fit {μ̂_i} under H_0, the score test statistic is the Pearson chi-squared statistic

  X² = Σ_i (n_i − μ̂_i)² / μ̂_i

This is also true for generalized linear models (Smyth 2003, Lovison 2005).

When the model refers to a parameter β, inverting the Pearson chi-squared test of H_0: β = β_0 gives a score CI; e.g., the 95% CI for β is the set of β_0 for which X² ≤ χ²_{1,0.05} = (1.96)².

For small to moderate n, the actual coverage probability is usually closer to the nominal level for the score CI than for the Wald CI, and usually the LR CI.

Evidence that score-test-based inference is good

- Testing independence in two-way tables (Koehler and Larntz 1980); LR statistic G² = 2 Σ n_ij log(n_ij/μ̂_ij)
- CI for a binomial parameter (Newcombe 1998)
- CI for a difference of proportions, relative risk, odds ratio, and for comparing dependent samples (Newcombe 1998, Tango 1998, Agresti and Min 2005)
- Multivariate comparisons of proportions for independent and dependent samples (Agresti and Klingenberg 2005, 2006)
- Simultaneous CIs comparing binomial proportions (Agresti, Bini, Bertaccini, Ryu 2008)
- CI for ordinal effect measures such as P(y_1 > y_2) (Ryu and Agresti 2008)

Score-test-based inference infeasible for some models

Example: 2006 General Social Survey

Data: responses to "How successful is the government in (1) Providing health care for the sick? (2) Protecting the environment?" (1 = successful, 2 = mixed, 3 = unsuccessful)

                        y_2 = Environment
  y_1 = Health Care      1      2      3    Total
          1            199     81     83      363
          2            129    167    112      408
          3            164    169    363      696
        Total          492    417    558     1467

A marginal model for multivariate data

Cumulative logit marginal model for responses (y_1, y_2):

  logit[P(y_1 ≤ j)] = α_j,   logit[P(y_2 ≤ j)] = α_j + β,   j = 1, 2,

designed to detect a location shift in the marginal distributions.

The multinomial likelihood is in terms of the cell probabilities {π_ij = μ_ij/n} and cell counts {n_ij},

  L(π) ∝ π_11^{n_11} π_12^{n_12} ⋯ π_33^{n_33},

but the model parameters refer to marginal probabilities.

LR and score inference comparing two models

Contingency table {n_i}, with ML fitted values {μ̂_i} for a model and {μ̂_i0} for a simpler null model (e.g., with β = β_0):

  H_0: simpler model,   H_a: more complex model

The LR statistic (for multinomial sampling) is

  G² = 2 Σ_i μ̂_i log(μ̂_i / μ̂_i0).

Rao (1961) suggested a Pearson-type statistic,

  X² = Σ_i (μ̂_i − μ̂_i0)² / μ̂_i0

Pseudo-score CI based on Pearson stat. (with E. Ryu)

For hypothesis testing, G² is nearly universal; X² has been mostly ignored since Haberman's (1977) results on sparse asymptotics.

Confidence intervals: It is popular to obtain profile likelihood confidence intervals: if G² = G²(β_0) is the LR statistic for H_0: β = β_0, then the 95% LR CI is the set of β_0 such that G²(β_0) ≤ χ²_{1,0.05}

  e.g., in SAS, available with the LRCI option in PROC GENMOD; in R, with the confint() function applied to a model object

Since the score CI often out-performs the LR CI for simple discrete measures, as an alternative we could find the pseudo-score CI: the set of β_0 such that X²(β_0) ≤ χ²_{1,0.05} (Biometrika 2010)

Example: Cumulative logit marginal model

  logit[P(y_1 ≤ j)] = α_j,   logit[P(y_2 ≤ j)] = α_j + β,   j = 1, 2.

Joe Lang (Univ. of Iowa) has an R function mph.fit for ML fitting of a general class of models (JASA 2005, Ann. Statist. 2004).

For various fixed β_0, fit the model with that constraint (using an offset), giving {μ̂_ij,0} to compare to {μ̂_ij} in the 3 × 3 table, and find the β_0 with X²(β_0) ≤ χ²_{1,0.05}.

The 95% pseudo-score CI is (0.2898, 0.5162). Here n is large and the results are similar to the LR CI of (0.2900, 0.5162). Simulation studies show the pseudo-score CI often performs better for small n.

Pseudo-score inference for discrete data

With independent {y_i} for a GLM, a Pearson-type pseudo-score statistic is

  X² = Σ_i (μ̂_i − μ̂_i0)² / v(μ̂_i0) = (μ̂ − μ̂_0)ᵀ V̂_0⁻¹ (μ̂ − μ̂_0),

where v(μ̂_i0) = estimated null variance of y_i and V̂_0 = diagonal matrix containing the v(μ̂_i0) (Lovison 2005).

Binary response: v(μ̂_i0) = μ̂_i0 (1 − μ̂_i0)

The pseudo-score CI is useful for discrete distributions other than the binomial and multinomial, for complex sample survey data (e.g., variances inflated relative to simple random sampling), and for clustered data (Biometrika 2010)
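To make the construction concrete, here is a small sketch (my own illustration, not code from the talk; the function name pseudo_score_ci, the grid, and the simulated data are mine) of a pseudo-score CI for a logistic regression coefficient: the null fit with β fixed at β_0 uses an offset, and X²(β_0) compares the two sets of fitted values using the null binomial variances.

```r
## Sketch: pseudo-score CI for a logistic regression coefficient beta, inverting
## the Pearson-type statistic X^2(b0) that compares fitted values of the full
## model and of the model constrained to beta = b0 (fitted via an offset).
pseudo_score_ci <- function(y, x, conf = 0.95, grid = seq(-5, 5, by = 0.01)) {
  crit <- qchisq(conf, df = 1)
  mu_full <- fitted(glm(y ~ x, family = binomial))
  x2 <- function(b0) {
    mu0 <- fitted(glm(y ~ 1 + offset(b0 * x), family = binomial))  # null fit, beta = b0
    sum((mu_full - mu0)^2 / (mu0 * (1 - mu0)))
  }
  range(grid[sapply(grid, x2) <= crit])
}

## example with simulated binary data
set.seed(1)
x <- rnorm(40); y <- rbinom(40, 1, plogis(-0.5 + 0.8 * x))
pseudo_score_ci(y, x)
```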

Small-sample methods (not "exact")

Using the score (or another) statistic, one can use small-sample distributions (e.g., binomial), rather than large-sample approximations (e.g., normal), to obtain P-values and confidence intervals.

Because of discreteness, error probabilities do not exactly equal the nominal values. For a CI, inverting a test with actual size ≤ 0.05 for all β_0 guarantees actual coverage probability ≥ 0.95.

Inferences are conservative: actual error probabilities ≤ the 0.05 nominal level. The actual coverage probability varies for different β values and is unknown in practice.

Example: Binomial(n, π) with n = 5

Large-sample score CI vs. small-sample CI (n = 5)

[Figure: coverage probability as a function of π (0 to 1) for the small-sample CI and the large-sample score CI, when n = 5.]

Examples of small-sample CIs (95%)

Use the tail method: invert two separate one-sided tests, each of size 0.025. (P-value = double the minimum tail probability.)

1. Binomial parameter π

Clopper and Pearson (1934) suggest the solution (π_L, π_U) to

  Σ_{k = t_obs}^{n} C(n, k) π_L^k (1 − π_L)^{n−k} = 0.025   and   Σ_{k = 0}^{t_obs} C(n, k) π_U^k (1 − π_U)^{n−k} = 0.025
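These two tail equations have a well-known beta-quantile solution; a minimal R sketch (my own illustration, not from the talk; the function name clopper_pearson is mine):

```r
## Sketch: Clopper-Pearson "exact" CI for a binomial proportion, using the
## beta-quantile form of the two tail equations.
clopper_pearson <- function(t, n, conf = 0.95) {
  a <- 1 - conf
  lower <- if (t == 0) 0 else qbeta(a / 2, t, n - t + 1)
  upper <- if (t == n) 1 else qbeta(1 - a / 2, t + 1, n - t)
  c(lower = lower, upper = upper)
}

clopper_pearson(2, 5)
## binom.test(2, 5)$conf.int returns the same interval
```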

Examples of small-sample CIs (2)

2. Logistic regression parameter

For subject i with binary outcome y_i and explanatory variables (x_i0 = 1, x_i1, x_i2, ..., x_ik), the model is

  logit[P(y_i = 1)] = β_0 + β_1 x_i1 + ⋯ + β_k x_ik

Use the distribution of the score statistic after eliminating nuisance parameters by conditioning on their sufficient statistics (T_j = Σ_i y_i x_ij for β_j).

e.g., the bounds (β_1L, β_1U) of the 95% CI for β_1 satisfy

  P(T_1 ≥ t_1,obs | t_0, t_2, ..., t_k; β_1L) = 0.025
  P(T_1 ≤ t_1,obs | t_0, t_2, ..., t_k; β_1U) = 0.025

Randomizing Eliminates Conservatism

For testing H_0: β = β_0 against H_a: β > β_0 using a test statistic T, a randomized test has P-value

  P_{β_0}(T > t_obs) + U · P_{β_0}(T = t_obs),

where U is a uniform(0,1) random variable.

To construct a CI with actual coverage probability 0.95, solve

  P_{β_U}(T < t_obs) + U · P_{β_U}(T = t_obs) = 0.025
  P_{β_L}(T > t_obs) + (1 − U) · P_{β_L}(T = t_obs) = 0.025
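For a binomial parameter this is easy to carry out; a small R sketch (my own illustration, not from the talk; the particular data values and use of uniroot() are mine):

```r
## Sketch: randomized one-sided P-value and randomized 95% CI for a binomial
## parameter, using a single draw U ~ uniform(0,1).
set.seed(2)
t_obs <- 4; n <- 10
U <- runif(1)

## randomized P-value for H0: pi = 0.5 vs Ha: pi > 0.5
p_rand <- (1 - pbinom(t_obs, n, 0.5)) + U * dbinom(t_obs, n, 0.5)

## CI endpoints solve the two randomized tail equations
upper <- uniroot(function(p) pbinom(t_obs - 1, n, p) + U * dbinom(t_obs, n, p) - 0.025,
                 c(1e-8, 1 - 1e-8))$root
lower <- uniroot(function(p) (1 - pbinom(t_obs, n, p)) + (1 - U) * dbinom(t_obs, n, p) - 0.025,
                 c(1e-8, 1 - 1e-8))$root
c(lower, upper)
```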

Use randomized methods in practice?

The randomized CI was suggested by Stevens (1950) for the binomial parameter.

Pearson (1950): Statisticians may come to accept randomization after performing the experiment just as they accept randomization before the experiment.

Stevens (1950): "We suppose that most people will find repugnant the idea of adding yet another random element to a result which is already subject to the errors of random sampling. But what one is really doing is to eliminate one uncertainty by introducing a new one.... It is because this uncertainty is eliminated that we no longer have to keep on the safe side, and can therefore reduce the width of the interval."

Mid-P Pseudo-Score Approach

Mid-P-value (Lancaster 1949, 1961): count only (1/2) P_{β_0}(T = t_obs) in the P-value; e.g., for H_a: β > β_0,

  P_{β_0}(T > t_obs) + (1/2) P_{β_0}(T = t_obs).

Unlike the randomized P-value, it depends only on the data.

Under H_0, the ordinary P-value is stochastically larger than uniform; E(mid-P-value) = 1/2.

The sum of the right-tail and left-tail P-values is 1 + P_{β_0}(T = t_obs) for the ordinary P-value, 1 for the mid-P-value.

CI based on mid-p-value Mid-P CI based on inverting tests using mid-p-value: P βl (T > t obs )+(1/2) P βl (T = t obs ) = 0.025. P βu (T < t obs )+(1/2) P βu (T = t obs ) = 0.025. Coverage prob. not guaranteed 0.95, but mid-p CI tends to be a bit conservative. For binomial, how do Clopper Pearson and mid-p CI behave as n increases? (from Agresti and Gottard 2007) LSHTM, July 22, 2011 p. 26/36

Clopper-Pearson ( ) and mid-p (- -) CIs forπ = 0.50 coverage 0.92 0.94 0.96 0.98 1.00 0 50 100 150 200 n LSHTM, July 22, 2011 p. 27/36

Simple approximations to score CIs often work well

Example: Binomial proportion. Finding all π_0 such that

  |π̂ − π_0| / √(π_0(1 − π_0)/n) < 2

provides a 95% score CI of the form M ± 2s (approximating 1.96 by 2), with

  M = [n/(n+4)] π̂ + [4/(n+4)] (1/2) = (t_obs + 2)/(n + 4)

  s² = [1/(n+4)] { [n/(n+4)] π̂(1 − π̂) + [4/(n+4)] (1/2)(1/2) }

Adjusted Wald CI approximates score CI

For a 95% CI, this suggests an adjusted Wald CI ("plus 4" CI) with π̃ = (t_obs + 2)/(n + 4) and ñ = n + 4:

  π̃ ± 2.0 √(π̃(1 − π̃)/ñ)

Its midpoint is the same as that of the 95% score CI, but it is wider (Jensen's inequality).

In fact, simple adjustments of the Wald CI improve performance dramatically and give performance similar to the score CI:

- Proportion: add 2 successes and 2 failures before computing the Wald CI (Agresti and Coull 1998)
- Difference of proportions: add 2 successes and 2 failures before computing the Wald CI (Agresti and Caffo 2000)
- Paired difference: add 2 successes and 2 failures before computing the Wald CI (Agresti and Min 2005)
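For the single-proportion case the adjustment is essentially a one-liner; a minimal R sketch (my own illustration, not from the talk; the function name plus4_ci and the truncation to [0, 1] are my choices):

```r
## Sketch: "plus 4" adjusted Wald 95% CI for a proportion -- add 2 successes and
## 2 failures, then compute the ordinary Wald interval (truncated to [0, 1] here).
plus4_ci <- function(t, n) {
  p_tilde <- (t + 2) / (n + 4)
  half <- 2 * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
  c(lower = max(0, p_tilde - half), upper = min(1, p_tilde + half))
}

plus4_ci(0, 10)   # usable even with zero observed successes
```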

Clopper-Pearson, Wald, and Plus 4 CIs (n = 10)

[Figure: coverage probabilities of the 95% Clopper-Pearson, Wald, and adjusted Wald ("plus 4") confidence intervals for a binomial parameter π, as functions of π, with n = 10.]

[Figure: coverage probabilities of the 95% and 99% Wald and adjusted Wald intervals as functions of π, for n = 5, 10, and 20.]

Some comparisons (95% CI)

For all n, a tremendous improvement for π near 0 or 1. e.g., Brown, Cai, and Das Gupta (2001): the smallest n_0 such that the coverage probability is ≥ 0.94 for all n ≥ n_0 is

    π       Wald    Adjusted
   0.01     7963        1

Good CIs shrink midpoints

The poor performance of Wald intervals is due to centering at π̂ or (π̂_1 − π̂_2), rather than to being too short. The Wald CI has greater length than the adjusted intervals unless the parameters are near the boundary of the parameter space.

Intervals resulting from a Bayesian approach can also perform well in a frequentist sense:

- Single proportion: Brown et al. (2001)
- Comparing proportions: Agresti and Min (2005); using independent Jeffreys beta(0.5, 0.5) priors gives frequentist performance similar to the score CI

Many score CIs, mid-P corrections, and Bayes CIs are available in R at www.stat.ufl.edu/~aa/cda/software.html

Summary

- Full model saturated: the score confidence interval inverts a goodness-of-fit test using the Pearson chi-squared statistic
- Full model unsaturated: the pseudo-score method of inverting the Pearson test comparing fitted values is available when the ordinary score CI is infeasible, and may perform better than the profile likelihood CI for small n
- A good small-sample CI inverts the score test with the mid-P-value
- For proportions and their differences, a pseudo-score method that adjusts the Wald method by adding 4 observations performs well

Some references

Agresti, A., and Coull, B. (1998). Approximate better than exact for CIs for binomial parameters. American Statistician 52: 119-126.

Agresti, A., and Min, Y. (2001). On small-sample confidence intervals for parameters in discrete distributions. Biometrics 57: 963-971.

Agresti, A., and Gottard, A. (2007). Nonconservative exact small-sample inference for discrete data. Computational Statistics & Data Analysis 51: 6447-6458.

Agresti, A., and Ryu, E. (2010). Pseudo-score inference for parameters in discrete statistical models. Biometrika 97: 215-222.

Bera, A. K., and Bilias, Y. (2001). Rao's score, Neyman's C(α) and Silvey's LM tests. Journal of Statistical Planning and Inference 97: 9-44.

Lovison, G. (2005). On Rao score and Pearson X² statistics in generalized linear models. Statistical Papers 46: 555-574.

Rao, C. R. (1961). A study of large sample test criteria through properties of efficient estimates. Sankhya A 23: 25-40.

Smyth, G. K. (2003). Pearson's goodness of fit statistic as a score test statistic. In Science and Statistics: A Festschrift for Terry Speed, IMS Lecture Notes - Monograph Series.