Some Remarks on Overdispersion
Author(s): D. R. Cox
Source: Biometrika, Vol. 70, No. 1 (Apr., 1983), pp. 269-274
Published by: Oxford University Press on behalf of Biometrika Trust
Stable URL: http://www.jstor.org/stable/2335966
Accessed: 06-10-2016 13:07 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://about.jstor.org/terms

Biometrika (1983), 70, 1, pp. 269-74
Printed in Great Britain

Some remarks on overdispersion

BY D. R. COX
Department of Mathematics, Imperial College, London

SUMMARY

It is shown that maximum likelihood estimation of a simple model retains high efficiency in the presence of modest amounts of overdispersion. The main requirement is that the target parameter should be the moment parameter of an exponential family distribution, or more generally a parameter for which the order $n^{-1}$ bias of the maximum likelihood estimate is zero. Extensions for models with explanatory variables are outlined.

Some key words: Asymptotic theory; Dispersion test; Exponential distribution; Maximum likelihood; Negative binomial distribution; Pareto distribution; Poisson distribution; Quasilikelihood.

1. INTRODUCTION

Analysis of data via a single-parameter family of distributions implies in particular that the variance is determined by the mean. Familiar examples are the Poisson, binomial and exponential distributions. A very common practical complication is the presence of overdispersion, or more rarely underdispersion, leading to a failure of the variance-mean relation.

Overdispersion in general has two effects. One is that summary statistics have a larger variance than anticipated under the simple model. This has long been recognized and is commonly allowed for by an empirical inflation factor, either assumed from prior experience or estimated. The second effect is a possible loss of efficiency in using statistics appropriate for the single-parameter family.

There are two lines of approach. One is detailed representation of the overdispersion by a specific model. The other is to examine the effect on the conventional analysis of changes from the simple model. Two fairly familiar examples are studied in § 2 as a preliminary to a more general analysis.

2. TWO SIMPLE EXAMPLES

Suppose first that $Y_1, \ldots, Y_n$ are independently and identically distributed in a Poisson distribution of mean $\theta$, optimally estimated by $\bar{Y} = \sum Y_j/n$. Overdispersion is most simply represented by supposing that $Y_j$ has a Poisson distribution of mean $\Theta_j$, where $\Theta_1, \ldots, \Theta_n$ are independently and identically distributed with a gamma distribution of mean $\mu$ and index $\gamma$,

$$E(\Theta_j) = \mu, \qquad \mathrm{var}(\Theta_j) = \mu^2/\gamma. \tag{1}$$

Then $Y_1, \ldots, Y_n$ have a negative binomial distribution and the following conclusions hold.

(i) The sample mean $\bar{Y}$ remains efficient as an estimate of $\mu$, being the maximum likelihood estimate whether $\gamma$ is known or unknown.

(ii) If the Poisson distribution is parameterized in terms of some nonlinear function $\phi$ of $\theta$, for example $\phi = \log\theta$ or $e^{-\theta}$ or $1/\theta$, then the Poisson-based estimate, for example $\log\bar{Y}$ or $e^{-\bar{Y}}$, is not a consistent estimate of $E(\Phi)$ in the corresponding overdispersed model.

(iii) We have that

$$n\,\mathrm{var}(\bar{Y}) = \mu + \mu^2/\gamma \tag{2}$$

and the inflation is by a factor independent of $\mu$ if and only if $\gamma \propto \mu$. This means that in the compounding gamma distribution the variance is proportional to the mean, mimicking the Poisson distribution.

To some extent the properties (i)-(iii) are special both to the Poisson distribution and to the special choice of compounding distribution. We therefore consider a second example.

Suppose that $Y_1, \ldots, Y_n$ are independently and identically exponentially distributed with mean $\theta$ and rate $\rho = 1/\theta$. Again $\bar{Y}$ estimates $\theta$. The simplest representation is to suppose the rate parameter to be a random variable $P$ having a gamma distribution of mean $\lambda$ and index $\gamma$:

$$E(P^r) = (\lambda/\gamma)^r\,\Gamma(\gamma + r)/\Gamma(\gamma).$$

Write $\mu = E(P^{-1}) = \gamma/\{\lambda(\gamma - 1)\}$. Then

$$E(\bar{Y}) = \mu, \qquad n\,\mathrm{var}(\bar{Y}) = \mu^2\gamma/(\gamma - 2). \tag{3}$$

The individual $Y_j$ have density

$$\frac{\gamma\{(\gamma - 1)\mu\}^{\gamma}}{\{y + (\gamma - 1)\mu\}^{\gamma + 1}}.$$

The analogues of (i)-(iii) for the Poisson distribution are as follows.

(i)' The sample mean $\bar{Y}$ is no longer fully efficient for estimating $\mu$. For known $\gamma$, the maximum likelihood estimate $\hat\mu$ of $\mu$ has, by the usual calculations, asymptotically

$$n\,\mathrm{var}(\hat\mu) = \mu^2(\gamma + 2)/\gamma, \tag{4}$$

so that, dividing (4) by (3), the asymptotic efficiency of $\bar{Y}$ relative to $\hat\mu$ is $(\gamma^2 - 4)/\gamma^2 = 1 - 4/\gamma^2$. The parameters $\mu$ and $\gamma$ are slightly nonorthogonal, so that if $\gamma$ is unknown, (4) and the asymptotic relative efficiency are increased by $O(1/\gamma^6)$. Recall that $1/\gamma^2$ is the fourth power of the coefficient of variation of $\Theta$ and that if $v$ is the variance inflation factor, so that $\mathrm{var}(Y_j) = v\mu^2$, then $v = \gamma/(\gamma - 2)$; thus the asymptotic relative efficiency is $(2v - 1)/v^2$. Even when $v = 2$, $\gamma = 4$, representing substantial overdispersion, the asymptotic relative efficiency is 3/4. High asymptotic relative efficiency is retained for modest overdispersion.
(ii)' If the exponential distribution is parameterized in terms of some nonlinear function $\phi$ of $\theta$, for example $\phi = 1/\theta$ or $\phi = \log\theta$, then the exponential-based estimate, for example $1/\bar{Y}$ or $\log\bar{Y}$, is not a consistent estimate of $E(\Phi)$ in the corresponding overdispersed model.

(iii)' As already implicitly noted, for constant $\gamma$, the variance-mean relation for the compounding distribution mimics that for the exponential distribution. For constant $\gamma$, $n\,\mathrm{var}(\bar{Y}) = v\mu^2$, where $v$ is constant.

3. MORE GENERAL DISCUSSION

To treat the problem in a more general way, asymptotic arguments seem necessary. The limiting operations involved are, of course, purely technical devices for deriving approximations and care in formulation is needed. Here we consider a model with overdispersion on the borderline of detectability, i.e. such that there is a reasonable but not overwhelming chance of detecting the overdispersion from the data.

Suppose then that the initial model is that $Y_1, \ldots, Y_n$ are independently distributed with $Y_j$ having density $f_j(y; \theta)$, where $\theta$ is a scalar parameter. Suppose next that $\Theta_1, \ldots, \Theta_n$, the values of $\theta$ for the $n$ observations, are independently distributed with mean $\mu$ and variance $\tau/\sqrt{n}$. Note that this increases $\mathrm{var}(Y_j)$ by $O(1/\sqrt{n})$ and that this is on the borderline of detectability in the above sense.

Under suitable regularity conditions, the density of $Y_j$ in the overdispersed model is

$$E_\Theta\{f_j(y; \Theta)\} = f_j(y; \mu) + \frac{\tau}{2\sqrt{n}}\,\frac{\partial^2 f_j(y; \mu)}{\partial\mu^2} + o\!\left(\frac{1}{\sqrt{n}}\right) = f_j(y; \mu)\left\{1 + \frac{\tau}{2\sqrt{n}}\,h_j(y; \mu) + o\!\left(\frac{1}{\sqrt{n}}\right)\right\}, \tag{5}$$

where, with $g_j(y; \mu) = \log f_j(y; \mu)$,

$$h_j(y; \mu) = \{\partial g_j(y; \mu)/\partial\mu\}^2 + \partial^2 g_j(y; \mu)/\partial\mu^2. \tag{6}$$

Thus if $l_\tau$ and $l$ denote log likelihoods from respectively overdispersed and original models, then for a random vector $Y$

$$l_\tau(\mu, \tau; Y) = l(\mu; Y) + \frac{\tau}{2\sqrt{n}}\sum h_j(Y_j; \mu) + O_p(1) = l(\mu; Y) + \tfrac{1}{2}\tau\sqrt{n}\,k(\mu) + O_p(1), \tag{7}$$

where

$$n\,k(\mu) = E\left\{\sum h_j(Y_j; \mu);\ \mu_0\right\}. \tag{8}$$

In (8) the expectation is taken under the original model, $\tau = 0$, and at the true parameter-value $\mu_0$, say. In (7), the remainder term is $O_p(1)$ for any fixed $\tau$ and for $\mu$ within $O(1/\sqrt{n})$ of its true value. Of course higher order terms in (7) could be evaluated.

Now a constant difference between $l_\tau$ and $l$ would be of no consequence. Thus we consider

$$\frac{\partial l_\tau}{\partial\mu} = \frac{\partial l}{\partial\mu} + \tfrac{1}{2}\tau\sqrt{n}\,k'(\mu) + \tfrac{1}{2}\frac{d\tau}{d\mu}\sqrt{n}\,k(\mu) + O_p(1), \tag{9}$$

where $k'(\mu_0) = 2i_{110} + i_{001} = -i_{300} - i_{110}$ in the fairly standard notation

$$n\,i_{rst} = \sum E\left[\left\{\frac{\partial g_j(Y_j; \mu)}{\partial\mu}\right\}^r \left\{\frac{\partial^2 g_j(Y_j; \mu)}{\partial\mu^2}\right\}^s \left\{\frac{\partial^3 g_j(Y_j; \mu)}{\partial\mu^3}\right\}^t;\ \mu_0\right],$$

so that $i_{rst}$ is an average generalized information. If $\tau$ is fixed or if $\tau$ is a parameter independent of $\mu$, $d\tau/d\mu = 0$. In fact because $k(\mu_0) = 0$ the term in $d\tau/d\mu$ is in any case negligible to the order considered below.
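As an illustrative check of (6) and (8), not part of the original argument, one may take $f_j$ to be the Poisson density with mean $\mu$. The following sketch works out $h$ and $k$ under that assumption only; it suggests that both $k(\mu_0)$ and $k'(\mu_0)$ vanish when the mean itself is the parameter.

```latex
% Worked example (assumption: Poisson density, parameterized by its mean \mu)
\begin{align*}
g(y;\mu) &= y\log\mu - \mu - \log y!, \\
\frac{\partial g}{\partial\mu} &= \frac{y-\mu}{\mu}, \qquad
\frac{\partial^2 g}{\partial\mu^2} = -\frac{y}{\mu^2}, \\
h(y;\mu) &= \left(\frac{y-\mu}{\mu}\right)^{2} - \frac{y}{\mu^{2}}
          = \frac{(y-\mu)^{2} - y}{\mu^{2}}, \\
k(\mu) &= E\{h(Y;\mu);\ \mu_0\}
        = \frac{\mu_0 + (\mu_0-\mu)^{2} - \mu_0}{\mu^{2}}
        = \frac{(\mu_0-\mu)^{2}}{\mu^{2}}, \\
k(\mu_0) &= 0, \qquad k'(\mu_0) = 0 .
\end{align*}
```

Here $E\{(Y - \mu)^2; \mu_0\} = \mu_0 + (\mu_0 - \mu)^2$ and $E(Y; \mu_0) = \mu_0$ have been used, both of which hold for a Poisson variable of mean $\mu_0$.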
To analyse the deviations of the maximum likelihood estimates $\hat\mu_\tau$ and $\hat\mu$ from the true value of $\mu$ we expand in Taylor series. It follows from (9) that $\hat\mu_\tau$ and $\hat\mu$ differ by an amount proportional to $\tau$ and of order $1/\sqrt{n}$, unless $k'(\mu) = 0$. This difference is of the same order as the standard error of the maximum likelihood estimate and hence cannot be ignored. The requirement that $k'(\mu) = 0$ is equivalent to choice of parameterization making the bias of the maximum likelihood estimate of $\mu$ under the simple model zero to order $n^{-1}$; see, for instance, Cox & Hinkley (1974, p. 310). When this condition holds, $\hat\mu_\tau$ and $\hat\mu$ differ by $O_p(1/n)$, whether $\tau$ is known or estimated. That is, simple maximum likelihood estimation retains full asymptotic efficiency when there is overdispersion on the borderline of detectability, provided that the target parameter is correctly chosen. In particular, in full exponential family problems, the target parameter is the expectation over the compounding distribution of the moment parameter of the exponential family. These results generalize (i) and (ii) of § 2.

The inflation in $\mathrm{var}(\hat\mu)$ induced by overdispersion can in principle be examined by more detailed expansions. In the full exponential family, with $\theta$ the moment parameter and $\bar{Y}$ the canonical statistic, $\hat\theta = \bar{Y}$ and in the simple model $n\,\mathrm{var}(\bar{Y}) = v(\theta)$, say. In the overdispersed model

$$n\,\mathrm{var}(\bar{Y}) = E\{v(\Theta)\} + \mathrm{var}(\Theta) \simeq v(\mu) + \mathrm{var}(\Theta)\{1 + \tfrac{1}{2}v''(\mu)\}.$$

There is thus inflation by an approximately constant factor if

$$\mathrm{var}(\Theta) \propto \frac{v(\mu)}{1 + \tfrac{1}{2}v''(\mu)}. \tag{10}$$

4. TEST FOR OVERDISPERSION

The family of densities derived from (5),

$$f(y; \mu)\{1 + \varepsilon h(y; \mu)\},$$

or in many ways preferably

$$f(y; \mu)\exp\{\varepsilon h(y; \mu)\}\,a(\varepsilon, \mu), \tag{11}$$

where $a(\varepsilon, \mu)$ is a normalizing constant, represents for positive $\varepsilon$ overdispersion, and for negative $\varepsilon$ underdispersion, relative to $f(y; \mu)$. This suggests as test statistic, for $\varepsilon = 0$ from a random sample $Y_1, \ldots, Y_n$, $\sum h(Y_j, \hat\mu_0)$, where $\hat\mu_0$ is the maximum likelihood estimate of $\mu$ when $\varepsilon = 0$. When $\varepsilon = 0$ the statistic has asymptotically zero mean and variance

$$n\,\mathrm{var}\left[\left\{\frac{\partial g(Y; \mu)}{\partial\mu}\right\}^2 + \frac{\partial^2 g(Y; \mu)}{\partial\mu^2}\ \Big|\ \frac{\partial g(Y; \mu)}{\partial\mu} = 0\right] = n\{i_{400} + 2i_{210} + i_{020} - (i_{300} + i_{110})^2/i_{200}\}, \tag{12}$$

where the $i$'s can be evaluated at $\hat\mu_0$. This is a rather general version of standard dispersion tests. When the parameter is the moment parameter or more generally defined as in § 3, $i_{300} + i_{110} = 0$ and the final term in (12) vanishes.

5. GENERALIZATIONS

The analysis sketched in §§ 2-4 can be generalized in various ways, the most important being as follows:

(a) the parameter $\theta$ may be a vector;

(b) each $Y_j$ may have its own parameter-value $\theta_j$, these being related by a regression model $\theta_j = \eta(x_j; \beta)$, where $x_j$ is a vector of explanatory variables for the $j$th individual, $\beta$ is a vector of regression coefficients, usually of dimension small compared with $n$, and $\eta$ is a function of known form;

(c) the individuals may be grouped in 'clusters' in such a way that all individuals in the same 'cluster' have a common random term. This may be combined with the kind of dependence outlined in (b).

The generalization (a) is immediate. As a simple example suppose that the initial model is that $Y_1, \ldots, Y_n$ are independently normally distributed with mean $\lambda$ and standard deviation $\kappa$. The moment parameters of the normal distribution are the expected values of the canonical statistics $(Y_j, Y_j^2)$, that is are $\lambda$ and $\lambda^2 + \kappa^2$, so that in the overdispersed model in which $(\lambda, \kappa)$ becomes a random variable $(\Lambda, K)$, the use of standard normal estimates leads to the estimation of

$$E(\Lambda), \qquad E(\Lambda^2 + K^2) = \{E(\Lambda)\}^2 + \mathrm{var}(\Lambda) + E(K^2).$$

Indeed it is clear that the standard estimates of mean and variance tend to $E(\Lambda)$ and $\mathrm{var}(\Lambda) + E(K^2)$. If in the compounding distribution $\Lambda$ and $K$ are independent, it is easy to show that the cumulant generating functions are related by

$$\psi_Y(t) = \psi_\Lambda(t) + \psi_{K^2}(\tfrac{1}{2}t^2),$$

where, for example, $\psi_Y(t) = \log E(e^{tY})$. Thus observation of $Y$ allows the estimation of odd order cumulants of $\Lambda$ and certain combinations of the even order cumulants of $\Lambda$ and of $K^2$, of which the sum of variances is the simplest case.

The discussion of § 3 can be adapted to apply to the regression model (b). For this we use (7) with, for $Y_j$, $\mu$ replaced by $\eta(x_j; \beta)$ and $\tau$ by $\tau\{\eta(x_j; \beta)\}$. We then examine, as in (9), the relation between the gradient vectors $\partial l_\tau/\partial\beta$ and $\partial l/\partial\beta$. The resulting maximum likelihood estimates differ by $O_p(1/n)$ if $E\{\partial h_j(Y_j; \mu)/\partial\mu\} = 0$; no special requirement is involved for the dependence of $\tau$ on $\eta$.
The implication is that the use of maximum likelihood estimation as for the standard model retains high asymptotic efficiency in the presence of modest overdispersion provided that the regression model being fitted is regarded as applying to expected values of parameters with zero $n^{-1}$ bias in simple estimation. Thus, for example, fitting by maximum likelihood of a log linear model for Poisson-distributed data retains high efficiency under borderline overdispersion, provided that the log linear model determines the expected value of the observed count. That is, if the log linear model specifies a Poisson distribution for $Y_j$ with $\log E(Y_j) = x_j^{\mathrm T}\beta$, the overdispersed model should have $E(Y_j) = \exp(x_j^{\mathrm T}\beta)$, with $\mathrm{var}(Y_j) > E(Y_j)$. An overdispersed model in which $Y_j$ is considered to have a Poisson distribution with $\log E(Y_j) = x_j^{\mathrm T}\beta + \varepsilon_j$, where $\varepsilon_j$ in turn is a random variable of expectation zero, would, however, lead to the inconsistencies exemplified in (ii) and (ii)' of § 2.

This discussion shows that the method of quasilikelihood (Wedderburn, 1974) is likely to have high efficiency for modest amounts of overdispersion.

Generalization (c), arising from 'clustering', will not be discussed in detail. If any explanatory variables are constant within clusters, the broad conclusions above will apply. If, however, there is a need to treat differently dependencies within and dependencies between clusters, a more elaborate discussion is necessary.
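The contrast between the two overdispersed Poisson models can be explored numerically. The following is a minimal simulation sketch, not taken from the paper, for the intercept-only case $\log E(Y_j) = \beta_0$, in which the Poisson maximum likelihood estimate of $\beta_0$ is $\log\bar{Y}$. The function names, the gamma and normal choices for the random terms, and all numerical settings (`beta0`, `gamma_index`, `sigma`, `n`, `seed`) are assumptions made purely for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_intercept_estimate(rng, thetas):
    """Poisson-based estimate of beta0 in log E(Y) = beta0, namely log(Y-bar)."""
    ys = [sample_poisson(rng, theta) for theta in thetas]
    return math.log(sum(ys) / len(ys))


def compare_models(beta0=1.0, gamma_index=2.0, sigma=0.8, n=100_000, seed=1983):
    """Return the Poisson-based estimates of beta0 under two overdispersed models."""
    rng = random.Random(seed)
    # Model A (assumed form): multiplicative gamma term of mean 1,
    # so that E(Y) = exp(beta0) and var(Y) > E(Y).
    thetas_a = [
        math.exp(beta0) * rng.gammavariate(gamma_index, 1.0 / gamma_index)
        for _ in range(n)
    ]
    # Model B (assumed form): additive zero-mean normal term on the log scale,
    # so that E(Y) = exp(beta0 + sigma**2 / 2), not exp(beta0).
    thetas_b = [math.exp(beta0 + rng.gauss(0.0, sigma)) for _ in range(n)]
    return (
        poisson_intercept_estimate(rng, thetas_a),
        poisson_intercept_estimate(rng, thetas_b),
    )


if __name__ == "__main__":
    est_a, est_b = compare_models()
    print(f"model A estimate of beta0: {est_a:.3f}")
    print(f"model B estimate of beta0: {est_b:.3f}")
```

Under these assumptions the estimate from model A should settle near $\beta_0$, whereas that from model B should settle near $\beta_0 + \sigma^2/2$, since a lognormal variable with zero-mean normal exponent has expectation $\exp(\sigma^2/2)$. The intercept-only setting was chosen so that no model-fitting library is needed.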

REFERENCES

Cox, D. R. & Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439-47.

[Received May 1982]