A New Method for Estimating Overdispersion. David Fletcher and Peter Green Department of Mathematics and Statistics

A New Method for Estimating Overdispersion
David Fletcher and Peter Green, Department of Mathematics and Statistics
Byron Morgan, Institute of Mathematics, Statistics and Actuarial Science, University of Kent, England

Overview
- Overdispersion in generalised linear models
  - Just model it...
  - Quantify using Pearson's statistic (adjust SEs, AIC)
- Problems with Pearson's statistic
- Alternatives
  - Parametric bootstrap
  - Classical analogue of "Bayesian p-value"
- Simulation results

Overdispersion
- Exponential family of distributions includes
  - Poisson
  - Binomial (Multinomial)
  - Exponential
- Single parameter implies variance-mean relationship, e.g. Poisson has $V(y) = \mu$
- Often get more variation
  - Positive correlation between "individuals"
  - Between-individual variation ("heterogeneity")

Overdispersion
- Poisson: $V(y) = \mu$
- Overdispersion: $V(y) = \phi\mu$ $(\phi > 1)$
- Alternatives
  - $V(y) = a\mu^2$ (Poisson-lognormal)
  - $V(y) = a\mu + b\mu^2$ (Negative binomial)

Just model it...
- Add a random effect, e.g. replace Poisson by
  - Poisson-lognormal
  - Negative binomial
- Generalised linear mixed model
- Bayesian hierarchical model
- Quasi-likelihood: just specify mean-variance relationship
  - More robust?
  - Analogy with use of least squares for non-normal data

Example
- Poisson regression
[Figure: two panels, "Low µ" scenario and "High µ" scenario.]

Example
- Consider 10 x-values, each replicated twice
[Figure: two panels, "Low µ" scenario and "High µ" scenario.]

Quantify overdispersion
- When the model is correct, Pearson's GOF statistic satisfies
  $\chi^2 = \sum \frac{(y - \hat{\mu})^2}{\hat{V}(y)} \sim \chi^2_{n-p}$
- If $V(y) = \phi\mu$, use $\hat{\phi} = \frac{\chi^2}{n - p}$
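As a rough illustration of the Pearson-based estimate, the sketch below fits a two-parameter log-linear Poisson regression and divides the Pearson statistic by n - p. It uses only the Python standard library. The helper names (`rpois`, `fit_poisson`, `pearson_phi`), the regression coefficients 0.5 and 0.2, and the random seed are assumptions made for the example and do not come from the slides; only the design of 10 x-values, each replicated twice, is taken from the example above.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method; adequate for modest mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(x, y, iters=100, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Fisher scoring; returns (b0, b1).

    Assumes at least one positive count, so the starting value is defined.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (g0, g1) for intercept and slope.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[a, b], [b, c]].
        a = sum(mu)
        b = sum(xi * mi for xi, mi in zip(x, mu))
        c = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = a * c - b * b
        d0, d1 = (c * g0 - b * g1) / det, (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_phi(x, y, b0, b1, p=2):
    """Pearson estimate of dispersion: chi-squared / (n - p), with V(y) = mu."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - p)


# Hypothetical design: 10 x-values, each replicated twice (n = 20).
rng = random.Random(1)
x = [float(i) for i in range(1, 11) for _ in range(2)]
y = [rpois(math.exp(0.5 + 0.2 * xi), rng) for xi in x]

b0, b1 = fit_poisson(x, y)
print("phi-hat:", round(pearson_phi(x, y, b0, b1), 3))
```

Because the data here are simulated without overdispersion, the printed value should scatter around 1 over repeated seeds, which is the behaviour the sampling-distribution slides examine.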

Quantify overdispersion
- Poisson regression (n = 20) with $\phi = 1$ (no overdispersion)
[Figure: sampling distribution of $\hat{\phi}$ for the "Low µ" and "High µ" scenarios.]

Quantify overdispersion
- Poisson regression (n = 20) with $\phi = 2$
[Figure: sampling distribution of $\hat{\phi}$ for the "High µ" and "Low µ" scenarios.]

Alternative approaches (may not work for small samples)
- Parametric bootstrap
  - Simulate model-fitting process
  - Assume fitted model "correct"
  - Parameter values = estimates
  - Compare simulated and observed $\hat{\phi}$
- Classical analogue of Bayesian p-value
  - Simulate data-generation process
  - Assume fitted model "correct"
  - Parameter values from sampling distributions
  - Compare simulated and observed $\hat{\phi}$

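Parametric bootstrap
- Use fitted model $M(\hat{\theta})$ to calculate $\hat{\phi}$
- For $i = 1, \ldots, B$ (e.g. $B = 100$)
  - Generate $y_i^*$ from $M(\hat{\theta})$
  - Fit $M$ to $y_i^*$ and calculate $\hat{\phi}_i^*$
- Estimate relative bias in $\hat{\phi}$ by setting
  $E(\hat{\gamma} - \gamma) = \bar{\gamma}^* - \hat{\gamma}$, where $\bar{\gamma}^* = \frac{1}{B} \sum_{i=1}^{B} \hat{\gamma}_i^*$,
  with $\hat{\gamma}_i^* = \log \hat{\phi}_i^*$, $\hat{\gamma} = \log \hat{\phi}$ and $\gamma = \log \phi$
- Bias adjustment: set
  $\hat{\gamma}_B = \hat{\gamma} - (\bar{\gamma}^* - \hat{\gamma}) = 2\hat{\gamma} - \bar{\gamma}^*$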
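The bootstrap bias adjustment can be sketched as follows, working on the log scale so that the adjusted value is exp(2 log(phi-hat) minus the mean of the bootstrap log estimates). This is a minimal sketch under stated assumptions: a two-parameter log-linear Poisson regression, simulated data whose overdispersion comes from a gamma-Poisson mixture (one convenient way to obtain V(y) = phi * mu, not something the slides specify), and hypothetical helper names, coefficients and seeds.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method; adequate for modest mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(x, y, iters=100, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Fisher scoring; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        a = sum(mu)
        b = sum(xi * mi for xi, mi in zip(x, mu))
        c = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = a * c - b * b
        d0, d1 = (c * g0 - b * g1) / det, (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_phi(x, y, b0, b1, p=2):
    """Pearson estimate of dispersion: chi-squared / (n - p), with V(y) = mu."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (len(y) - p)


def bootstrap_adjusted_phi(x, y, B=100, rng=None):
    """Return (phi_hat, bias-adjusted phi, mean bootstrap log estimate)."""
    rng = rng or random.Random(0)
    b0, b1 = fit_poisson(x, y)
    gamma_hat = math.log(pearson_phi(x, y, b0, b1))
    mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
    gamma_star = []
    for _ in range(B):
        # Generate y* from the fitted Poisson model, then refit and re-estimate.
        y_star = [rpois(m, rng) for m in mu_hat]
        s0, s1 = fit_poisson(x, y_star)
        gamma_star.append(math.log(pearson_phi(x, y_star, s0, s1)))
    gamma_bar = sum(gamma_star) / B
    gamma_adj = 2.0 * gamma_hat - gamma_bar  # gamma_hat - (gamma_bar - gamma_hat)
    return math.exp(gamma_hat), math.exp(gamma_adj), gamma_bar


# Hypothetical overdispersed data with V(y) = phi * mu, phi = 2 (assumed mixture).
rng = random.Random(2)
phi_true = 2.0
x = [float(i) for i in range(1, 11) for _ in range(2)]
y = []
for xi in x:
    mu = math.exp(0.5 + 0.2 * xi)
    lam = rng.gammavariate(mu / (phi_true - 1.0), phi_true - 1.0)  # mean mu
    y.append(rpois(lam, rng))

phi_hat, phi_adj, _ = bootstrap_adjusted_phi(x, y, B=100, rng=rng)
print("phi-hat:", round(phi_hat, 3), "adjusted:", round(phi_adj, 3))
```

The bootstrap samples are drawn from the fitted Poisson model itself, so the mean of the bootstrap log estimates measures how far the log of the Pearson estimate sits from zero when there is no overdispersion.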

Conditioning issue
- Parametric bootstrap in GOF
- GOF should be conditional, i.e. only consider simulated data with the same parameter estimates as for the observed data (Davison & Hinkley, 1997)?
- Little practical difference in many problems?
- Circumvented by "Morgan p-value" idea...?

Classical analogue of the Bayesian p-value
Bayesian p-value
- For $i = 1, \ldots, B$ (e.g. $B = 100$)
  - Generate $\theta_i^*$ from posterior for $\theta$
  - Generate $y_i^*$ from $M(\theta_i^*)$
  - Calculate discrepancies $D(y_i^*, M(\theta_i^*))$ and $D(y, M(\theta_i^*))$
- Plot $D(y_i^*, M(\theta_i^*))$ versus $D(y, M(\theta_i^*))$ and calculate
  $p = \frac{\#\{ D(y_i^*, M(\theta_i^*)) > D(y, M(\theta_i^*)) \}}{B}$

Classical analogue of Bayesian p-value (Morgan)
- For $i = 1, \ldots, B$
  - Generate $\theta_i^*$ from sampling distribution for $\hat{\theta}$
  - Generate $y_i^*$ from $M(\theta_i^*)$
  - Calculate discrepancies $D(y_i^*, M(\theta_i^*))$ and $D(y, M(\theta_i^*))$
- Plot $D(y_i^*, M(\theta_i^*))$ versus $D(y, M(\theta_i^*))$ and calculate
  $p = \frac{\#\{ D(y_i^*, M(\theta_i^*)) > D(y, M(\theta_i^*)) \}}{B}$
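A minimal sketch of this p-value calculation is given below. Several choices in it are assumptions rather than anything stated on the slides: the sampling distribution of the estimates is approximated by a bivariate normal with covariance equal to the inverse Fisher information of the fitted Poisson model, the discrepancy D is taken to be the Pearson statistic, and the simulated data, helper names and seeds are hypothetical. The plotting step is omitted.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method; adequate for modest mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(x, y, iters=100, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Fisher scoring.

    Returns (b0, b1, info), where info = (a, b, c) holds the Fisher
    information matrix [[a, b], [b, c]] evaluated at the last iterate.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    info = None
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        a = sum(mu)
        b = sum(xi * mi for xi, mi in zip(x, mu))
        c = sum(xi * xi * mi for xi, mi in zip(x, mu))
        info = (a, b, c)
        det = a * c - b * b
        d0, d1 = (c * g0 - b * g1) / det, (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, info


def discrepancy(x, y, b0, b1):
    """Pearson discrepancy D(y, M(theta)) = sum (y - mu)^2 / mu (assumed choice)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def draw_theta(b0, b1, info, rng):
    """Draw theta* from N(theta-hat, inverse information) via a 2x2 Cholesky factor."""
    a, b, c = info
    det = a * c - b * b
    s00, s01, s11 = c / det, -b / det, a / det  # covariance matrix entries
    l00 = math.sqrt(s00)
    l10 = s01 / l00
    l11 = math.sqrt(s11 - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return b0 + l00 * z0, b1 + l10 * z0 + l11 * z1


def morgan_p_value(x, y, B=100, rng=None):
    """Proportion of draws with D(y*, M(theta*)) > D(y, M(theta*))."""
    rng = rng or random.Random(0)
    b0, b1, info = fit_poisson(x, y)
    exceed = 0
    for _ in range(B):
        t0, t1 = draw_theta(b0, b1, info, rng)
        y_star = [rpois(math.exp(t0 + t1 * xi), rng) for xi in x]
        if discrepancy(x, y_star, t0, t1) > discrepancy(x, y, t0, t1):
            exceed += 1
    return exceed / B


# Hypothetical overdispersed data (gamma-Poisson mixture with V(y) = 2 * mu).
rng = random.Random(4)
x = [float(i) for i in range(1, 11) for _ in range(2)]
y = [rpois(rng.gammavariate(math.exp(0.5 + 0.2 * xi), 1.0), rng) for xi in x]

print("p-value:", morgan_p_value(x, y, B=100, rng=rng))
```

Unlike the parametric bootstrap, no model is refitted inside the loop, which is consistent with the later remark that this approach is faster.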

Inverse prediction combined with "Morgan p-value"
- For several candidate values $\phi_c$ (e.g. 40)
  - For $i = 1, \ldots, B$ (e.g. $B = 25$)
    - Generate $\theta_i^*$ from sampling distribution for $\hat{\theta}$
    - Calculate $\mu_i^*$ from $M(\theta_i^*)$
    - Generate $y_i^*(\phi_c)$ with $E[y_i^*(\phi_c)] = \mu_i^*$ and $V[y_i^*(\phi_c)] = \phi_c \mu_i^*$
    - Calculate $\eta_i^*(\phi_c) = \log \frac{D(y_i^*(\phi_c), M(\theta_i^*))}{D(y, M(\theta_i^*))}$
- Linear regression to find $\hat{\phi}_M$ satisfying $E[\eta^*(\hat{\phi}_M)] = 0$

Simulation results
- Poisson regression (n = 20) with $\phi = 2$ ("Low µ" scenario)
[Figure: sampling distributions of three estimators: parametric bootstrap, $\hat{\phi}$, and "Morgan p-value".]

Ideas/Issues
- Bootstrap and "Morgan p-value" can improve on $\hat{\phi}$
- "Morgan p-value" faster (only simple model here)
- Choice of discrepancy function?
- Best strategy for inverse prediction?
- Confidence intervals for $\phi$?
  - Bayesian analogy (obviously)
- Knock-on effects re SEs and AIC?
- Assumptions re sampling distribution for $\theta$?
- Application to mark-recapture models
