A New Method for Estimating Overdispersion. David Fletcher and Peter Green Department of Mathematics and Statistics

A New Method for Estimating Overdispersion
David Fletcher and Peter Green, Department of Mathematics and Statistics
Byron Morgan, Institute of Mathematics, Statistics and Actuarial Science, University of Kent, England

Overview
- Overdispersion in generalised linear models
- Just model it...
- Quantify using Pearson's statistic (adjust SEs, AIC)
- Problems with Pearson's statistic
- Alternatives: parametric bootstrap; classical analogue of the "Bayesian p-value"
- Simulation results

Overdispersion
The exponential family of distributions includes the Poisson, binomial (multinomial) and exponential distributions. Having a single parameter implies a fixed variance-mean relationship, e.g. the Poisson has $V(y) = \mu$. In practice we often get more variation than this, due to:
- positive correlation between "individuals"
- between-individual variation ("heterogeneity")

Overdispersion
- Poisson: $V(y) = \mu$
- Overdispersion: $V(y) = \phi\mu$ ($\phi > 1$)
- Alternatives: $V(y) = a\mu^2$ (Poisson-lognormal); $V(y) = a\mu + b\mu^2$ (negative binomial)
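
As a quick check on the quadratic form, the negative binomial can be generated as a gamma mixture of Poissons, which gives $V(y) = \mu + \mu^2/k$, i.e. $a = 1$ and $b = 1/k$ above. The minimal Python sketch below (standard library only) simulates such counts and compares the sample variance with the Poisson and quadratic variance functions. The function names and the values $\mu = 5$, $k = 2$ are hypothetical illustration choices, not taken from the slides.

```python
import math
import random


def var_poisson(mu):
    """Poisson variance function: V(y) = mu."""
    return mu


def var_quadratic(mu, a, b):
    """Quadratic variance function: V(y) = a*mu + b*mu**2."""
    return a * mu + b * mu ** 2


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson(mu, k, n, rng):
    """n negative binomial counts: lam ~ Gamma(shape=k, scale=mu/k), then y ~ Poisson(lam)."""
    return [rpois(rng.gammavariate(k, mu / k), rng) for _ in range(n)]


def mean_var(ys):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(ys)
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    mu, k = 5.0, 2.0  # hypothetical illustration values
    m, v = mean_var(gamma_poisson(mu, k, 50_000, rng))
    print(f"simulated: mean {m:.2f}, variance {v:.2f}")
    print(f"Poisson V(y) = {var_poisson(mu):.2f}; "
          f"negative binomial V(y) = {var_quadratic(mu, 1.0, 1.0 / k):.2f}")
```

With these values the negative binomial variance is $5 + 25/2 = 17.5$, so the simulated variance should sit close to that and well above the Poisson value of 5.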

Just model it...
- Add a random effect, e.g.: replace the Poisson by a Poisson-lognormal or negative binomial; a generalised linear mixed model; a Bayesian hierarchical model
- Quasi-likelihood: just specify the mean-variance relationship. More robust? (Analogy with the use of least squares for non-normal data.)

Example: Poisson regression
[Figure: simulated data for the "low µ" and "high µ" scenarios.]

Example: consider 10 x-values, each replicated twice (n = 20)
[Figure: data for the "low µ" and "high µ" scenarios.]

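Quantify overdispersion
When the model is correct, Pearson's goodness-of-fit statistic
$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat V(y_i)}$
is approximately distributed as $\chi^2_{n-p}$, where $p$ is the number of regression parameters (for Poisson regression, $\hat V(y_i) = \hat\mu_i$). If $V(y) = \phi\mu$, use $\hat\phi = X^2/(n - p)$.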
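
A minimal sketch of this calculation in Python (standard library only), assuming a Poisson log-linear model with an intercept and one covariate, so $p = 2$, fitted by iteratively reweighted least squares (Fisher scoring). The function names and the counts in the usage example are hypothetical; the design mirrors the example of 10 x-values each replicated twice.

```python
import math


def fit_poisson_loglinear(x, y, max_iter=100, tol=1e-10):
    """Fit log(mu_i) = b0 + b1*x_i by Fisher scoring; return (b0, b1, fitted mu)."""
    # Start at the intercept-only MLE (floored to avoid log(0)).
    b0, b1 = math.log(max(sum(y) / len(y), 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector and Fisher information for the canonical log link.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system I * d = U for the scoring step d.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, [math.exp(b0 + b1 * xi) for xi in x]


def pearson_phi(y, mu, p=2):
    """Pearson X^2 with Poisson variance V(y) = mu, and phi_hat = X^2 / (n - p)."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2, x2 / (len(y) - p)


if __name__ == "__main__":
    x = [xv for xv in range(1, 11) for _ in range(2)]  # 10 x-values, each twice
    y = [1, 3, 0, 4, 2, 6, 1, 5, 7, 2, 3, 9, 4, 8, 12, 3, 6, 14, 5, 11]  # hypothetical counts
    _, _, mu = fit_poisson_loglinear(x, y)
    x2, phi_hat = pearson_phi(y, mu)
    print(f"X^2 = {x2:.2f} on {len(y) - 2} df, phi_hat = {phi_hat:.2f}")
```

Under quasi-likelihood, the standard errors from the Poisson fit would then be multiplied by $\sqrt{\hat\phi}$.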

Quantify overdispersion: Poisson regression (n = 20) with $\phi = 1$ (no overdispersion)
[Figure: sampling distribution of $\hat\phi$ in the low-µ and high-µ scenarios.]

Quantify overdispersion: Poisson regression (n = 20) with $\phi = 2$
[Figure: sampling distribution of $\hat\phi$ in the high-µ and low-µ scenarios.]
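
The kind of simulation summarised in the two sampling-distribution plots ($\phi = 1$ and $\phi = 2$) can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the mean curves for the low- and high-µ scenarios are invented for illustration, and overdispersed counts with $V(y) = \phi\mu$ are generated as a gamma mixture of Poissons (gamma shape $\mu/(\phi - 1)$, scale $\phi - 1$), which is an assumption rather than the stated design.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def r_quasi(mu, phi, rng):
    """Count with E(y) = mu, V(y) = phi*mu (gamma-Poisson; plain Poisson if phi <= 1)."""
    if phi <= 1.0:
        return rpois(mu, rng)
    return rpois(rng.gammavariate(mu / (phi - 1.0), phi - 1.0), rng)


def fit_mu(x, y, max_iter=100, tol=1e-10):
    """Fitted means of the Poisson log-linear model log(mu) = b0 + b1*x (Fisher scoring)."""
    b0, b1 = math.log(max(sum(y) / len(y), 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01 = sum(mu), sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0, d1 = (i11 * u0 - i01 * u1) / det, (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return [math.exp(b0 + b1 * xi) for xi in x]


def phi_hat(x, y, p=2):
    """Pearson-based estimate X^2 / (n - p) under the Poisson variance function."""
    mu = fit_mu(x, y)
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (len(y) - p)


def sampling_distribution(true_mu, x, phi, reps, rng):
    """Simulate reps data sets from the true means and return their phi_hat values."""
    return [phi_hat(x, [r_quasi(m, phi, rng) for m in true_mu]) for _ in range(reps)]


if __name__ == "__main__":
    rng = random.Random(2024)
    x = [xv for xv in range(1, 11) for _ in range(2)]  # n = 20
    scenarios = {  # hypothetical mean curves
        "low mu": [math.exp(0.0 + 0.1 * xi) for xi in x],    # about 1.1 to 2.7
        "high mu": [math.exp(2.0 + 0.15 * xi) for xi in x],  # about 8.6 to 33
    }
    for phi in (1.0, 2.0):
        for name, true_mu in scenarios.items():
            draws = sampling_distribution(true_mu, x, phi, 1000, rng)
            q = statistics.quantiles(draws, n=20)  # 5%, 10%, ..., 95% points
            print(f"phi={phi}, {name}: mean {statistics.mean(draws):.2f}, "
                  f"5%-95% range {q[0]:.2f} to {q[-1]:.2f}")
```

With small means the $\chi^2$ approximation for $X^2$ is poorer, which is the kind of effect the low- versus high-µ comparison is meant to reveal.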

Alternative approaches (may not work for small samples)
- Parametric bootstrap: simulate the model-fitting process. Assume the fitted model is "correct", with parameter values equal to the estimates; compare simulated and observed $\hat\phi$.
- Classical analogue of the Bayesian p-value: simulate the data-generation process. Assume the fitted model is "correct", with parameter values drawn from their sampling distributions; compare simulated and observed $\hat\phi$.

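Parametric bootstrap
Use the fitted model $M(\hat\theta)$ to calculate $\hat\phi$.
For $i = 1, \dots, B$ (e.g. $B = 100$): generate $y_i^*$ from $M(\hat\theta)$, fit $M$ to $y_i^*$ and calculate $\hat\phi_i^*$.
Work on the log scale, with $\gamma = \log\phi$, $\hat\gamma = \log\hat\phi$, $\hat\gamma_i^* = \log\hat\phi_i^*$ and $\bar\gamma_B^* = \frac{1}{B}\sum_{i=1}^{B}\hat\gamma_i^*$. Estimate the relative bias in $\hat\phi$ by setting
$\hat E(\hat\gamma - \gamma) = \bar\gamma_B^* - \hat\gamma$.
Bias adjustment: set $\tilde\gamma = \hat\gamma - (\bar\gamma_B^* - \hat\gamma) = 2\hat\gamma - \bar\gamma_B^*$, so the adjusted estimate is $\tilde\phi = \exp(\tilde\gamma)$.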
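
A minimal Python sketch of this log-scale bias adjustment for the Poisson regression example, assuming a log-linear mean with an intercept and one covariate and, as an assumption about how $M(\hat\theta)$ generates overdispersed data, a gamma mixture of Poissons with $V(y) = \hat\phi\mu$ (plain Poisson when $\hat\phi \le 1$). The names `fit_and_phi`, `bias_adjust` and `bootstrap_phi` are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def r_quasi(mu, phi, rng):
    """Count with E(y) = mu, V(y) = phi*mu (gamma-Poisson; plain Poisson if phi <= 1)."""
    if phi <= 1.0:
        return rpois(mu, rng)
    return rpois(rng.gammavariate(mu / (phi - 1.0), phi - 1.0), rng)


def fit_and_phi(x, y, p=2, max_iter=100, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by Fisher scoring; return (fitted mu, Pearson phi_hat)."""
    b0, b1 = math.log(max(sum(y) / len(y), 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01 = sum(mu), sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0, d1 = (i11 * u0 - i01 * u1) / det, (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return mu, x2 / (len(y) - p)


def bias_adjust(phi_obs, phi_star):
    """phi_tilde = exp(2*gamma_hat - mean(gamma_star)), where gamma = log(phi)."""
    # The floor guards against log(0) from an (essentially impossible) exact fit.
    gamma_bar = sum(math.log(max(v, 1e-300)) for v in phi_star) / len(phi_star)
    return math.exp(2.0 * math.log(phi_obs) - gamma_bar)


def bootstrap_phi(x, y, B=100, rng=None):
    """Parametric bootstrap bias adjustment of phi_hat; returns (phi_hat, phi_tilde)."""
    rng = rng or random.Random()
    mu_hat, phi_obs = fit_and_phi(x, y)
    phi_star = []
    for _ in range(B):
        y_star = [r_quasi(m, phi_obs, rng) for m in mu_hat]  # simulate from M(theta_hat)
        phi_star.append(fit_and_phi(x, y_star)[1])          # refit M, recompute phi_hat
    return phi_obs, bias_adjust(phi_obs, phi_star)


if __name__ == "__main__":
    x = [xv for xv in range(1, 11) for _ in range(2)]
    y = [1, 3, 0, 4, 2, 6, 1, 5, 7, 2, 3, 9, 4, 8, 12, 3, 6, 14, 5, 11]  # hypothetical
    phi_obs, phi_tilde = bootstrap_phi(x, y, B=100, rng=random.Random(42))
    print(f"phi_hat = {phi_obs:.2f}, bias-adjusted phi_tilde = {phi_tilde:.2f}")
```

Because the adjustment is made on the log scale, $\tilde\phi = \hat\phi / \exp(\bar\gamma_B^* - \hat\gamma)$, where $\bar\gamma_B^*$ is the mean of the bootstrap $\log\hat\phi_i^*$: the estimate is divided by the bootstrap estimate of its multiplicative bias.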

Conditioning issue
- Parametric bootstrap in GOF: should a GOF test be conditional, i.e. only consider simulated data with the same parameter estimates as for the observed data (Davison & Hinkley, 1997)?
- Little practical difference in many problems?
- Circumvented by the "Morgan p-value" idea...?

Classical analogue of the Bayesian p-value
Bayesian p-value: for $i = 1, \dots, B$ (e.g. $B = 100$):
- generate $\theta_i^*$ from the posterior for $\theta$
- generate $y_i^*$ from $M(\theta_i^*)$
- calculate the discrepancies $D\{y_i^*, M(\theta_i^*)\}$ and $D\{y, M(\theta_i^*)\}$
Plot $D\{y_i^*, M(\theta_i^*)\}$ versus $D\{y, M(\theta_i^*)\}$ and calculate
$p = \#\{i : D\{y_i^*, M(\theta_i^*)\} > D\{y, M(\theta_i^*)\}\} / B$.

Classical analogue of the Bayesian p-value (Morgan)
For $i = 1, \dots, B$:
- generate $\theta_i^*$ from the sampling distribution for $\hat\theta$
- generate $y_i^*$ from $M(\theta_i^*)$
- calculate the discrepancies $D\{y_i^*, M(\theta_i^*)\}$ and $D\{y, M(\theta_i^*)\}$
Plot $D\{y_i^*, M(\theta_i^*)\}$ versus $D\{y, M(\theta_i^*)\}$ and calculate
$p = \#\{i : D\{y_i^*, M(\theta_i^*)\} > D\{y, M(\theta_i^*)\}\} / B$.
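
A minimal Python sketch of the "Morgan p-value" for the Poisson regression example. Assumptions not taken from the slides: the sampling distribution of $\hat\theta$ is approximated by the asymptotic normal $N(\hat\theta, I(\hat\theta)^{-1})$, the discrepancy $D$ is Pearson's $X^2$ evaluated at $M(\theta^*)$ without refitting, and all function names are hypothetical. Replacing `draw_theta` with draws from a posterior would give the Bayesian p-value instead.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(x, y, max_iter=100, tol=1e-10):
    """Fisher scoring for log(mu) = b0 + b1*x; returns (b0, b1) and inverse information."""
    b0, b1 = math.log(max(sum(y) / len(y), 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01 = sum(mu), sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0, d1 = (i11 * u0 - i01 * u1) / det, (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Inverse Fisher information at convergence, used as the covariance of theta_hat.
    return (b0, b1), [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]


def draw_theta(theta, cov, rng):
    """theta* ~ N(theta_hat, cov), via a 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(max(cov[1][1] - l21 * l21, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return theta[0] + l11 * z1, theta[1] + l21 * z1 + l22 * z2


def pearson_d(y, mu):
    """Discrepancy D(y, M(theta)): Pearson X^2 with Poisson variance."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def morgan_p_value(x, y, B=100, rng=None):
    """p = #{D(y*, M(theta*)) > D(y, M(theta*))} / B, with theta* from N(theta_hat, cov)."""
    rng = rng or random.Random()
    theta, cov = fit_poisson(x, y)
    exceed = 0
    for _ in range(B):
        b0, b1 = draw_theta(theta, cov, rng)
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # M(theta*)
        y_star = [rpois(m, rng) for m in mu]       # y* generated from M(theta*)
        if pearson_d(y_star, mu) > pearson_d(y, mu):
            exceed += 1
    return exceed / B


if __name__ == "__main__":
    x = [xv for xv in range(1, 11) for _ in range(2)]
    y = [1, 3, 0, 4, 2, 6, 1, 5, 7, 2, 3, 9, 4, 8, 12, 3, 6, 14, 5, 11]  # hypothetical
    print(f"Morgan p-value: {morgan_p_value(x, y, B=100, rng=random.Random(1)):.2f}")
```

A small p-value indicates that the observed data are more dispersed than data generated from the fitted Poisson model.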

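Inverse prediction combined with the "Morgan p-value"
For each of several candidate values $\phi_c$ (e.g. 40), and for $i = 1, \dots, B$ (e.g. $B = 25$):
- generate $\theta_i^*$ from the sampling distribution for $\hat\theta$
- calculate $\mu_i^*$ from $M(\theta_i^*)$
- generate $y_i^*(\phi_c)$ with $E\{y_i^*(\phi_c)\} = \mu_i^*$ and $V\{y_i^*(\phi_c)\} = \phi_c\mu_i^*$
- calculate $\eta_i^*(\phi_c) = \log\left[D\{y_i^*(\phi_c), M(\theta_i^*)\} / D\{y, M(\theta_i^*)\}\right]$
Use linear regression to find $\hat\phi_M$ satisfying $E\{\eta^*(\hat\phi_M)\} = 0$.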
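
The inverse-prediction step can be sketched in the same setting. Besides a normal approximation for the sampling distribution of $\hat\theta$ and a Pearson discrepancy, two further choices here are assumptions rather than the authors' method: $y^*(\phi_c)$ is generated as a gamma mixture of Poissons (so $V = \phi_c\mu^*$ requires $\phi_c \ge 1$), and $\eta^*$ is regressed on $\log\phi_c$, since Pearson's statistic scales roughly in proportion to $\phi$. The slides list the best inverse-prediction strategy as an open question; all names below are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def r_quasi(mu, phi, rng):
    """Count with E(y) = mu, V(y) = phi*mu (gamma-Poisson; plain Poisson if phi <= 1)."""
    if phi <= 1.0:
        return rpois(mu, rng)
    return rpois(rng.gammavariate(mu / (phi - 1.0), phi - 1.0), rng)


def fit_poisson(x, y, max_iter=100, tol=1e-10):
    """Fisher scoring for log(mu) = b0 + b1*x; returns (b0, b1) and inverse information."""
    b0, b1 = math.log(max(sum(y) / len(y), 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01 = sum(mu), sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0, d1 = (i11 * u0 - i01 * u1) / det, (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return (b0, b1), [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]


def draw_theta(theta, cov, rng):
    """theta* ~ N(theta_hat, cov), via a 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(max(cov[1][1] - l21 * l21, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return theta[0] + l11 * z1, theta[1] + l21 * z1 + l22 * z2


def pearson_d(y, mu):
    """Discrepancy D(y, M(theta)): Pearson X^2 with Poisson variance."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def inverse_prediction_phi(x, y, phi_grid, B=25, rng=None):
    """OLS fit eta* = a + b*log(phi_c), then solve a + b*log(phi) = 0 for phi_M."""
    rng = rng or random.Random()
    theta, cov = fit_poisson(x, y)
    us, etas = [], []
    for phi_c in phi_grid:
        for _ in range(B):
            b0, b1 = draw_theta(theta, cov, rng)
            mu = [math.exp(b0 + b1 * xi) for xi in x]      # mu* from M(theta*)
            y_star = [r_quasi(m, phi_c, rng) for m in mu]  # E = mu*, V = phi_c * mu*
            us.append(math.log(phi_c))
            etas.append(math.log(pearson_d(y_star, mu) / pearson_d(y, mu)))
    u_bar, e_bar = sum(us) / len(us), sum(etas) / len(etas)
    b = (sum((u - u_bar) * (e - e_bar) for u, e in zip(us, etas))
         / sum((u - u_bar) ** 2 for u in us))
    a = e_bar - b * u_bar
    return math.exp(-a / b)


if __name__ == "__main__":
    x = [xv for xv in range(1, 11) for _ in range(2)]
    y = [1, 3, 0, 4, 2, 6, 1, 5, 7, 2, 3, 9, 4, 8, 12, 3, 6, 14, 5, 11]  # hypothetical
    grid = [1.0 + 0.1 * i for i in range(40)]  # 40 candidate values, 1.0 to 4.9
    print(f"phi_M = {inverse_prediction_phi(x, y, grid, rng=random.Random(1)):.2f}")
```

Unlike the parametric bootstrap, no model is refitted inside the loop, which is consistent with the slides' remark that the "Morgan p-value" approach is faster.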

Simulation results: Poisson regression (n = 20) with $\phi = 2$ (low µ scenario)
[Figure: sampling distributions of $\hat\phi$, the parametric bootstrap estimate and the "Morgan p-value" estimate.]

Ideas/Issues
- The bootstrap and the "Morgan p-value" can improve on $\hat\phi$
- The "Morgan p-value" is faster (only a simple model here)
- Choice of discrepancy function?
- Best strategy for inverse prediction?
- Confidence intervals for $\phi$?
- Bayesian analogy (obviously)
- Knock-on effects on SEs and AIC?
- Assumptions regarding the sampling distribution for $\theta$?
- Application to mark-recapture models
