Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017

Similar documents
Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

Generalized Linear Models

Poisson Regression. The Training Data

Exam Applied Statistical Regression. Good Luck!

Stat 579: Generalized Linear Models and Extensions

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Log-linear Models for Contingency Tables

Generalized linear models

Statistical Modelling with Stata: Binary Outcomes

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Notes for week 4 (part 2)

MSH3 Generalized linear model Ch. 6 Count data models

9 Generalized Linear Models

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Non-Gaussian Response Variables

Checking the Poisson assumption in the Poisson generalized linear model

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

Simple Linear Regression

Linear Regression Models P8111

Loglinear models. STAT 526 Professor Olga Vitek

Homework 1 Solutions

Outline of GLMs. Definitions

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

BMI 541/699 Lecture 22

Generalized Estimating Equations

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

STAC51: Categorical data Analysis

Answer Key for STAT 200B HW No. 8

Generalized linear models

Homework 10 - Solution

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Sample solutions. Stat 8051 Homework 8

Chapter 22: Log-linear regression for Poisson counts

Reaction Days

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Lecture 8. Poisson models for counts

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Introduction to General and Generalized Linear Models

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Logistic Regressions. Stat 430

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Generalized Linear Models 1

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Ch 3: Multiple Linear Regression

Modeling Overdispersion

Section Poisson Regression

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Bayesian Model Diagnostics and Checking

Regression Models for Risks(Proportions) and Rates. Proportions. E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years

Generalized Linear Models

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Generalized Linear Models. Kurt Hornik

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Generalized Linear Models

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Quasi-likelihood Scan Statistics for Detection of

Ch 2: Simple Linear Regression

Single-level Models for Binary Responses

Joseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting. J. Marker, LSMWP, CLRS 1

Lecture 6 Multiple Linear Regression, cont.

SPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1

Poisson regression: Further topics

Poisson regression 1/15

Section 4.6 Simple Linear Regression

12 Modelling Binomial Response Data

Exercise 5.4 Solution

SOS3003 Applied data analysis for social science Lecture note Erling Berge Department of sociology and political science NTNU.

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

MATH 644: Regression Analysis Methods

Multivariate Statistics in Ecology and Quantitative Genetics Summary

Lecture 12: Effect modification, and confounding in logistic regression

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

Business Statistics. Lecture 10: Correlation and Linear Regression

MSH3 Generalized linear model

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Generalized Linear Models I

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Lecture 18: Simple Linear Regression

Generalized Linear Models: An Introduction

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Unit 6 - Introduction to linear regression

Statistics 203: Introduction to Regression and Analysis of Variance Course review

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Generalized linear models IV Examples

MSH3 Generalized linear model

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Transcription:

Poisson Regression Gelman & Hill Chapter 6 February 6, 2017

Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for a variety of reasons: ethnic fragmentation, arbitrary borders, economic problems, outside interventions, poorly developed government institutions, etc. Data in Gill (page 551-552) is a subset from Bratton and Van de Valle (1994) to examine factors to try to explain military coups in 33 countries from each country s colonial independence to 1989. africa = read.table("africa.dat", header = T)

Variables MILTCOUP MILTITARY POLLIB PARTY93 PCTVOTE PCTTURN SIZE POP NUMREGIM NUMELEC # of coups # of years of military oligarchy (0=no civil rights, 1=limited, 2=extensive) # of political parties % legislative voting % registered voting of country (1000 km^2 (in millions) Number of regimes Number of elections Type of study? Are causal conclusions possible?

Distribution of Response 0.04 0.02 0.00 80 60 40 20 0 2500 2000 1500 1000 500 0 6 4 2 PARTY93 PCTVOTE SIZE MILTCOUP Corr: 0.143 Corr: 0.0452 Corr: 0.0976 Corr: 0.301 Corr: 0.0355 Corr: 0.114 0 0 20 40 60 0 20 40 60 80 0 500 1000150020002500 0 2 4 6 Response is non-negative PARTY93 PCTVOTE SIZE MILTCOUP

Poisson Distribution Y i λ i P(λ i ) p(y i ) = λy i i e λ i y i! y i = 0, 1,..., λ i > 0 Used for counts with no upper limit E(Y i ) = V (Y i ) = λ i How to build in covariates into the mean? λ > 0 log(λ) = η R log link

Generalized Linear Model Canonical Link function for Poisson data is the log link log(λ i ) = η i = β 0 + X 1 β 1 +... X p β p (linear predictor) λ = exp(β 0 + X 1 β 1 +... X p β p ) Holding all other X s fixed a 1 unit change in X j λ = exp(β 0 + X 1 β 1 +... (X j + 1)β j +... X p β p ) λ = exp(β j ) exp(β 0 + X 1 β 1 +... X j β j +... X p β p ) λ = exp(β j )λ λ /λ = exp(β j ) exp(β j ) is called a relative risk (risk relative to some baseline)

Output from glm africa.glm = glm(miltcoup ~ MILITARY + POLLIB + PARTY93+ PCTVOTE + PCTTURN + SIZE*POP + NUMREGIM*NUMELEC, family=poisson, data=africa) round(summary(africa.glm)$coef, 4) ## Estimate Std. Error z value Pr(> z ) ## (Intercept) 2.9209 1.3368 2.1850 0.0289 ## MILITARY 0.1709 0.0509 3.3575 0.0008 ## POLLIB -0.4654 0.3319-1.4022 0.1609 ## PARTY93 0.0247 0.0109 2.2792 0.0227 ## PCTVOTE 0.0613 0.0217 2.8202 0.0048 ## PCTTURN -0.0361 0.0137-2.6372 0.0084 ## SIZE -0.0018 0.0007-2.5223 0.0117 ## POP -0.1188 0.0397-2.9961 0.0027 ## NUMREGIM -0.8662 0.4571-1.8949 0.0581 ## NUMELEC -0.4859 0.2118-2.2948 0.0217 ## SIZE:POP 0.0001 0.0000 3.0111 0.0026 ## NUMREGIM:NUMELEC 0.1810 0.0689 2.6265 0.0086

lack of fit? Under the hypothesis that the model is correct, residual deviance has an asymptotic χ 2 n p 1 distribution Residual deviance is the change in deviance from the model to a saturated model where each observation has its own λ i Under the alternative that we have omitted important terms, expect to see a large residual deviance Compare observed deviance to χ 2 distribution c(summary(africa.glm)$deviance, summary(africa.glm)$df.residual) ## [1] 7.547369 21.000000 1 - pchisq(summary(africa.glm)$deviance, summary(africa.glm)$df. ## [1] 0.9967843 So no evidence of lack of fit (overdispersion).

Diagnostics Residuals 1.0 0.0 1.0 Residuals vs Fitted 9 28 32 Std. deviance resid. 1.5 0.0 1.5 32 28 Normal Q Q 9 3 2 1 0 1 2 2 1 0 1 2 Predicted values Theoretical Quantiles Std. deviance resid. 0.0 0.6 1.2 Scale Location 9 32 28 Std. Pearson resid. 1.5 0.0 1.5 Residuals vs Leverage Cook's distance 32 25 30 0.5 1 10.5 3 2 1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 Predicted values Leverage

Residuals in GLMS residuals: Y i ˆλ i (observed - fitted) Pearson Goodness of Fit Pearson Residuals: X 2 = i (Y i ˆλ i ) 2 ˆλ i r i = Y i ˆλ i ˆλ i residuals.glm(africa.glm, type="pearson") residual deviance: Change in deviance for Model compared to Saturated model D = 2 { } y i log(y i /ˆλ i ) (y i ˆλ i ) i = d i residuals.glm(africa.glm, type="deviance")

Coefficients Estimate Std. Error z value Pr(> z ) (Intercept) 2.92 1.34 2.18 0.03 MILITARY 0.17 0.05 3.36 0.00 POLLIB -0.47 0.33-1.40 0.16 PARTY93 0.02 0.01 2.28 0.02 PCTVOTE 0.06 0.02 2.82 0.00 PCTTURN -0.04 0.01-2.64 0.01 SIZE -0.00 0.00-2.52 0.01 POP -0.12 0.04-3.00 0.00 NUMREGIM -0.87 0.46-1.89 0.06 NUMELEC -0.49 0.21-2.29 0.02 SIZE:POP 0.00 0.00 3.01 0.00 NUMREGIM:NUMELEC 0.18 0.07 2.63 0.01

Treat Political Liberties as a Factor? africa.glm1 = glm(miltcoup ~ MILITARY + factor(pollib) + PARTY93 + PCTVOTE+ PCTTURN + SIZE*POP + NUMREGIM*NUMELEC, family=poisson, data=africa) anova(africa.glm, africa.glm1, test="chi") ## Model 1: MILTCOUP ~ MILITARY + POLLIB + PARTY93 + PCTVOT ## Analysis of Deviance Table ## ## SIZE * POP + NUMREGIM * NUMELEC ## Model 2: MILTCOUP ~ MILITARY + factor(pollib) + PARTY93 ## SIZE * POP + NUMREGIM * NUMELEC ## Resid. Df Resid. Dev Df Deviance Pr(>Chi) ## 1 21 7.5474 ## 2 20 7.1316 1 0.41581 0.519

Interpretation of Coefficients Asymptotic distribution (Frequentist) (β j ˆβ j )/SE(β j ) N(0, 1) 95% CI for coefficient of MILITARY: 0.171 ± 1.96 0.051 = (0.078, 0.282) relative risk is exp(0.171) = 1.186 95% CI for relative risk e CI (exp(0.078), exp(0.282)) = (1.081, 1.325) Keeping everything else constant, for every additional year of military oligarchy, the risk of a military coup increases by 8 to 32 percent

Deviance Goodness of Fit deviance is -2 log(likelihood) evaluated at the MLE of the parameters in that model 2 i (y i log(ˆλ i ) + ˆλ i log(y i!)) smaller is better (larger likelihood) null deviance is the deviance under the Null model, that is a model with just an intercept or λ i = λ and ˆλ = Ȳ saturated model deviance is the deviance of a model where each observation has there own unique λ i and the MLE of ˆλ i = y i, the change in deviance has a Chi-squared distribution with degrees of freedom equal to the change in number of parameters in the models.

Derivation the residual deviance is the change in the deviance between the given model and the saturated model. substituting, we have D = 2 ( ) y i log(ˆλ i ) ˆλ i log(y i!) i 2 (y i log(y i ) y i log(y i!)) i =2 ( ) y i (log(y i ) log(ˆλ i )) (y i ˆλ i )) i ( = 2 y i (log(y i /ˆλ i ) (y i ˆλ ) i )) = 2d i This has a chi squared distibution with n (p + 1) degrees of freedom.