Sample solutions. Stat 8051 Homework 8


Problem 1: Faraway Exercise 3.1

A plot of the time series reveals a fluctuating pattern. Fitting Poisson regression models and keeping only the significant polynomial effects yields a quadratic model:

> n = as.numeric(discoveries)
> discoveries1 = data.frame(n)
> discoveries1$year = 1860:1959
> m1.pois = glm(n ~ poly(year, 2), family = poisson, data = discoveries1)
> summary(m1.pois)

Call:
glm(formula = n ~ poly(year, 2), family = poisson, data = discoveries1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.9066  -0.8397  -0.2544   0.4776   3.3303

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     1.07207    0.06064  17.681  < 2e-16 ***

poly(year, 2)1 -2.04766    0.66937  -3.059  0.00222 **
poly(year, 2)2 -3.05975    0.64821  -4.720 2.35e-06 ***

Null deviance: 164.68 on 99 degrees of freedom
Residual deviance: 132.84 on 97 degrees of freedom
AIC: 407.85

Number of Fisher Scoring iterations: 5

There is not much overdispersion, as we can see from the value of the dispersion parameter:

> (dp <- sum(residuals(m1.pois, type="pearson")^2)/m1.pois$df.res)
[1] 1.305649

The significant year effects indicate that the discovery rate varies significantly with year rather than remaining constant.

Note: A half-normal plot and a test for outliers using studentized residuals reveal point 26 as a potential outlier. Removing this point actually improves the quadratic fit (check).

Problem 2: Faraway Exercise 3.2

A Poisson model with a linear effect of dose does very badly in terms of deviance:

> m2.pois = glm(colonies ~ dose, family = poisson, data = salmonella)
> summary(m2.pois)

Call:
glm(formula = colonies ~ dose, family = poisson, data = salmonella)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6482  -1.8225  -0.2993   1.2917   5.1861

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.3219950  0.0540292  61.485   <2e-16 ***
dose        0.0001901  0.0001172   1.622    0.105

Null deviance: 78.358 on 17 degrees of freedom
Residual deviance: 75.806 on 16 degrees of freedom
AIC: 172.34

Number of Fisher Scoring iterations: 4

The value of the dispersion parameter is very high, and we can check its significance using the function dispersiontest from the package AER:

> require(AER)
> dispersiontest(m2.pois, alternative="greater")

	Overdispersion test

data:  m2.pois
z = 1.913, p-value = 0.02787
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
  4.522293 

Fitting a negative binomial model with a linear dose effect doesn't improve the fit. Instead we can try polynomial terms in dose. Doing so reveals that a cubic fit makes all the polynomial effects significant and decreases the deviance as well:

> m21.pois = glm(colonies ~ poly(dose, 3), family = poisson, data = salmonella)
> summary(m21.pois)

Call:
glm(formula = colonies ~ poly(dose, 3), family = poisson, data = salmonella)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.43608  -0.85295  -0.07833   0.56028   2.65580

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      3.3300     0.0455  73.181  < 2e-16 ***
poly(dose, 3)1   0.3826     0.1903   2.011   0.0444 *
poly(dose, 3)2  -0.8648     0.1767  -4.893 9.91e-07 ***
poly(dose, 3)3   0.7745     0.1716   4.514 6.37e-06 ***

Null deviance: 78.358 on 17 degrees of freedom
Residual deviance: 36.055 on 14 degrees of freedom
AIC: 136.59

Number of Fisher Scoring iterations: 4

> dispersiontest(m21.pois, alternative="greater")

	Overdispersion test

data:  m21.pois
z = 1.808, p-value = 0.03531
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
 2.030354 

The dispersion parameter is still high, so let us now fit a cubic negative binomial model using glm.nb from package MASS:

> m21.nb = glm.nb(colonies ~ poly(dose, 3), data = salmonella)
> summary(m21.nb)

Call:
glm.nb(formula = colonies ~ poly(dose, 3), data = salmonella,
    init.theta = 28.81281522, link = log)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61674  -0.67164  -0.03553   0.37609   1.67190

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     3.33054    0.06321  52.687  < 2e-16 ***
poly(dose, 3)1  0.37991    0.26626   1.427 0.153626
poly(dose, 3)2 -0.85836    0.25686  -3.342 0.000832 ***
poly(dose, 3)3  0.75662    0.25465   2.971 0.002966 **

(Dispersion parameter for Negative Binomial(28.8128) family taken to be 1)

Null deviance: 38.447 on 17 degrees of freedom

Residual deviance: 17.613 on 14 degrees of freedom
AIC: 132.43

Number of Fisher Scoring iterations: 1

Theta:  28.8
Std. Err.:  18.9
2 x log-likelihood:  -122.434

Here also the higher powers of dose turn out to be significant.

Problem 3: Faraway Exercise 3.7

Here the number of complaints is expected to be proportional to the number of visits, so we fit a rate model with log(visits) as an offset. The summary is given below:

Call:
glm(formula = complaints ~ residency + gender + revenue + hours +
    offset(log(visits)), family = poisson, data = esdcomp)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9434  -0.9490  -0.3130   0.7859   1.8036

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.1202460  0.8502806  -9.550   <2e-16 ***
residencyy  -0.2090058  0.2011520  -1.039   0.2988
genderm      0.1954338  0.2181525   0.896   0.3703
revenue      0.0015761  0.0028294   0.557   0.5775
hours        0.0007019  0.0003505   2.002   0.0452 *

Null deviance: 63.435 on 43 degrees of freedom
Residual deviance: 54.518 on 39 degrees of freedom
AIC: 187.3

Number of Fisher Scoring iterations: 5

The number of hours turns out to be the only significant variable. Also, there is no significant overdispersion, so the Poisson model is sufficient, though the fit is not good.
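The offset makes this a model for the complaint rate per visit: E[complaints] = visits * exp(x'beta), so doubling the number of visits doubles the expected count at fixed covariates. A small sketch of this arithmetic in Python (the intercept and hours coefficient are taken from the summary above; the covariate values are made up, and the residency, gender, and revenue terms are dropped for brevity):

```python
import math

# intercept and hours coefficient from the Poisson rate model above
beta0, b_hours = -8.1202460, 0.0007019

def expected_complaints(visits, hours):
    """E[complaints] = visits * exp(beta0 + b_hours * hours).
    Hypothetical simplification: remaining covariates held at their
    reference levels / zero."""
    return visits * math.exp(beta0 + b_hours * hours)

e1 = expected_complaints(visits=2000, hours=1800)  # made-up physician
e2 = expected_complaints(visits=4000, hours=1800)  # same covariates, twice the visits
# the rate model implies e2 == 2 * e1
```

This is why the offset enters with a fixed coefficient of 1: it moves visits to the other side of the log-linear equation rather than estimating an effect for it.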

> dispersiontest(m32.pois)

	Overdispersion test

data:  m32.pois
z = 1.1459, p-value = 0.1259
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
  1.17846 