Poisson regression 1/15

2/15 Count data
Examples of count data:
- Number of hospitalizations over a period of time
- Number of passengers in a bus station
- Number of blood cells in a blood sample
- Number of typos in a book

3/15 Example: tortoise species data
The Galapagos Islands, off the coast of Ecuador, are well suited to studying the factors that influence the development and survival of different species. The data set provides, for each island, the total number of tortoise species and the number of species that occur only on that island (the endemics) (Johnson and Raven, 1973).

4/15 Example: tortoise species data
This data set also contains the following geographic variables:
- Area: area in square km
- Elevation: elevation in meters
- Nearest: distance from the nearest island
- Scruz: distance from Santa Cruz (which is near the center of the Galapagos)
- Adjacent: area of the adjacent island in square km

5/15 Poisson distribution for count data
The Poisson distribution can be defined via a counting process with the following properties:
1. The expected number of events occurring in an interval of time is proportional to the length of the interval.
2. The probability that two events occur in an infinitesimally small interval is 0.
3. The numbers of events occurring in disjoint intervals are independent.
The Poisson distribution is a good approximation to the Binomial distribution when the number of trials $n$ is large and the success probability $p$ is small; the approximating Poisson mean is $\mu = np$.
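The Binomial approximation can be checked numerically. The following is a minimal sketch; the values n = 1000 and p = 0.003 are arbitrary choices for illustration and do not come from the slides.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Illustrative values (assumptions): many trials, small success probability.
n, p = 1000, 0.003
mu = n * p  # matching Poisson mean

# Largest pointwise gap between the two pmfs over k = 0, ..., 20.
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
print(f"largest pmf difference: {max_gap:.2e}")
```

Increasing p while holding n fixed should widen the gap, which is one way to see why the approximation needs a small success probability.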

6/15 Poisson regression
Assume that the response $Y_i$ is a count, so that $Y_i$ can take the values $0, 1, 2, \ldots$. The distribution of $Y_i$ may be modelled by the Poisson distribution with mean $\mu_i$. That is, $Y_i \sim \text{Poisson}(\mu_i)$, which has the pmf
$$f(y) = \frac{\exp(-\mu)\,\mu^{y}}{y!} \quad \text{for } y = 0, 1, 2, \ldots$$
Here $\mu > 0$.
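As a quick sanity check on the pmf, the sketch below evaluates it on the log scale (to avoid overflow in $y!$) and confirms numerically that the probabilities sum to one and that the mean and variance both equal $\mu$. The value $\mu = 2.5$ and the truncation at $y = 59$ are assumptions made only for this illustration.

```python
import math

def poisson_pmf(y, mu):
    """Poisson pmf exp(-mu) * mu**y / y!, computed via logs for stability."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

mu = 2.5            # illustrative mean (assumption)
support = range(60)  # truncation point; the tail beyond it is negligible here

probs = [poisson_pmf(y, mu) for y in support]
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(total, mean, var)
```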

7/15 Link function
One common link function used for Poisson regression is the log function. That is,
$$\log(\mu_i) = X_i^T \beta,$$
where $X_i$ is a $p$-dimensional predictor and $\beta$ is a $p$-dimensional vector of unknown coefficients. The link function implies that $\mu_i = \exp(X_i^T \beta)$.
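One consequence of the log link is that effects are multiplicative on the mean: raising the j-th predictor by one unit multiplies $\mu_i$ by $\exp(\beta_j)$. The sketch below illustrates this with made-up coefficients; `beta`, `x_a`, and `x_b` are hypothetical values, not estimates from the tortoise data.

```python
import math

def poisson_mean(x, beta):
    """Mean under the log link: mu = exp(x^T beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical coefficients: intercept 0.5 and slope 0.2.
beta = [0.5, 0.2]

# Two covariate vectors that differ by one unit in the second predictor.
x_a = [1.0, 3.0]
x_b = [1.0, 4.0]

ratio = poisson_mean(x_b, beta) / poisson_mean(x_a, beta)
print(ratio, math.exp(beta[1]))  # the two numbers agree
```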

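8/15 Maximum likelihood estimator
The log-likelihood function of $\beta$ is
$$l(\beta) = \log\left\{\prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{Y_i}}{Y_i!}\right\} = \sum_{i=1}^{n}\left\{Y_i \log(\mu_i) - \mu_i - \log(Y_i!)\right\} = \sum_{i=1}^{n}\left\{Y_i X_i^T\beta - \exp(X_i^T\beta) - \log(Y_i!)\right\}.$$
Then the MLE for $\beta$ is
$$\hat\beta = \arg\max_{\beta} \sum_{i=1}^{n}\left[Y_i X_i^T\beta - \exp(X_i^T\beta)\right],$$
since the term $\log(Y_i!)$ does not depend on $\beta$.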

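9/15 Score function and Hessian matrix
The score function is
$$\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{n}\left\{Y_i - \exp(X_i^T\beta)\right\}X_i.$$
The MLE $\hat\beta$ is a solution of $\partial l(\beta)/\partial\beta = 0$. The Hessian matrix is
$$\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^{n} X_i X_i^T \exp(X_i^T\beta) = -X^T V X,$$
where $X = (X_1, \ldots, X_n)^T$ is an $n \times p$ design matrix and $V = \text{diag}\{\exp(X_1^T\beta), \ldots, \exp(X_n^T\beta)\}$.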
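The score equation has no closed-form solution in general. One standard way to solve it, not named on the slides, is Newton-Raphson, which iterates $\beta \leftarrow \beta + (X^T V X)^{-1} \, \partial l/\partial\beta$. The sketch below does this for a model with an intercept and one predictor, so the $2 \times 2$ system can be solved by hand; the function name `fit_poisson_2param` and the data are made up for illustration.

```python
import math

def fit_poisson_2param(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i (illustrative sketch)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only MLE
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (Y_i - mu_i) * X_i with X_i = (1, x_i).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of X^T V X (the negative Hessian).
        a00 = sum(mu)
        a01 = sum(mi * xi for xi, mi in zip(x, mu))
        a11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a00 * a11 - a01 * a01
        # Newton step: solve (X^T V X) d = score for the 2x2 case.
        d0 = (a11 * s0 - a01 * s1) / det
        d1 = (a00 * s1 - a01 * s0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Made-up data for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0_hat, b1_hat = fit_poisson_2param(x, y)
print(b0_hat, b1_hat)
```

Because the Poisson log-likelihood with a log link is concave in $\beta$, Newton-Raphson usually converges in a handful of iterations from a reasonable starting value.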

10/15 Asymptotic normality of $\hat\beta$
Applying the large-sample theory of the maximum likelihood estimator $\hat\beta$, we have, approximately for large $n$,
$$\hat\beta - \beta \sim N\left(0, (X^T V X)^{-1}\right).$$
Wald-type inference for $\beta$ can be based on this asymptotic normality.
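A small special case makes the Wald calculation concrete. In an intercept-only model, $X$ is a column of ones, $\hat\beta = \log(\bar Y)$, and $X^T V X = n\exp(\hat\beta) = \sum_i Y_i$, so the standard error of $\hat\beta$ is $1/\sqrt{\sum_i Y_i}$. The sketch below builds a 95% Wald interval on the log scale and transforms it back to the mean scale; the data are made up for illustration.

```python
import math

# Made-up counts for illustration only.
y = [2, 4, 3, 7, 5, 1, 6, 4]
n = len(y)

# Intercept-only model: log(mu) = beta, so beta_hat = log(mean of y).
beta_hat = math.log(sum(y) / n)

# X^T V X reduces to n * exp(beta_hat) when X is a column of ones.
info = n * math.exp(beta_hat)
se = 1.0 / math.sqrt(info)

z = 1.96  # approximate 97.5% standard normal quantile
ci_log = (beta_hat - z * se, beta_hat + z * se)
ci_mean = tuple(math.exp(v) for v in ci_log)
print(beta_hat, se, ci_mean)
```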

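11/15 Deviance
The log-likelihood for $\mu_i$ in the saturated model (where $\hat\mu_i = Y_i$) is
$$l_{\text{sat}} = \sum_{i=1}^{n}\left\{Y_i \log(Y_i) - Y_i\right\} + \text{Const}.$$
The log-likelihood in the fitted model with $\mu_i = \exp(X_i^T\beta)$ is
$$l(\hat\beta) = \sum_{i=1}^{n}\left\{Y_i \log(\hat\mu_i) - \hat\mu_i\right\} + \text{Const},$$
where $\hat\mu_i = \exp(X_i^T\hat\beta)$ and $\hat\beta$ is the MLE of $\beta$. The deviance is then defined as
$$D = 2\sum_{i=1}^{n}\left\{Y_i \log(Y_i/\hat\mu_i) - (Y_i - \hat\mu_i)\right\}.$$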
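The deviance formula can be computed directly, with the usual convention that $Y_i \log(Y_i/\hat\mu_i) = 0$ when $Y_i = 0$. The sketch below uses made-up counts and an intercept-only fit, for which the MLE of every $\mu_i$ is the sample mean; the function name `poisson_deviance` is introduced here for illustration.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum{ y*log(y/mu) - (y - mu) }, taking 0*log(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Made-up counts for illustration only.
y = [0, 1, 3, 4, 7]

# Intercept-only model: every fitted mean equals the sample mean.
mu_hat = [sum(y) / len(y)] * len(y)

d_fit = poisson_deviance(y, mu_hat)  # deviance of the intercept-only fit
d_sat = poisson_deviance(y, y)       # saturated model: mu_i = y_i
print(d_fit, d_sat)
```

The saturated model reproduces the data exactly, so its deviance is zero; any other fit gives a non-negative value.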

12/15 Some remarks
- Likelihood-ratio-type inference can be conducted based on the deviance.
- The analysis of deviance can be carried out as in the logistic regression model.
- Model diagnostics and residual plots can also be done as in the logistic regression model.

13/15 Over- or under-dispersion
In the Poisson regression model, we assume that $E(Y_i) = \text{Var}(Y_i) = \mu_i$, so the mean and the variance are the same. This restriction may be too rigid in practice. A generalization of the Poisson regression model is
$$E(Y_i) = \mu_i \quad \text{and} \quad \text{Var}(Y_i) = \phi\mu_i,$$
where $\phi$ is the dispersion parameter. Values $\phi > 1$ correspond to over-dispersion and $\phi < 1$ to under-dispersion.

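14/15 Quasi-likelihood
Similar to the logistic regression model, the quasi log-likelihood for $\beta$ can be defined as
$$Q(\beta) = \sum_{i=1}^{n} \int_{Y_i}^{\mu_i} \frac{Y_i - \mu}{\phi V(\mu)}\, d\mu,$$
where $V(\mu) = \mu$ and $\mu_i = \exp(X_i^T\beta)$. The estimate of $\beta$ is the same as in the usual Poisson regression without a dispersion parameter. The asymptotic distribution of $\hat\beta$ is, approximately for large $n$,
$$\hat\beta - \beta \sim N\left(0, \phi\,(X^T V X)^{-1}\right).$$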
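To see why the estimate of $\beta$ does not change, the integral can be evaluated in closed form with $V(\mu) = \mu$. The short derivation below is a worked step added for illustration, using only the definitions on this slide.

```latex
\begin{align*}
\int_{Y_i}^{\mu_i} \frac{Y_i - \mu}{\phi\,\mu}\, d\mu
  &= \frac{1}{\phi}\Big[\, Y_i \log\mu - \mu \,\Big]_{\mu = Y_i}^{\mu = \mu_i} \\
  &= \frac{1}{\phi}\Big\{ Y_i \log(\mu_i) - \mu_i - Y_i \log(Y_i) + Y_i \Big\}, \\[4pt]
Q(\beta)
  &= \frac{1}{\phi} \sum_{i=1}^{n} \Big\{ Y_i \log(\mu_i) - \mu_i \Big\}
     - \frac{1}{\phi} \sum_{i=1}^{n} \Big\{ Y_i \log(Y_i) - Y_i \Big\}.
\end{align*}
```

The first sum is the Poisson log-likelihood up to terms free of $\beta$, and the second sum does not involve $\beta$ at all. Since $\phi$ enters only as a positive scale factor, maximizing $Q(\beta)$ gives the same $\hat\beta$ as maximizing $l(\beta)$; only the variance of $\hat\beta$ is inflated or deflated by $\phi$.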

15/15 Estimation of the dispersion parameter
The dispersion parameter $\phi$ can be estimated by
$$\hat\phi = \frac{1}{n - p}\sum_{i=1}^{n} \frac{(Y_i - \hat\mu_i)^2}{\hat\mu_i},$$
where $\hat\mu_i = \exp(X_i^T\hat\beta)$.
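The estimator is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below computes it for made-up counts under an intercept-only fit (so $p = 1$ and every $\hat\mu_i$ is the sample mean); the function name `dispersion_estimate` is introduced here for illustration.

```python
def dispersion_estimate(y, mu_hat, p):
    """phi_hat = sum((y - mu)^2 / mu) / (n - p), the Pearson-based estimate."""
    n = len(y)
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    return pearson / (n - p)

# Made-up counts for illustration only; they are deliberately spread out.
y = [0, 1, 1, 2, 3, 5, 9, 11]

# Intercept-only model: one parameter, fitted mean equals the sample mean.
mu_hat = [sum(y) / len(y)] * len(y)
phi_hat = dispersion_estimate(y, mu_hat, p=1)
print(phi_hat)
```

A value well above 1, as with these counts, suggests over-dispersion; in that case standard errors from the plain Poisson fit would be multiplied by $\sqrt{\hat\phi}$, following the variance formula on the quasi-likelihood slide.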