Lecture 14: Introduction to Poisson Regression


Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52

Overview Modelling counts Contingency tables Poisson regression models 2 / 52

Modelling counts I Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week Number of coughs in a concert hall, per minute Number of customers arriving in a shop, daily 3 / 52

Modelling counts II Counts show up in all kinds of scenarios in everyday life. In public health, count data are particularly important in measuring disease rates: we often count the number of people acquiring a particular disease in a given year. Modelling counts gives us a framework in which to estimate disease incidence. Poisson regression will allow us to measure the association between incidence and risk factors of interest, adjusting for other related covariates. 4 / 52

A whole new model to deal with counts? I So far we have talked about: linear regression, for normally distributed errors; logistic regression, for binomially distributed outcomes. Typically, our count data do not follow either of these distributions: counts are not binary (0/1), counts are discrete rather than continuous, and counts typically have a right-skewed distribution. 5 / 52

A whole new model to deal with counts? II So far, the regression strategies we've discussed allow us to model: expected values and expected increases in linear regression; log odds or log odds ratios in logistic regression. In modelling counts, we are typically more interested in: incidence rates; incidence ratios (when comparing across levels of a risk factor). Poisson regression will provide us with a framework to handle counts properly! 6 / 52

Poisson Assumptions Before going on to talk about Poisson regression, let's revisit some key concepts about the Poisson distribution: the occurrences of a random event in an interval of time are independent; in theory, an infinite number of occurrences of the event are possible (though perhaps rare) within the interval; in any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero. 7 / 52

Poisson Probability I The probability of x occurrences of an event in an interval is: P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, ... where λ = the expected number of occurrences in the interval and e is a constant (≈ 2.718). For the Poisson distribution: mean = variance = λ; mode = ⌊λ⌋, the largest integer not exceeding λ. We can also think of λ as the rate parameter. 8 / 52

Poisson Probability II Poisson probability mass function for λ = 5 [figure: pmf, Pr(X = x) against x, peaked near x = 5 and right-skewed] 9 / 52

Poisson Probability III Poisson probability mass function for λ = 30 [figure: pmf, Pr(X = x) against x, peaked near x = 30 and nearly symmetric] 10 / 52

Poisson and Binomial The Poisson distribution can be used to approximate a binomial distribution when: n is large and p is very small, or np = λ is fixed, and n becomes infinitely large 11 / 52
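A hedged stdlib-Python sketch of this approximation (not from the lecture): with n = 10,000 and p = 0.0005, so that np = 5, the binomial and Poisson pmfs agree very closely:

```python
import math

def binom_pmf(k, n, p):
    # Binomial pmf: C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Poisson pmf: exp(-lam) lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 10_000, 0.0005      # n large, p very small
lam = n * p                # np = 5 is held fixed

# Largest pointwise difference between the two pmfs over the bulk of the support
max_diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(30))
```

The largest pointwise difference is well under 10^−3, tiny compared with the pmf values themselves (about 0.175 at the peak).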

Example: Cancer in a large population Yearly cases of esophageal cancer in a large city: 30 cases observed in 1990. P(X = 30) = e^(−λ) λ^30 / 30!, where λ = the yearly average number of cases of esophageal cancer. 12 / 52

Example: Belief in Afterlife I Men and women were asked whether or not they believed in an afterlife (General Social Survey, 1991). Possible responses were: yes, no, or unsure.

              Y    N or U   Total
    M       435       147     582
    F       375       134     509
    Total   810       281    1091
13 / 52

Example: Belief in Afterlife II Question: Is belief in the afterlife independent of gender? We could address this question using a χ² test. To perform a χ² test: begin by stating our model, independence; then calculate expected cell counts for each entry in the table. 14 / 52

Example: Belief in Afterlife III Under an independence model, the probability of belief in the afterlife for both genders would be estimated as: ˆp = P(belief in afterlife) = (number who believe) / (total number asked) = 810 / 1091 ≈ 0.742. Then, we calculate the expected counts as: E(males answering yes) = (# men asked) × ˆp = 582 × 0.742 ≈ 432; E(females answering yes) = (# women asked) × ˆp = 509 × 0.742 ≈ 378. 15 / 52

Example: Belief in Afterlife IV The observed and expected cell counts are as follows:

              Y            N or U      Total
    M       435 (432)    147 (150)      582
    F       375 (378)    134 (131)      509
    Total   810          281           1091

Then, the χ² statistic is calculated as: Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ = (435 − 432)²/432 + (147 − 150)²/150 + (375 − 378)²/378 + (134 − 131)²/131 = 0.173. 16 / 52
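A small stdlib-Python check of this calculation (the lecture works in R; this is just a sketch). Note that with unrounded expected counts the statistic comes out near 0.16; the slide's 0.173 reflects the rounded expected counts shown in the table:

```python
# Observed 2x2 table: (gender, response) -> count
observed = {("M", "Y"): 435, ("M", "N"): 147,
            ("F", "Y"): 375, ("F", "N"): 134}

row = {"M": 435 + 147, "F": 375 + 134}    # 582, 509
col = {"Y": 435 + 375, "N": 147 + 134}    # 810, 281
total = sum(observed.values())             # 1091

# Pearson chi-square: sum over cells of (O - E)^2 / E, with E = row * col / total
chi2 = sum((o - row[r] * col[c] / total) ** 2 / (row[r] * col[c] / total)
           for (r, c), o in observed.items())
```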

Example: Belief in Afterlife V Remember, we are testing the null hypothesis of independence versus the alternative that the proportion believing in the afterlife differs across genders: H₀: p_male = p_female = p_overall; Hₐ: p_male ≠ p_female. We decided to perform the hypothesis test using a χ² test. Here, the appropriate degrees of freedom are: (number of rows − 1)(number of columns − 1) = (2 − 1)(2 − 1) = 1. 17 / 52

Example: Belief in Afterlife VI Let's test the hypothesis at level α = 0.05. Look up the appropriate critical value in R: > qchisq(1-0.05, df=1) [1] 3.841459. Our observed χ² statistic of 0.173 is much smaller than the critical value. We can also look up the corresponding p-value for our test: > pchisq(0.173, df=1, lower.tail=F) [1] 0.6774593. Conclusion: fail to reject the null hypothesis. 18 / 52
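For df = 1 these R look-ups can be reproduced with only the Python standard library, since a χ²(1) variable is the square of a standard normal. This sketch is not part of the lecture, just a cross-check:

```python
import math

# For X ~ chi-square(1), X = Z^2 with Z standard normal, so the
# upper-tail p-value is P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(0.173 / 2))   # matches R's pchisq(0.173, df=1, lower.tail=F)

# Critical value at alpha = 0.05: the 0.975 normal quantile, squared
crit = 1.959964 ** 2                         # matches R's qchisq(0.95, df=1)
```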

Example: Belief in Afterlife Poisson Model I Just now, we had to calculate the expected counts to perform the χ² test. Actually, we could have written down a linear model to express the expected counts systematically, making use of the Poisson approximation to the binomial. Model: Y_ij ~ Poisson(λ_ij), with λ_ij = λ · α_male · γ_response. Here we interpret λ_ij as the Poisson rate for the cell in the i-th row and j-th column; λ is the baseline rate, α is the male effect, and γ is the response effect. 19 / 52

Example: Belief in Afterlife Poisson Model II Recall that the log of a product is equal to the sum of the logs: log(a · b) = log(a) + log(b). So, let's log-transform our model... originally we had: λ_ij = λ · α_male · γ_response. Taking the log of both sides, we have: log(λ_ij) = log(λ) + log(α_male) + log(γ_response). This is looking like a linear model... 20 / 52

Example: Belief in Afterlife Poisson Model III We stated the systematic portion of this model as log(λ_ij) = log(λ) + log(α_male) + log(γ_response), which we can also write using βs to look like other linear models: log(λ_ij) = β₀ + β₁ I(male) + β₂ I(response). The probabilistic portion of this model enters as: Y_ij ~ Poisson(λ_ij). 21 / 52

Example: Belief in Afterlife Poisson Model IV In this Poisson regression model: The outcome is the log of the expected cell count The baseline β 0 is the log expected cell count for females responding no β 1 is the increase in log expected cell count for males compared to females β 2 is the increase in log expected cell count for the response yes compared to no 22 / 52

Fitting the afterlife model in R I Data format: we began with the view of our data in a table.

              Y            N or U      Total
    M       435 (432)    147 (150)      582
    F       375 (378)    134 (131)      509
    Total   810          281           1091

However, R expects to see unique observations listed, together with relevant covariates (indicators here), as follows:

      count male yes
    1   435    1   1
    2   147    1   0
    3   375    0   1
    4   134    0   0
23 / 52

Fitting the afterlife model in R II Once the data is entered in R, we can analyze it using the glm command, specifying the family as poisson:

> summary(out <- glm(count ~ male + yes, family=poisson))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.87595    0.06787  71.839   <2e-16 ***
male         0.13402    0.06069   2.208   0.0272 *
yes          1.05868    0.06923  15.291   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 272.68544 on 3 degrees of freedom
Residual deviance: 0.16200 on 1 degrees of freedom
AIC: 35.407
24 / 52

Fitting the afterlife model in R III So we fit the model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) and our fitted model is: log(λ ij ) = 4.88 + 0.134 I (male) + 1.06 I (response) 25 / 52

Predicting log expected cell counts I Using the fitted model: log(λ_ij) = β₀ + β₁ I(male) + β₂ I(response), we can get predicted values for log counts in each of the four cells: For females responding no: log(E(count | female, no)) = 4.88 + 0.134×0 + 1.06×0 = 4.88. For males responding no: log(E(count | male, no)) = 4.88 + 0.134×1 + 1.06×0 = 5.01. 26 / 52

Predicting log expected cell counts II Using the fitted model: log(λ_ij) = β₀ + β₁ I(male) + β₂ I(response), we can get predicted values for log counts in each of the four cells: For females responding yes: log(E(count | female, yes)) = 4.88 + 0.134×0 + 1.06×1 = 5.94. For males responding yes: log(E(count | male, yes)) = 4.88 + 0.134×1 + 1.06×1 = 6.07. 27 / 52

Predicting expected cell counts I Using the fitted model: log(λ_ij) = β₀ + β₁ I(male) + β₂ I(response), we can get predicted values for counts in each of the four cells: For females responding no: E(count | female, no) = exp(4.88) ≈ 131. For males responding no: E(count | male, no) = exp(4.88 + 0.134×1) = exp(5.01) ≈ 150. 28 / 52

Predicting expected cell counts II Using the fitted model: log(λ_ij) = β₀ + β₁ I(male) + β₂ I(response), we can get predicted values for counts in each of the four cells: For females responding yes: E(count | female, yes) = exp(4.88 + 1.06) = exp(5.94) ≈ 378. For males responding yes: E(count | male, yes) = exp(4.88 + 0.134 + 1.06) = exp(6.07) ≈ 432. 29 / 52
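The four fitted cell counts can be checked in a few lines of stdlib Python (a sketch; the coefficient values are the glm estimates from the R output above):

```python
import math

# Coefficients from the fitted Poisson model: log(count) = b0 + b1*male + b2*yes
b0, b1, b2 = 4.87595, 0.13402, 1.05868

def fitted_count(male, yes):
    # Exponentiate the linear predictor to get an expected cell count
    return math.exp(b0 + b1 * male + b2 * yes)

# (male, yes) -> fitted cell count, rounded to the nearest integer
cells = {(m, y): round(fitted_count(m, y)) for m in (0, 1) for y in (0, 1)}
```

These reproduce the expected counts 131, 150, 378, 432 from the χ² test exactly.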

Comparing model fit in R with our fit by hand When we did our χ² test, we calculated the expected cell counts by hand: estimate ˆp = 810/1091, the probability of responding yes under the independence model; multiply ˆp (or 1 − ˆp) by the number of males (or females) to get the expected number in each cell. The expected cell counts represented the numbers we should have gotten under a perfect independence model. We obtained the expected cell counts:

              Y    N or U   Total
    M       432       150     582
    F       378       131     509
    Total   810       281    1091

which are exactly what we got by Poisson regression! 30 / 52

Afterlife coefficients I By fitting the independence model, we force the relative rate of responding yes versus no to the question of belief in the afterlife to be fixed across males and females. Deviation from the independence model suggests the proportion of those believing in the afterlife differs by gender. 31 / 52

Afterlife coefficients II Under the independence model: β₀ = 4.88 is the log expected count of females responding no, the baseline group; β₁ = 0.134 is the difference in log expected counts comparing males to females; β₂ = 1.06 is the difference in log expected counts for yes responses compared to no responses. 32 / 52

Afterlife coefficients III Under the independence model: exp(β₀) = 131.6 is the expected count for females responding no, the baseline group; exp(β₁) = 1.14 is the ratio comparing the counts of males to females; exp(β₂) = 2.88 is the ratio of the number of yes responses compared to no responses. 33 / 52

A regression model for count data Contingency table regression model: suppose we have r categories on one axis and c categories on the other, so we can think of an r × c contingency table with r rows and c columns. Independence model: log(E(Y_ij)) = log(λ_ij) = log(λ) + log(α_i) + log(γ_j). To test independence, we write our χ² test statistic as: Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ, which is distributed as χ² with (r − 1)(c − 1) degrees of freedom under the null hypothesis of independence. 34 / 52

Relation to other types of regression Both logistic and Poisson regression are types of generalized linear models, extending ideas from linear regression with continuous outcomes to: binary outcomes (logistic regression) and count data (Poisson regression). We say logistic regression uses the logit link, while Poisson regression uses a log link; Poisson models are also called log-linear models.

    Regression   Outcome distribution   Link name   Link function
    Linear       Normal                 identity    f(µ) = µ
    Logistic     Binary (binomial)      logit       logit(p) = log(p / (1 − p))
    Log-linear   Poisson                log         log(λ)
35 / 52
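A minimal stdlib-Python sketch of these link functions and their inverses (illustrative only; the function names are my own, not from the lecture):

```python
import math

def logit(p):
    """Logit link for logistic regression: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse logit (expit): maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-x))

# Log link for Poisson regression: log(lam) equals the linear predictor,
# so lam = exp(linear predictor) is guaranteed positive.
lam = math.exp(0.5)
p_back = inv_logit(logit(0.3))   # round-trips back to 0.3
```

The log and logit links both guarantee fitted values stay in the legal range (positive rates, probabilities in (0, 1)), which the identity link of linear regression does not.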

Interpretation I Under the model log(e(y ij )) = log(λ ij ) = log(λ) + log(α i ) + log(γ j ) we can interpret: λ ij is the predicted rate for the i th row and j th column of our contingency table λ is the baseline rate 36 / 52

Interpretation II Under the model log(e(y ij )) = log(λ ij ) = log(λ) + log(α i ) + log(γ j ) we can interpret: α i is the rate ratio, the effect of being in row i compared to baseline (and holding column fixed) γ j is the rate ratio, the effect of being in column j compared to baseline (and holding row fixed) The rate ratio is the factor by which the baseline is multiplied to get the rate, or expected cell count, for our particular non-baseline category 37 / 52

Example: Customers at a lumber company I Outcome: Y = number of customers visiting store from region Predictors: X 1 : number of housing units in region X 2 : average household income X 3 : average housing unit age in region X 4 : distance to nearest competitor X 5 : average distance to store in miles Counts are obtained for 110 regions, so our n=110 38 / 52

Example: Customers at a lumber company II Poisson (log-linear) regression model Given observations and covariates: Y i, X ij, 1 i n, 1 j p Systematic component of model: log(λ i ) = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 + β 5 X 5 Probabilistic component of model: Y i X i Poisson(λ i ) 39 / 52

Example: Customers at a lumber company III Here's a look at some of the data:

      Customers Housing Income Age Competitor Store
    1         9     606  41393   3       3.04  6.32
    2         6     641  23635  18       1.95  8.89
    3        28     505  55475  27       6.54  2.05
    4        11     866  64646  31       1.67  5.81
    5         4     599  31972   7       0.72  8.11
    6         4     520  41755  23       2.24  6.81
    7         0     354  46014  26       0.77  9.27
    8        14     483  34626   1       3.51  7.92
    9        16    1034  85207  13       4.23  4.40
    ...
40 / 52

The Distribution of Customer counts [figure: histogram of Customers, heavily right-skewed] The customer count distribution is clearly not normally distributed, so linear regression would not work well here; log-linear regression will work just fine. 41 / 52

The fitted model I
> summary(lumber.glm <- glm(Customers ~ Housing + Income + Age + Competitor + Store,
+                           family=poisson()))
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.942e+00  2.072e-01  14.198  < 2e-16 ***
Housing      6.058e-04  1.421e-04   4.262 2.02e-05 ***
Income      -1.169e-05  2.112e-06  -5.534 3.13e-08 ***
Age         -3.726e-03  1.782e-03  -2.091   0.0365 *
Competitor   1.684e-01  2.577e-02   6.534 6.39e-11 ***
Store       -1.288e-01  1.620e-02  -7.948 1.89e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 422.22 on 109 degrees of freedom
Residual deviance: 114.99 on 104 degrees of freedom
42 / 52

The fitted model II We wrote down the model: log(λ_i) = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ + β₅X₅ and obtained the parameter estimates in R to be: log(λ_i) = 2.94 + 0.00061·Housing_i − 0.000017·Income_i − 0.0037·Age_i + 0.168·Competitor_i − 0.129·Store_i 43 / 52

Interpretation I We interpret β̂₀ = 2.94 as the baseline log expected count, or log rate, in the group with all covariates (housing, income, age, distance to nearest competitor, and average distance to store) set to zero. exp(β̂₀) = exp(2.94) = 18.9 is the expected count of customers in the baseline group. This baseline value does not quite make sense. It may be helpful to center our covariates, but this is not a big deal if we don't care about the baseline, because our primary inference is about the change with respect to covariates. 44 / 52

Interpretation II We interpret β̂₁ = 6.05 × 10^−4 as the increase in log expected count, or the log rate ratio, comparing districts whose number of housing units differ by one, adjusting for other covariates. exp(β̂₁) = exp(6.05 × 10^−4) = 1.000605 is the rate ratio comparing districts whose mean housing units differ by one, adjusting for other covariates. exp(100 β̂₁) = exp(100 × 6.05 × 10^−4) = 1.062 is the rate ratio comparing districts whose mean housing units differ by one hundred. Keeping other factors constant, a 100-unit increase in housing units would yield an expected 6.2% increase in customer count. 45 / 52
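This rate-ratio arithmetic is easy to check in stdlib Python (a sketch, using the Housing estimate from the R output above):

```python
import math

beta1 = 6.058e-04                    # Housing coefficient from the fitted model
rr_per_unit = math.exp(beta1)        # rate ratio per one extra housing unit
rr_per_100 = math.exp(100 * beta1)   # rate ratio per 100 extra housing units

# Expressed as a percent increase in the customer rate per 100 housing units
pct_increase = (rr_per_100 - 1) * 100
```

On the rate-ratio scale effects multiply, so a 100-unit difference corresponds to exp(β̂₁) raised to the 100th power, not 100 times exp(β̂₁).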

Interpretation III Although our reported estimate for β₁ is very small, so is the standard error (1.42 × 10^−4). The z-statistic reported in R is 4.26, with a p-value of 2.02 × 10^−5. We can conclude that there is a statistically significant association between the customer count and mean housing units in the region. We could have calculated this z-statistic ourselves as β̂₁ / SE(β̂₁). This z-statistic is also called a Wald statistic. 46 / 52
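A stdlib-Python sketch of the Wald calculation, using the estimate and standard error from the R output (the two-sided p-value comes from the normal distribution via erfc):

```python
import math

beta1, se1 = 6.058e-04, 1.421e-04    # estimate and SE from the glm summary

z = beta1 / se1                      # Wald statistic
p = math.erfc(abs(z) / math.sqrt(2)) # two-sided p-value, P(|Z| > z) for Z ~ N(0, 1)
```

This reproduces z ≈ 4.26 and p ≈ 2 × 10^−5 from the glm summary.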

Interpretation IV We interpret β̂₂ = −1.169 × 10^−5 to be the change in log expected count, or the log rate ratio, comparing districts whose mean income differs by $1, adjusting for other covariates. exp(β̂₂) = exp(−1.169 × 10^−5) = 0.99999 is the relative rate comparing regions whose mean income differs by $1, adjusting for other covariates. 47 / 52

Interpretation V exp(10000 β̂₂) = exp(10000 × (−1.169 × 10^−5)) = 0.889 is the relative rate comparing regions whose mean income differs by $10,000, adjusting for other covariates. Keeping other factors constant, the relative rate comparing regions whose average income differs by $10,000 is 0.889: for a $10,000 increase in average income, we would predict an 11% decrease in the rate of customers visiting the store. 48 / 52

Interpretation VI A 95% confidence interval for β₂ is β̂₂ ± 1.96 × SE(β̂₂) = −1.169 × 10^−5 ± 1.96 × 2.11 × 10^−6 = (−1.58 × 10^−5, −7.55 × 10^−6). A 95% confidence interval for exp(β₂) is (e^(−1.58 × 10^−5), e^(−7.55 × 10^−6)) = (0.9999842, 0.9999924). A 95% confidence interval for exp(10000 β₂) is (e^(10000 × (−1.58 × 10^−5)), e^(10000 × (−7.55 × 10^−6))) = (0.854, 0.927). 49 / 52
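The same interval can be verified in stdlib Python (a sketch; the estimate and standard error are taken from the R output above):

```python
import math

beta2, se2 = -1.169e-05, 2.112e-06   # Income coefficient and its SE

# 95% Wald confidence interval on the coefficient scale
lo = beta2 - 1.96 * se2
hi = beta2 + 1.96 * se2

# Exponentiate to get the interval for the rate ratio per $10,000 of income
rr_lo = math.exp(10000 * lo)
rr_hi = math.exp(10000 * hi)
```

Because exp is monotone, the interval endpoints simply get exponentiated; the interval (0.854, 0.927) excludes 1, consistent with the significant z-test for Income.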

Interpretation VII The 95% confidence interval for the relative rate of customers visiting the store is (0.854, 0.927) for a $10,000 increase in average income for the region Question: Based on this model, if we are going to choose a location to build a new store, should we choose areas with higher or lower income? Does it matter? 50 / 52

Summary I Poisson regression gives us a framework in which to build models for count data It is a special case of generalized linear models, so it is closely related to linear and logistic regression modelling All of the same modelling techniques will carry over from linear regression: Adjustment for confounding Allowing for effect modification by fitting interactions Splines and polynomial terms 51 / 52

Summary II Can test significance of regression coefficients using z-statistics (also called Wald statistics). Since Poisson regression specifies a full probability model, we can calculate the likelihood, so likelihood ratio tests will apply for testing nested models. We will need a new interpretation for the regression coefficients: the intercept term is interpreted as the log rate for the baseline group; coefficients for covariates are interpreted as log rate ratios, or log relative rates; exponentiate to get rid of the logs. 52 / 52