Faculty of Health Sciences
Regression models: Counts, Poisson regression
27-5-2013
Lene Theil Skovgaard, Dept. of Biostatistics
1 / 36

Count outcome
PKA & LTS, Sect. 7.2
- Poisson regression
- The Binomial and Poisson distributions
- Example: Fever episodes
- Confounding
- Poisson regression for survival data
Home pages: http://biostat.ku.dk/~pka/regrmodels13
E-mail: ltsk@sund.ku.dk
2 / 36

Count variables
Definition: A variable that may take on any non-negative integer value, i.e. 0, 1, ...
Examples:
- Number of fever episodes during pregnancy
- Number of metastases following an experimentally induced cancer in laboratory rats
- Number of deaths due to lung cancer in a year, in a specific region
3 / 36

Well... of course
These variables cannot be infinitely large:
- 429623 fever episodes
- 895482143 metastases
- 50 million deaths
but in practice they may be very large, and perhaps with no well-defined upper limit.
4 / 36

The Binomial distribution
If we have a well-defined upper limit, c, we can represent the count as a sum of zeroes and ones, and if we can assume these to be independent, we know that
$$y \sim \mathrm{Bin}(c, p),$$
p being the probability of a one for each
- week of pregnancy
- organ in a rat
- inhabitant in a region
$$P(u) = \mathrm{pr}(y_i = u) = \binom{c}{u} p^u (1 - p)^{c - u}$$
5 / 36

Binomial distributions, for p=0.005, 0.05 and 0.3 6 / 36

Approximations to the Binomial distribution, I
When c is large and p is moderate (around 0.5), the Binomial distribution looks like a Normal distribution $N(m, s^2)$, where the parameter m is the mean value (the expected count), $m = cp$, and the standard deviation is
$$s = \sqrt{cp(1 - p)}$$
7 / 36

Approximations to the Binomial distribution, II: The law of rare events
When c is large and p is small, the Binomial distribution looks like a Poisson distribution
$$\mathrm{pr}(y_i = u) = \frac{m^u}{u!} \exp(-m),$$
where again the parameter m is the mean value (the expected count), $m = cp$, and the standard deviation is
$$\mathrm{SD} = \sqrt{m} = \sqrt{cp}$$
8 / 36
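The law of rare events can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the two parameter settings (c = 1000, p = 0.005 versus c = 14, p = 0.3) are illustrative choices, not taken from the slides. Probabilities are computed on the log scale to avoid overflow for large c.

```python
from math import exp, lgamma, log

def binom_pmf(u, c, p):
    """pr(y = u) for y ~ Bin(c, p), with 0 < p < 1."""
    log_pmf = (lgamma(c + 1) - lgamma(u + 1) - lgamma(c - u + 1)
               + u * log(p) + (c - u) * log(1 - p))
    return exp(log_pmf)

def poisson_pmf(u, m):
    """pr(y = u) for a Poisson distribution with mean m > 0."""
    return exp(u * log(m) - m - lgamma(u + 1))

def max_abs_diff(c, p):
    """Largest pointwise gap between Bin(c, p) and Poisson(m = c * p)."""
    m = c * p
    return max(abs(binom_pmf(u, c, p) - poisson_pmf(u, m)) for u in range(c + 1))

rare = max_abs_diff(1000, 0.005)   # large c, small p
common = max_abs_diff(14, 0.3)     # small c, moderate p
print(f"max gap, c=1000, p=0.005: {rare:.5f}")
print(f"max gap, c=14,   p=0.3:   {common:.5f}")
```

The gap is much smaller in the first setting, which is the situation where the Poisson approximation is intended to be used.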

Number of fever episodes
What is a fever episode?
- A day with fever?
- A week where fever occurs?
- A period with fever, until it ends?
We will take it to mean a pregnancy week with occurrence of fever.
9 / 36

Notation
- c: the number of pregnancy weeks (observed), here c = 14
- $p_i$: the probability of a fever episode for the ith woman in any of the c pregnancy weeks (assumed to be identical for all weeks, i.e., independent of gestational age)
- $v_{ij}$: an indicator of fever in week j for the ith woman
- $y_i$: the number of fever episodes for the ith woman
Note: $y_i = v_{i1} + \cdots + v_{ic}$, a sum of zeros and/or ones.
10 / 36

Distribution of fever episodes
If fever episodes occur independently of each other in separate weeks, we know that for a specific individual (the index i is omitted)
$$y \sim \mathrm{Bin}(c, p)$$
Since p is probably small, we may approximate with a Poisson distribution
$$\mathrm{pr}(y = u) = \binom{c}{u} p^u (1 - p)^{c - u} \approx \frac{m^u}{u!} \exp(-m),$$
where m may depend on some covariates.
11 / 36

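Fever episodes, according to parity
parity 0: no previous children, expecting first child

Number of women, by parity (rows) and number of fever episodes (columns):

| Parity | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4474 | 731 | 69 | 10 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 5219 | 1141 | 114 | 10 | 1 | 2 | 1 | 1 | 0 | 0 | 2 | 0 |
| Total | 9693 | 1872 | 183 | 20 | 3 | 3 | 1 | 1 | 0 | 0 | 2 | 0 |

- many 0's (no fever episodes)
- largest count is 10 out of 14 weeks
12 / 36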

Distribution characteristics

| Parity | Women with 0 fever episodes | Women with 1+ fever episodes | Average $\hat{m}$ | $\mathrm{SD}^2$, $\hat{s}^2$ | Average age |
|---|---|---|---|---|---|
| 0 | 4474 | 813 | 0.172 | 0.189 | 27.88 |
| 1 | 5219 | 1272 | 0.223 | 0.264 | 31.06 |
| Total | 9693 | 2085 | 0.200 | 0.231 | 29.63 |

- Do we see reasonably identical averages and variances (squared standard deviations)?
- Do we see an effect of parity? The estimated ratio (of average number of fever episodes) is 0.172/0.223 = 0.7713 and highly significant.
13 / 36
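The averages and variances can be recomputed from the frequency table of fever episodes by parity. This is a minimal sketch; the dictionary layout and function name are illustrative, and the use of the sample variance (divisor n - 1) is an assumption about how the slide's values were obtained.

```python
# Number of women by number of fever episodes, copied from the parity table
# (episode counts with zero women are omitted).
counts = {
    0: {0: 4474, 1: 731, 2: 69, 3: 10, 4: 2, 5: 1},
    1: {0: 5219, 1: 1141, 2: 114, 3: 10, 4: 1, 5: 2, 6: 1, 7: 1, 10: 2},
}

def mean_and_variance(freq):
    """Return (n, mean, sample variance) for a {value: frequency} table."""
    n = sum(freq.values())
    mean = sum(u * k for u, k in freq.items()) / n
    var = sum(k * (u - mean) ** 2 for u, k in freq.items()) / (n - 1)
    return n, mean, var

summary = {parity: mean_and_variance(freq) for parity, freq in counts.items()}
for parity, (n, mean, var) in summary.items():
    print(f"parity {parity}: n = {n}, mean = {mean:.3f}, variance = {var:.3f}")
```

For a Poisson distribution the mean and variance coincide, so comparing the two columns gives a first informal check of the distributional assumption.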

Model for fever episodes
$y_i$: the number of fever episodes for the ith woman, assumed to be Poisson distributed with mean $m_i = c p_i$.
We relate $m_i = E(y_i)$ to a linear predictor, using a logarithmic link (in order to respect positive probabilities):
$$\log(E(y_i)) = \log(m_i) = LP_i$$
and the linear predictor can then be modeled as a function of covariates.
14 / 36

Covariate effect: Parity
Do children attract infection to the pregnant mother?
$x_{i,1}$: the parity of the ith woman
$$LP_i = a + b_1 I(x_{i,1} = 0)$$
We get the estimate $\hat{b}_1 = -0.2558$ (0.0423), P < 0.0001, and therefore a clear marginal effect of parity, with back-transformed ratio 0.77 (0.71, 0.84).
But: This apparent difference might be due to other reasons:
- age at conception (as a quantitative variable with a linear effect)
- alcohol habits... Very few women drink more than one or two units a week, so we disregard this covariate
15 / 36
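The back-transformation from the log scale to a ratio can be sketched as follows. The estimate is taken as -0.2558 (negative, consistent with the reported ratio of 0.77) with standard error 0.0423; the function name and the use of a Wald interval with z = 1.96 are assumptions made for illustration.

```python
from math import exp

def ratio_with_ci(estimate, se, z=1.96):
    """Back-transform a log-scale estimate and its Wald interval to a ratio."""
    return exp(estimate), exp(estimate - z * se), exp(estimate + z * se)

ratio, lower, upper = ratio_with_ci(-0.2558, 0.0423)
print(f"ratio = {ratio:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```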

Covariate effect: Age
$x_{i,2}$: the age at conception for the ith woman
$$LP_i = a + b_2 (x_{i,2} - 30)$$
We find $\hat{b}_2 = -0.00069$ (0.00491) per year, so the effect of a 10-year increase is a factor exp(10 × (-0.00069)) = 0.9931, P = 0.89, i.e. virtually no effect.
16 / 36

Confounding between parity and age? Possibly... 17 / 36

Multiple regression model
Linear predictor:
$$\log(E(y_i)) = \log(m_i) = LP_i = a + b_1 I(x_{i,1} = 0) + b_2 (x_{i,2} - 30),$$
choosing a woman of age 30 with previous children as the reference.

| | Estimate (CI) | Ratio estimate (CI) | P |
|---|---|---|---|
| Intercept | -1.488 (-1.541, -1.436) | | |
| Parity 0 | -0.300 (-0.390, -0.211) | 0.741 (0.677, 0.810) | <0.0001 |
| Parity 1 | 0 | 1 | |
| Age, 10 years | -0.140 (-0.244, -0.035) | 0.870 (0.783, 0.965) | 0.0088 |

18 / 36
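To make the linear predictor concrete, the fitted values can be plugged into the log-link model. This is a minimal sketch: the coefficients are the estimates from the multiple regression table (taken as negative, consistent with the reported ratios), the age coefficient is applied per 10 years, and the function and constant names are hypothetical.

```python
from math import exp

# Estimates from the multiple regression table (log scale)
A = -1.488          # intercept: age 30, with previous children
B_PARITY0 = -0.300  # no previous children vs. previous children
B_AGE10 = -0.140    # per 10-year increase in age

def expected_episodes(age, parity0):
    """Expected number of fever episodes under the fitted log-link model."""
    lp = A + B_PARITY0 * (1 if parity0 else 0) + B_AGE10 * (age - 30) / 10
    return exp(lp)

print(f"age 30, parity 1+: {expected_episodes(30, False):.3f}")
print(f"age 30, parity 0:  {expected_episodes(30, True):.3f}")
print(f"age 40, parity 1+: {expected_episodes(40, False):.3f}")
```

Because the link is logarithmic, covariate effects are multiplicative: changing one covariate multiplies the expected count by the corresponding ratio, whatever the value of the other covariate.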

Interpretation, I
Intercept: A reference woman aged 30 with previous children is expected to have exp(-1.4882) = 0.226 fever episodes.
Parity: Women with no previous children have a mean number of fever episodes that is exp(-0.300) = 0.741 times that of women with previous children, i.e. approximately 26% lower, provided that they have the same age. The confidence interval ranges from 19% to 32% lower.
19 / 36

Interpretation, II
Age: Older women have a somewhat lower level of fever episodes: a ten-year increase in age yields an estimated decrease in the mean number of fever episodes of approximately 13% (CI 4–22%), for women with identical parities.
20 / 36

Comparison of unadjusted and adjusted effects
Ratio estimate (CI):

| Covariate(s) | Parity, 1 vs. 0 | Age, 10 years |
|---|---|---|
| Only parity | 1.29 (1.19, 1.40) | |
| Only age | | 0.99 (0.90, 1.09) |
| Both age and parity | 1.35 (1.23, 1.48) | 0.87 (0.78, 0.97) |

21 / 36

Comparison of unadjusted and adjusted effects, II
Unadjusted (marginal) effects:
- More episodes for parity 1+ (Ratio 1.29 (1.19, 1.40), P < 0.0001)
- Slight, non-significant negative effect of age (Ratio for 10 years: 0.99 (0.90, 1.09), P = 0.89)
Adjusted effects:
- More episodes for parity 1+ (Ratio 1.35 (1.23, 1.48), P < 0.0001)
- Significant negative effect of age (Ratio for 10 years: 0.87 (0.78, 0.97), P = 0.0088)
22 / 36

Illustration of Confounding
The association between parity and age (see the Boxplot on p. 17) results in a significant age effect when adjusting for parity.
We have an example of two closely related explanatory variables that have opposite effects on the outcome:
- Women with children have a higher risk
- but older women have a lower risk
23 / 36

Interaction?
Interaction between parity and age (as a linear effect): No.
Estimated difference in the age effect of 0.0047 (0.0109).
The age effect is somewhat more pronounced for women of parity 0, but not at all significantly, P = 0.66.
24 / 36

Model check for linearity in age
Residual plots for the model, and smoothed version (parity 1: dots, solid curve; parity 0: circles, dashed curve)
25 / 36

Model with splines in age
Predicted values for age effects in the two parity groups, linear spline, with breaks at age 20 and 30 (parity 1: solid curve; parity 0: dashed curve).
The deviation from linearity is not significant, P = 0.57.
26 / 36

Goodness-of-fit test for model
Observed and expected number of fever episodes in ten subgroups according to predicted values:

| Predicted mean number of fever episodes | Number of women | Fever episodes, observed (O) | Fever episodes, expected (E) | $(O - E)/\sqrt{E}$ |
|---|---|---|---|---|
| 0.138–0.166 | 1176 | 188 | 187.57 | 0.031 |
| 0.166–0.172 | 1179 | 197 | 199.18 | -0.154 |
| 0.172–0.177 | 1179 | 212 | 205.56 | 0.449 |
| 0.177–0.183 | 1177 | 196 | 211.40 | -1.059 |
| 0.183–0.207 | 1178 | 239 | 229.77 | 0.609 |
| 0.207–0.215 | 1179 | 273 | 249.63 | 1.479 |
| 0.215–0.221 | 1177 | 229 | 257.01 | -1.747 |
| 0.221–0.227 | 1179 | 265 | 264.13 | 0.054 |
| 0.227–0.234 | 1176 | 272 | 270.54 | 0.088 |
| 0.234–0.267 | 1178 | 287 | 283.20 | 0.226 |

Overall chi-squared statistic of 7.02, P = 0.53
27 / 36
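The overall statistic is the sum of the squared standardized differences (O - E)/√E over the ten subgroups. A minimal sketch reproducing it from the tabulated observed and expected counts (the variable names are illustrative):

```python
from math import sqrt

# Observed and expected numbers of fever episodes in the ten subgroups
observed = [188, 197, 212, 196, 239, 273, 229, 265, 272, 287]
expected = [187.57, 199.18, 205.56, 211.40, 229.77,
            249.63, 257.01, 264.13, 270.54, 283.20]

# Standardized differences, negative when fewer episodes than expected
residuals = [(o - e) / sqrt(e) for o, e in zip(observed, expected)]
chi_squared = sum(r * r for r in residuals)

for o, e, r in zip(observed, expected, residuals):
    print(f"O = {o:3d}, E = {e:7.2f}, (O - E)/sqrt(E) = {r:6.3f}")
print(f"chi-squared = {chi_squared:.2f}")
```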

Goodness-of-fit, continued
Comparison of observed and expected number of women, according to number of fever episodes:

| Number of fever episodes | Number of women, observed (O) | Number of women, expected (E) | $(O - E)/\sqrt{E}$ |
|---|---|---|---|
| 0 | 9693 | 9644.63 | 0.492 |
| 1 | 1872 | 1923.71 | -1.179 |
| 2 | 183 | 194.44 | -0.890 |
| 3+ | 30 | 14.21 | 4.189 |

Test statistic: 19.97 ~ $\chi^2(2)$, P < 0.0001
- Too many women in the 0 and 3+ categories
- Overdispersion?
28 / 36

Comparison to other approaches
The Poisson distribution is used here as an approximation to the Binomial distribution.
Compare to:
- assuming the distribution to be Bin(c = 14, p) and choosing the link function to be log (close to logit since p is small), with the same linear predictor
- a model assuming Normality, with log-link (even though of course the number of fever episodes is restricted to nonnegative integers)
29 / 36

Alternative approaches
Comparison of estimates in models assuming Poisson, Normal, and Binomial distributions:

| Model | Parity 0 vs. 1: Estimate (SD) | P-value | Age, 10 years: Estimate (SD) | P-value | Prediction for age 30, parity 1 |
|---|---|---|---|---|---|
| Poisson | -0.300 (0.046) | < 0.0001 | -0.140 (0.053) | 0.0088 | 0.226 (0.214, 0.238) |
| Binomial | -0.300 (0.045) | < 0.0001 | -0.139 (0.053) | 0.0083 | 0.222 (0.211, 0.234) |
| Normal, log-link | -0.300 (0.050) | < 0.0001 | -0.141 (0.058) | 0.015 | 0.226 (0.214, 0.238) |

- Somewhat larger SD for normality analysis
- Overdispersion?
30 / 36

Poisson regression for survival data
In the Cox regression model ($n_c$ covariates) the log(hazard) is:
$$\log(h_0(t)) + b_1 x_{i,1} + \cdots + b_{n_c} x_{i,n_c}.$$
Here, the baseline hazard, $h_0(t)$, is completely unspecified: no assumptions are made about the shape of the function.
An alternative is to approximate $h_0(t)$ by a function which is piecewise constant: the Poisson regression model for survival data.
31 / 36

Poisson regression for melanoma data
For illustration, we use a model based on 3 intervals with cuts at 2.5 and 5 years:

Table: Results from fitting a Cox and a Poisson regression model to the malignant melanoma survival data.

| Covariate | Cox: b | Cox: SD | Poisson: b | Poisson: SD |
|---|---|---|---|---|
| Gender | 0.413 | 0.240 | 0.396 | 0.240 |
| Tumor thickness | 0.0994 | 0.0345 | 0.0964 | 0.0346 |
| Ulceration | 0.952 | 0.268 | 0.960 | 0.269 |
| Age | 0.218 | 0.0775 | 0.222 | 0.0763 |
| Intercept ($\log(\hat{h}_{01})$) | | | -5.093 | 0.523 |
| Intercept ($\log(\hat{h}_{02})$) | | | -4.936 | 0.506 |
| Intercept ($\log(\hat{h}_{03})$) | | | -4.963 | 0.476 |

32 / 36

Poisson regression with categorical covariates The piecewise constant hazard model is particularly attractive when all covariates are categorical because, in this case, data may be reduced to tables of counts and person-years at risk. These tables are sufficient to fit the model. 33 / 36

Table: Failure counts/person-years at risk for the malignant melanoma survival data according to tumor thickness, ulceration, and three time intervals.

Time < 2.5 years

| Ulceration | Tumor thickness 0–2 mm | 2–5 mm | 5+ mm |
|---|---|---|---|
| Absent | 1/53.47 | 11/96.12 | 12/47.12 |
| Present | 3/212.30 | 3/50.00 | 0/17.50 |

Time 2.5–5 years

| Ulceration | Tumor thickness 0–2 mm | 2–5 mm | 5+ mm |
|---|---|---|---|
| Absent | 4/47.13 | 9/64.54 | 4/26.91 |
| Present | 4/193.60 | 2/42.88 | 1/15.35 |

Time ≥ 5 years

| Ulceration | Tumor thickness 0–2 mm | 2–5 mm | 5+ mm |
|---|---|---|---|
| Absent | 1/44.88 | 6/38.87 | 0/28.88 |
| Present | 7/151.97 | 2/59.44 | 1/17.32 |

34 / 36
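A table of this kind is all that is needed to compute rates. As a minimal sketch, the cells can be pooled within each time interval: when covariates are ignored, the maximum likelihood estimate of a piecewise constant hazard in an interval is the number of failures divided by the person-years at risk. The interval labels and variable names are illustrative, and this covariate-free model is a simplification, not the model fitted on the slides.

```python
# (failures, person-years) for the six cells in each time interval,
# read row by row from the table of failure counts/person-years at risk.
cells = {
    "< 2.5 years": [(1, 53.47), (11, 96.12), (12, 47.12),
                    (3, 212.30), (3, 50.00), (0, 17.50)],
    "2.5-5 years": [(4, 47.13), (9, 64.54), (4, 26.91),
                    (4, 193.60), (2, 42.88), (1, 15.35)],
    ">= 5 years": [(1, 44.88), (6, 38.87), (0, 28.88),
                   (7, 151.97), (2, 59.44), (1, 17.32)],
}

totals = {}
for interval, rows in cells.items():
    failures = sum(d for d, _ in rows)
    person_years = sum(y for _, y in rows)
    # Crude rate = estimated constant hazard in this interval (per year)
    totals[interval] = (failures, person_years, failures / person_years)

for interval, (failures, person_years, rate) in totals.items():
    print(f"{interval}: {failures} failures / {person_years:.2f} person-years "
          f"= {rate:.4f} per year")
```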

Poisson regression for survival data: Comments
Nice features:
- The model works with the standard epidemiological rates
- A substantial data reduction is obtained in large (e.g., register-based) studies
- As exemplified, results tend to be very similar to those based on a Cox regression model
- The model may be fitted using standard software
- Time is treated as a factor in the model in the same way as other categorical covariates and, therefore, examination of proportional hazards is a simple time-by-covariate interaction
A less nice feature is that the analysis depends on the choice of intervals.
35 / 36

Why: Poisson regression Even though there is no assumption in the model of anything having a Poisson distribution, the model may be fitted by, formally, treating the failure counts as Poisson with log(person-years at risk) being a so-called offset in the model. This is because the likelihood function for such a model is proportional to the likelihood function based on the piecewise constant hazard model. 36 / 36
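The proportionality can be sketched for a single cell of the table. The notation is introduced here for illustration only: D is the failure count, Y the person-years at risk, and h the constant hazard in the cell; $L_{\mathrm{pch}}$ denotes the likelihood under the piecewise constant hazard model.

$$
L_{\mathrm{pch}}(h) = h^{D} \exp(-hY), \qquad
L_{\mathrm{Poisson}}(h) = \frac{(hY)^{D}}{D!} \exp(-hY) = \frac{Y^{D}}{D!} \, L_{\mathrm{pch}}(h)
$$

The factor $Y^{D}/D!$ does not involve h, so both likelihoods are maximized by the same value of h. The Poisson mean is $hY$, and $\log(hY) = \log(h) + \log(Y)$, which is why $\log(Y)$ enters the linear predictor as an offset with a fixed coefficient of 1.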