Diagnostics in Poisson Regression. Models - Residual Analysis

Dagnostcs n Posson Regresson Models - Resdual Analyss 1

Outlne Dagnostcs n Posson Regresson Models - Resdual Analyss Example 3: Recall of Stressful Events contnued 2

Resdual Analyss Resduals represent varaton n the data that cannot be explaned by the model. Resdual plots useful for dscoverng patterns, outlers or msspecfcatons of the model. Systematc patterns dscovered may suggest how to reformulate the model. If the resduals exhbt no pattern, then ths s a good ndcaton that the model s approprate for the partcular data. 3

Types of Resduals for Posson Regresson Models Raw resduals: O E Pearson resduals: (or standardsed) O E E Adjusted resduals O E (preferred): SD(O E ) S D ( O - E ) = E ( 1 - E / n ) 4

If H 0 s true, the adjusted resduals have a standard normal N(0,1) dstrbuton for large samples. Back to the Recall of Stressful Events example 5

Example 3: Recall of Stressful Events Data month adjusted resdual month adjusted resdual 1 2.46 10 0.66 2 1.02 11-0.42 3 2.10 12 0.30 4 3.18 13 1.02 5-1.14 14-1.86 6 1.02 15-0.78 7 0.66 16-2.58 8-1.50 17-2.58 9-0.06 18-1.50 6

Example 3: Recall of Stressful Events If the adjusted resduals follow N(0,1), we expect 18 0.05 = 0.9 1 adjusted resdual larger than 1.96 or smaller than -1.96 Months 1, 3, 4 Months 16, 17 postve adjusted resduals negatve adjusted resduals More lkely to report recent events (postve resdual: means observed s larger than expected,.e. more lkely to report a stressful event n a months mmedately pror to ntervew) 7

Example 3: Recall of Stressful Events Plot of adjusted resduals by month shows downward trend. 8

Normal Q-Q Plot Probablty plot, whch s a graphcal method for comparng two probablty dstrbutons by plottng ther quantles aganst each other (Q stands for Quantles,.e. quantle aganst quantle) Plots observed quantles aganst expected quantles, hence plot quantles of adjusted resduals aganst the quantles of the standard normal. Ponts should le close to y = x lne, f adjusted resduals are N(0,1). 9

Conclusons Dvergence n the tals from straght lne Strong evdence that equprobable model does not ft the data. More lkely to report recent events. Such a tendency would result f respondents were more lkely to remember recent events than dstant events. So, use another model. Whch one? Let us explore a Posson tme trend model (a Posson model wth a covarate) 11

Posson Regresson Model wth a Covarate Posson Tme Trend Model 12

Outlne Posson Tme Trend Model Posson regresson model wth a covarate Example 3: Recall of Stressful Events contnued 13

Posson Tme Trend Model ncludng a covarate y = number of events n month. The log of the expected count s a lnear functon of tme before ntervew: log( μ ) α β 1,, C where µ = E(y ) s the expected number of events n month. Meanng: model assumes response varable Y has a Posson dstrbuton, and assumes the logarthm of ts expected value can be modelled by a lnear combnaton of unknown parameters (here 2). 14

Posson Tme Trend Model ncludng a covarate For model log( μ ) α β 1,, C β = 0 equprobable model (no effect of the covarate). β > 0 the expected count µ ncreases as month () ncreases. β < 0 the expected count µ decreases as month () ncreases. 15

Posson Model wth one covarate In general, the form of a Posson model wth one explanatory varable (X) s: lo g ( m ) = a + b x Explanatory varable X can be categorcal or contnuous. In ths example month s the explanatory varable X. 16

Posson Model: Parameter Estmates Predcted Values and Resduals Parameters estmated va maxmum lkelhood estmaton and. αˆ These estmates are used to compute predcted values: βˆ log( μˆ ) αˆ βˆ μˆ exp( αˆ βˆ ) The adjusted resduals are μˆ y (1 μˆ μˆ /n) where n C 1 y 17

Posson Model: Goodness of Ft Statstcs The goodness-of-ft test statstcs are Pearson Ch-squared Test Statstc 2 C 1 y μˆ μˆ 2 Lkelhood Rato Test Statstcs (Devance) L 2 2 C 1 y log y μˆ 18

The null hypothess s H 0 : The model y ~ Posson(µ ) wth log(µ ) = α+β holds. Returnng to our example μˆ expected counts the model) are (under 19

Example 3: Recall of Stressful Events Month Count Obs Count Exp Month Count Obs Count Exp 1 15 15.1 10 10 7.1 2 11 13.9 11 7 6.5 3 14 12.8 12 9 6.0 4 17 11.8 13 11 5.5 5 5 10.8 14 3 5.1 6 11 9.9 15 6 4.7 7 10 9.1 16 1 4.3 8 4 8.4 17 1 4.0 9 8 7.7 18 4 3.6 Expected value calculated usng the model: mˆ = e x p ( aˆ + b ˆ ) 20

If H 0 s true X 2 ~ χ 2 df L 2 ~ χ 2 df where df = no. of cells no. of model parameters = C 2 = 18 2 = 16 X 2 = 22.71 wth 16 df. p-value = 0.1216 do not reject H 0. L 2 = 24.57 wth 16 df. p-value = 0.0778 do not reject H 0. 21

Concluson The Posson tme trend model s consstent wth the data. Partcpants are more lkely to report recent events. 22

Adjusted Resduals month adjusted resdual month adjusted resdual 1-0.05 10 1.11 2-0.89 11 0.18 3 0.36 12 1.26 4 1.61 13 2.42 5-1.86 14-0.98 6 0.34 15 0.64 7 0.28 16-1.70 8-1.58 17-1.60 9-0.09 18 0.20 23

Adjusted Resduals If the adjusted resduals follow N(0,1), we expect 18 0.05 = 0.9 1 adjusted resdual larger than 1.96 or smaller than -1.96. So, t s not surprsng that we see one resdual (month 13: 2.42) exceed 1.96 (.e. no evdence aganst H 0 ). Adjusted resduals show no obvous pattern based on plot by month. 24

Posson Regresson: Model Interpretaton Ftted values help wth nterpretaton Ftted values are obtaned by The coeffcent s nterpreted whether the explanatory varable s a categorcal or a contnuous varable. yˆ = mˆ = e x p ( aˆ + b ˆ ) bˆ 25

Posson Regresson: Model Interpretaton cont. For a bnary explanatory varable, where X=0 means absence and X=1 means presence, the rate rato (RR) for presence vs. absence RR E y P r e s e n c e E y x 1 E y A b s e n c e E y x 0 e x p ( ) For a contnuous explanatory varable a one unt ncrease n X wll result n a multplcatve effect of on μ e x p ( ) 26

Confdence Interval for the Slope Estmate of the slope s βˆ = -0.0838 wth estmated standard error = 0.0168. Therefore a 95% CI for β s: = -0.0838 ± 1.96 х 0.0168 = (-0.117 ; -0.051 ) (therefore negatve, CI does not nclude 0), the expected count µ decreases as month () ncreases se( βˆ ) 27

Estmated Change To add nterpretaton, the estmated change (from one month to the next) n proportonate terms s μˆ μˆ μˆ μˆ 1 1 Ê (Y ) exp( ) μˆ 1 exp( αˆ βˆ ( 1)) 1 exp( αˆ βˆ ) μˆ αˆ βˆ exp( βˆ ) 1 28

Estmated Change For our example Recall of Stressful Events ths means: mˆ - + 1 mˆ mˆ = e x p (- 0. 0 8 3 8 ) - 1 = 0.920-1 = - 0.08.e. the estmated change s an 8% decrease per month. 29

CI for the Estmated Change Lower Bound Upper Bound β 0.117 0.051 exp(β) exp(-0.117) = 0.89 exp(-0.051) = 0.95 exp(β) - 1 0.89-1 = -0.11 0.95-1 = -0.05-11% -5% Estmated change s -8% per month wth a 95% CI of (-11%, -5%). The number of stressful events decreases. 30

References Agrest, A (2013) Categorcal Data Analyss, 3 rd ed, New Jersey, Wley. Agrest, A. (2007) An Introducton to Categorcal Data Analyss. 2 nd ed, Wley. Haberman, S. J. (1978) Analyss of Qualtatve Data, Volume 1, Academc Press, New York. p. 3 for Recall of stressful events example. p. 87 for Sucdes example. 31