The Poisson trick for matched two-way tables: a case for putting the fish in the bowl (a case for putting the bird in the cage)
Simplice Dossou-Gbété (1), Antoine de Falguerolles (2,*)
1. Université de Pau et des Pays de l'Adour
2. Université Paul Sabatier (Toulouse III)
* Antoine -at- Falguerolles.net
31 January 2011
Plan
Key ideas
  Matched two-way tables
  Objectives
  Poisson trick
The suicide data: age, method and gender
  Data
  CAs for the two matched tables
  Plots
  Bird
  Fish
Bilinear models
  restricted two-way interaction
  Case of two matched tables
  Poisson-Multinomial trick for two matched tables
References
Key ideas
  Matched two-way tables
  Analysis of dissimilarity/similarity between tables
  Poisson trick
Matched two-way tables
The m = #S tables of counts classified by factor A (row) and factor B (column), $Y_s^{SAB}$, their margins $Y_s^{SA}$ and $Y_s^{SB}$, and total count $Y_s^{S}$:

$$
\begin{bmatrix} y_s^{SAB} & y_s^{SA} \\ (y_s^{SB})^\top & y_s^{S} \end{bmatrix},
\qquad s = 1, \dots, \#S
$$

The marginal two-way table (and its margins):

$$
\begin{bmatrix} y^{AB} & y^{A} \\ (y^{B})^\top & y \end{bmatrix}
$$
Objectives
Similarity/dissimilarity between tables
  row profiles or column profiles
May involve some preprocessing of tables by unifying
  margins by biproportional fitting (RAS, Iterative Proportional Fitting, matrix raking)
  row profiles (column profiles) by weighting tables, profiles into tables, common metric
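The biproportional fitting step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ipf`, its arguments and the 2 × 2 excerpt (ages 15-20 and 20-25, methods c1 and c4) are choices made here for the example, and the table is assumed to be strictly positive.

```python
def ipf(table, row_targets, col_targets, n_iter=1000, tol=1e-10):
    """Iterative proportional fitting: rescale rows and columns in turn
    until the table's margins match the target margins."""
    fitted = [[float(x) for x in row] for row in table]
    for _ in range(n_iter):
        # Row step: scale each row to its target total.
        for i, row in enumerate(fitted):
            total = sum(row)
            fitted[i] = [x * row_targets[i] / total for x in row]
        # Column step: scale each column to its target total.
        col_sums = [sum(col) for col in zip(*fitted)]
        fitted = [[x * col_targets[j] / col_sums[j] for j, x in enumerate(row)]
                  for row in fitted]
        # Columns now match exactly; stop once the rows match as well.
        row_err = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if row_err < tol:
            break
    return fitted


# Excerpt of the suicide data (illustrative choice of cells).
male = [[348, 578], [808, 699]]
female = [[353, 81], [540, 111]]

# Give the female excerpt the margins of the male excerpt.
male_rows = [sum(r) for r in male]
male_cols = [sum(c) for c in zip(*male)]
female_fitted = ipf(female, male_rows, male_cols)
```

After fitting, the two excerpts share their margins, while the cross-product ratio of the female excerpt is unchanged, so any remaining difference between the tables lies in the interaction.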
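Poisson trick
$Y_s^{SAB}$ independent Poisson:

$$
E[Y_s^{SAB}] = \mathrm{var}(Y_s^{SAB}), \qquad
E[Y_s^{SAB}] = m\big(\beta^{AB}_{ab} + \mathrm{restricted}(\beta^{SAB}_{sab})\big)
$$

$Y_s^{SAB} \mid \sum_{s=1}^{\#S} Y_s^{SAB} = y^{AB}$ is multinomial with
  known parameter: $y^{AB}$
  probabilities:

$$
\frac{m\big(\beta^{AB}_{ab} + \mathrm{restricted}(\beta^{SAB}_{sab})\big)}
     {\sum_{k=1}^{\#S} m\big(\beta^{AB}_{ab} + \mathrm{restricted}(\beta^{SAB}_{kab})\big)}
$$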
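The conditioning step behind the trick can be checked numerically in the simplest case of two tables, where the multinomial reduces to a binomial. The sketch below is only an illustration: the helper names and the values of the two Poisson means are arbitrary choices, not quantities from the slides.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return exp(-lam) * lam ** k / factorial(k)


def conditional_pmf(k, n, lam1, lam2):
    """P(Y2 = k | Y1 + Y2 = n) for independent Poisson Y1 and Y2."""
    joint = poisson_pmf(n - k, lam1) * poisson_pmf(k, lam2)
    total = sum(poisson_pmf(n - j, lam1) * poisson_pmf(j, lam2)
                for j in range(n + 1))
    return joint / total


def binomial_pmf(k, n, p):
    """Probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Arbitrary illustrative means for one matched cell.
lam1, lam2, n = 3.5, 1.5, 6
p2 = lam2 / (lam1 + lam2)  # share of table 2 in the matched cell

conditional = [conditional_pmf(k, n, lam1, lam2) for k in range(n + 1)]
binomial = [binomial_pmf(k, n, p2) for k in range(n + 1)]
```

The two lists agree up to rounding error: given the matched total, the count in table 2 is binomial with probability equal to its share of the Poisson means.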
Poisson trick for two matched tables
Particular case: two matched tables (#S = 2)
  independent Poisson counts with means $E[Y_s^{SAB}]$ (s = 1, 2)
  exponential mean function (log link function): $m = \exp$, $m^{-1} = \log$
  model: all two-way interactions of A, B and S

$$
E[Y_s^{SAB}] = \exp\big(\beta^{AB}_{ab} + \beta^{SA}_{sa} + \beta^{SB}_{sb}\big)
$$

  $Y_2^{SAB}$ given $y^{AB}$ is binomial $B(y^{AB}, \pi_2^{AB})$
  model: additivity of effects of A and B (with the terms for table 1 set to zero for identification)

$$
\mathrm{logit}(\pi_2^{AB}) = \beta^{SA}_{2a} + \beta^{SB}_{2b}
$$

Also works with the inclusion of a reduced rank interaction in the predictor.
Data: Male

                               Method
Age      c1    c2    c3    c4    c5    c6    c7    c8    c9
10-15     4     0     0   247     1    17     1     6     9
15-20   348     7    67   578    22   179    11    74   175
20-25   808    32   229   699    44   316    35   109   289
25-30   789    26   243   648    52   268    38   109   226
30-35   916    17   257   825    74   291    52   123   281
35-40  1118    27   313  1278    87   293    49   134   268
40-45   926    13   250  1273    89   299    53    78   198
45-50   855     9   203  1381    71   347    68   103   190
50-55   684    14   136  1282    87   229    62    63   146
55-60   502     6    77   972    49   151    46    66    77
60-65   516     5    74  1249    83   162    52    92   122
65-70   513     8    31  1360    75   164    56   115    95
70-75   425     5    21  1268    90   121    44   119    82
75-80   266     4     9   866    63    78    30    79    34
80-85   159     2     2   479    39    18    18    46    19
85-90    70     1     0   259    16    10     9    18    10
90+      18     0     1    76     4     2     4     6     2
Data: Female

                               Method
Age      c1    c2    c3    c4    c5    c6    c7    c8    c9
10-15    28     0     3    20     0     1     0    10     6
15-20   353     2    11    81     6    15     2    43    47
20-25   540     4    20   111    24     9     9    78    67
25-30   454     6    27   125    33    26     7    86    75
30-35   530     2    29   178    42    14    20    92    78
35-40   688     5    44   272    64    24    14    98   110
40-45   566     4    24   343    76    18    22   103    86
45-50   716     6    24   447    94    13    21    95    88
50-55   942     7    26   691   184    21    37   129   131
55-60   723     3    14   527   163    14    30    92    92
60-65   820     8     8   702   245    11    35   140   114
65-70   740     8     4   785   271     4    38   156    90
70-75   624     6     4   610   244     1    27   129    46
75-80   495     8     1   420   161     1    29   129    35
80-85   292     3     2   223    78     0    10    84    23
85-90   113     4     0    83    14     0     6    34     2
90+      24     1     0    19     4     0     2     7     0
CA: Two approaches in CA
Peter's trick:
  ordinary CA of either table $[\,M \;\; F\,]$ and/or $\begin{bmatrix} M \\ F \end{bmatrix}$
Michael's trick:
  ordinary CA of table $\begin{bmatrix} M & F \\ F & M \end{bmatrix}$, equivalent to
    ordinary CA of the average table $\tfrac{1}{2}M + \tfrac{1}{2}F$
    adapted CA of table M (resp. table F) with respect to $\tfrac{1}{2}M + \tfrac{1}{2}F$.
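The algebra that makes the block table split into a "sum" part and a "difference" part can be sketched as follows. This only illustrates the mechanism on raw counts, without the CA weighting; the helper names, the 2 × 2 excerpt (ages 15-20 and 20-25, methods c1 and c4) and the test vector `v` are choices made here for illustration.

```python
def matvec(mat, vec):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def combine(m, f, sign):
    """Entrywise M + sign * F."""
    return [[x + sign * y for x, y in zip(rm, rf)] for rm, rf in zip(m, f)]


def block_table(m, f):
    """Build the block table [[M, F], [F, M]]."""
    top = [rm + rf for rm, rf in zip(m, f)]
    bottom = [rf + rm for rm, rf in zip(m, f)]
    return top + bottom


# Excerpt of the suicide data (illustrative choice of cells).
M = [[348, 578], [808, 699]]
F = [[353, 81], [540, 111]]
B = block_table(M, F)

v = [1, -2]                                   # arbitrary column vector
symmetric = matvec(B, v + v)                  # B applied to [v; v]
antisymmetric = matvec(B, v + [-x for x in v])  # B applied to [v; -v]

sum_part = matvec(combine(M, F, +1), v)       # (M + F) v
diff_part = matvec(combine(M, F, -1), v)      # (M - F) v
```

Vectors of the form [v; v] only see M + F (proportional to the average table), while vectors of the form [v; -v] only see M - F, which is why the analysis of the block table separates what the tables share from where they differ.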
Two approaches in CA (continued)
Implicit in the first stream of approaches are
  the choice of a log-linear model between C + S*R and R + S*C, where R, C, and S denote the row, column, and matching factors
  ordinary CA of the table formed accordingly
Implicit in the second stream of approaches is
  the metric choice for the rows (the ages) and the columns (the causes): metrics attached to each table M, F, or (smoothed) metrics attached to the average table $\tfrac{1}{2}M + \tfrac{1}{2}F$, or...?
Metric choice impacts plots and, to a lesser extent, patterns in graphs.
[Figure: Peter's plot, CA of $[\,M \;\; F\,]$]
[Figure: Michael's trick, CA of $\begin{bmatrix} M & F \\ F & M \end{bmatrix}$]
[Figure: Peter's trick versus Michael's trick; panels: dissimilarity, similarity]
Bird
[Figure: bird and cage]
[Figure: trick]
[Figure: bird in cage]
Fish
[Figure: fish and bowl]
[Figure: fish in bowl]
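Restricted two-way interaction
Notation for a two-way table
Observed #A × #B two-way table $y^{AB}$ of counts cross-classified by factor A (row) and factor B (column), and its margins:

$$
\begin{bmatrix} y^{AB} & y^{A} \\ (y^{B})^\top & y \end{bmatrix}
$$

Weights: $w^{AB}_{ab} = \dfrac{y^{A}_{a}}{y}\,\dfrac{y^{B}_{b}}{y}$, which can be generalized into $\gamma\,\gamma^{A}_{a}\,\gamma^{B}_{b}$
Profiles:
  A-profiles: $y^{B|A=a}_{b} = \dfrac{1}{y^{A}_{a}}\, y^{AB}_{ab}$
  B-profiles: $y^{A|B=b}_{a} = \dfrac{1}{y^{B}_{b}}\, y^{AB}_{ab}$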
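The weights and profiles defined above can be computed as in the following sketch. The function names and the 2 × 3 excerpt of the male table (ages 15-20 and 20-25, methods c1, c4 and c6) are illustrative choices; all margins are assumed to be positive.

```python
def margins(table):
    """Row totals y^A, column totals y^B and grand total y."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def weights(table):
    """Independence weights w_ab = (y^A_a / y) * (y^B_b / y)."""
    rows, cols, total = margins(table)
    return [[(ra / total) * (cb / total) for cb in cols] for ra in rows]


def a_profiles(table):
    """Row (A) profiles: each row divided by its row total."""
    rows, _, _ = margins(table)
    return [[x / ra for x in row] for row, ra in zip(table, rows)]


def b_profiles(table):
    """Column (B) profiles: each column divided by its column total."""
    _, cols, _ = margins(table)
    return [[x / cb for x, cb in zip(row, cols)] for row in table]


# Excerpt of the male table (illustrative choice of cells).
y = [[348, 578, 179], [808, 699, 316]]
w = weights(y)
row_prof = a_profiles(y)
col_prof = b_profiles(y)
```

Each A-profile sums to one across columns, each B-profile sums to one down the rows, and the weights sum to one over the whole table.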
Diet modeling
The $y^{AB}_{ab}$ are observed values of independent r.v. $Y^{AB}_{ab}$ with
  expected value: $E[Y^{AB}_{ab}] = \mu^{AB}_{ab} = m(\eta^{AB}_{ab})$
  bi-linear predictor:

$$
\eta^{AB}_{ab} = \text{offset} + \big[\beta + \beta^{A}_{a} + \beta^{B}_{b} +\big]
\overbrace{\sum_{k=1,\dots,r} \delta_k\, \beta^{A}_{k,a}\, \beta^{B}_{k,b}}^{\text{reduced rank interaction}}
$$

  with identification constraints for the βs
  variance: $\mathrm{Var}(Y^{AB}_{ab}) = V(\mu^{AB}_{ab})$
⇒ makes it possible to replicate most models with rank restricted interaction.
⇒ has consequences on the distribution of the profiles.
Implementations
Current implementations are Benzécri's CA and Goodman's RC. But non-canonical crossovers are possible.
  CA: $\mu^{AB}_{ab} = w^{AB}_{ab}(1 + \eta^{AB}_{ab})$ and $V(\mu^{AB}_{ab}) = w^{AB}_{ab}$
      a taste of heteroscedastic Normal distribution with a zest of Poisson
  RC: $\mu^{AB}_{ab} = \exp(\eta^{AB}_{ab})$ and $V(\mu^{AB}_{ab}) = \mu^{AB}_{ab}$
      a definite taste of Poisson distribution
  GB: $\mu^{AB}_{ab} = \exp(\eta^{AB}_{ab})$ and $V(\mu^{AB}_{ab}) = w^{AB}_{ab}$
  BG: $\mu^{AB}_{ab} = \max\{\epsilon,\, w^{AB}_{ab}(1 + \eta^{AB}_{ab})\}$ and $V(\mu^{AB}_{ab}) = \mu^{AB}_{ab}$
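The four mean/variance pairings listed above can be written as a small lookup, as in the sketch below. The function name `mean_variance`, its argument names and the default value of `eps` are assumptions made for illustration; the slide does not specify a value for ε.

```python
from math import exp


def mean_variance(kind, eta, w, eps=1e-8):
    """Return (mu, V(mu)) for one cell, given the predictor eta and weight w.

    kind is one of "CA", "RC", "GB", "BG"; eps is a hypothetical floor
    used only by the BG variant.
    """
    if kind == "CA":   # identity-type mean, variance fixed at the weight
        mu = w * (1 + eta)
        return mu, w
    if kind == "RC":   # exponential mean, Poisson variance
        mu = exp(eta)
        return mu, mu
    if kind == "GB":   # exponential mean, variance fixed at the weight
        return exp(eta), w
    if kind == "BG":   # CA-type mean kept positive, Poisson variance
        mu = max(eps, w * (1 + eta))
        return mu, mu
    raise ValueError(f"unknown implementation: {kind}")
```

For example, `mean_variance("BG", -1.5, 0.2)` returns the floor `eps` for both mean and variance, because the CA-type mean `w * (1 + eta)` would be negative there.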
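Diet Poisson-Multinomial
$Y_i$ ($i \in \{1, \dots, n\}$) are independent r.v. with $E[Y_i] = \mu_i$ and $\mathrm{Var}(Y_i) = \sigma_i^2$.

$$
E\Big[\tfrac{1}{y}[Y_1, \dots, Y_n] \,\Big|\, Y_1 + \dots + Y_n = y\Big]
= \tfrac{1}{y}[\mu_1, \dots, \mu_n]
+ \frac{1}{y \sum_i \sigma_i^2}\,
\overbrace{\Big(y - \sum_i \mu_i\Big)}^{0}\,
[\sigma_1^2, \dots, \sigma_n^2]
$$

$$
\mathrm{Var}\Big(\tfrac{1}{y}[Y_1, \dots, Y_n] \,\Big|\, Y_1 + \dots + Y_n = y\Big)
= \frac{1}{y^2}
\begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{bmatrix}
- \frac{1}{y^2 \sum_i \sigma_i^2}
\begin{bmatrix} \sigma_1^2 \\ \vdots \\ \sigma_n^2 \end{bmatrix}
[\sigma_1^2, \dots, \sigma_n^2]
$$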
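As a sanity check on the two conditional moment formulas, the sketch below evaluates them in the Poisson case (variance equal to mean, fitted total equal to the observed total) and compares the result with the usual multinomial moments of the proportions. The function name and the numerical values are illustrative assumptions.

```python
def conditional_moments(mu, sigma2, y):
    """Mean vector and covariance matrix of (1/y)[Y_1..Y_n] given sum = y,
    following the two displayed formulas."""
    s = sum(sigma2)
    shift = (y - sum(mu)) / s  # vanishes when the fitted total equals y
    mean = [(m + shift * v) / y for m, v in zip(mu, sigma2)]
    cov = [[((vi if i == j else 0.0) - vi * vj / s) / y ** 2
            for j, vj in enumerate(sigma2)]
           for i, vi in enumerate(sigma2)]
    return mean, cov


# Poisson case: variances equal the means, and the means sum to y.
mu = [2.0, 3.0, 5.0]
y = 10.0
mean, cov = conditional_moments(mu, mu, y)

# Multinomial moments of the proportions, for comparison.
p = [m / y for m in mu]
multinomial_cov = [[((p[i] if i == j else 0.0) - p[i] * p[j]) / y
                    for j in range(3)] for i in range(3)]
```

In this case the conditional mean is the vector of probabilities `p` and the covariance matches the multinomial one, which is the link between the Poisson and multinomial formulations.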
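Poisson-Multinomial trick for two matched tables
Poisson counts for the three-way table $y^{SAB} = (y_1^{SAB}, y_2^{SAB})$:

$$
\begin{aligned}
\log(\lambda_2^{SAB}) &= \beta + \beta^{S}_{2} + \beta^{A}_{a} + \beta^{B}_{b} + \beta^{SA}_{2a} + \beta^{SB}_{2b} + \beta^{AB}_{ab} + \sum_k \delta_k\, \xi^{A}_{ak}\, \xi^{B}_{bk} \\
\log(\lambda_1^{SAB}) &= \beta \phantom{{}+ \beta^{S}_{2}} + \beta^{A}_{a} + \beta^{B}_{b} \phantom{{}+ \beta^{SA}_{2a} + \beta^{SB}_{2b}} + \beta^{AB}_{ab} + \sum_k (-\delta_k)\, \xi^{A}_{ak}\, \xi^{B}_{bk}
\end{aligned}
$$

Binomial model for the two-way table $y_2^{SAB}$ given the table $y^{AB}$ (sum of counts of matched cells):

$$
\mathrm{logit}(\pi^{AB}) = \log\!\left(\frac{\lambda_2^{SAB}}{\lambda_1^{SAB}}\right)
= \beta^{S}_{2} + \beta^{SA}_{2a} + \beta^{SB}_{2b} + 2 \sum_k \delta_k\, \xi^{A}_{ak}\, \xi^{B}_{bk}
$$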
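The passage from the two Poisson log-means to the binomial logit can be verified numerically for a single cell (a, b), as sketched below. All parameter values are made up for illustration, and a rank-2 interaction is assumed.

```python
from math import exp, log

# Hypothetical parameter values for one cell (a, b); not estimates.
beta = 0.3
beta_S2 = -0.2
beta_A, beta_B = 0.5, -0.1
beta_SA2, beta_SB2 = 0.4, 0.25
beta_AB = 0.15
delta = [0.6, 0.2]
xi_A = [0.7, -0.3]
xi_B = [-0.5, 0.9]

# Reduced rank interaction: sum over k of delta_k * xi^A_ak * xi^B_bk.
bilinear = sum(d * xa * xb for d, xa, xb in zip(delta, xi_A, xi_B))

# Log-means of the two Poisson counts; table 1 carries -delta_k.
log_lambda2 = (beta + beta_S2 + beta_A + beta_B
               + beta_SA2 + beta_SB2 + beta_AB + bilinear)
log_lambda1 = beta + beta_A + beta_B + beta_AB - bilinear

# Binomial probability of table 2 given the matched total.
lambda1, lambda2 = exp(log_lambda1), exp(log_lambda2)
pi2 = lambda2 / (lambda1 + lambda2)
logit_pi2 = log(pi2 / (1 - pi2))

# Predictor of the binomial model.
predictor = beta_S2 + beta_SA2 + beta_SB2 + 2 * bilinear
```

The terms common to both tables cancel in the ratio, so `logit_pi2` equals `predictor` up to rounding error, with the interaction appearing twice because of the opposite signs in the two tables.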
What if CA is used?
Three-way table $y^{SAB} = (y_1^{SAB}, y_2^{SAB})$ and associated weights $w^{AB}_{ab} = \dfrac{y^{A}_{a}}{y}\,\dfrac{y^{B}_{b}}{y}$
CA of table $y_2^{SAB}$ with respect to table $\tfrac{1}{2} y^{AB}$:

$$
\frac{1}{w^{AB}}\, E[Y_2^{SAB}]
= \overbrace{\frac{1}{w^{AB}} \Big(\tfrac{1}{2}\, y^{AB}\Big)}^{\text{offset}}
+ \sum_k \delta_k\, \xi^{A}_{ak}\, \xi^{B}_{bk}
$$

Interpretation for the reduced rank interaction:

$$
4\, \frac{w^{AB}}{y^{AB}} \sum_k \delta_k\, \xi^{A}_{ak}\, \xi^{B}_{bk} \approx \mathrm{logit}(\pi_2^{SAB})
$$
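One way to read the interpretation above, offered here as an assumption about the intended argument, is through the first-order expansion of the logit around 1/2: logit(p) ≈ 4(p - 1/2). The sketch below compares the two for a few probabilities; the function names are illustrative.

```python
from math import log


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return log(p / (1 - p))


def linearised_logit(p):
    """First-order Taylor expansion of the logit at p = 1/2."""
    return 4 * (p - 0.5)


# Gap between the exact and linearised log-odds near p = 1/2.
errors = {p: abs(logit(p) - linearised_logit(p)) for p in (0.48, 0.52, 0.6)}
```

The approximation is tight when the two matched counts are of similar size and degrades as the share of one table moves away from one half.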
[Figure: Log-odds]
References
Peter van der Heijden and Jan de Leeuw (1985): Correspondence analysis used complementary to loglinear analysis, Psychometrika, 50(4), 429-447.
Michael Greenacre (2003): Singular value decomposition of matched matrices, Journal of Applied Statistics, 30, 1101-1113.
Simplice Dossou-Gbété (2002): Reduced rank quasi-symmetry and biplots for matched two-way tables, Annales de la Faculté des Sciences, vol. XI (4), 469-483.
Thank you for your attention