The Poisson trick for matched two-way tables

The Poisson trick for matched two-way tables: a case for putting the fish in the bowl (a case for putting the bird in the cage)

Simplice Dossou-Gbété (1), Antoine de Falguerolles (2, *)
1. Université de Pau et des Pays de l'Adour
2. Université Paul Sabatier (Toulouse III)
* Antoine -at- Falguerolles.net

31 January 2011

Plan

Key ideas
  Matched two-way tables
  Objectives
  Poisson trick
The suicide data: age, method and gender
  Data
  CAs for the two matched tables
  Plots
  Bird
  Fish
Bilinear models
  restricted two-way interaction
  Case of two matched tables
  Poisson-Multinomial trick for two matched tables
References

Key ideas

  Matched two-way tables
  Analysis of dissimilarity/similarity between tables
  Poisson trick

Matched two-way tables

The m = #S tables of counts classified by factor A (row) and factor B (column), $y_s^{SAB}$, their margins $y_s^{SA}$ and $y_s^{SB}$, and total count $y_s^{S}$:

\[
\begin{pmatrix} y_s^{SAB} & y_s^{SA} \\ (y_s^{SB})^\top & y_s^{S} \end{pmatrix}, \qquad s = 1, \ldots, \#S
\]

The marginal two-way table (and its margins):

\[
\begin{pmatrix} y^{AB} & y^{A} \\ (y^{B})^\top & y \end{pmatrix}
\]

Objectives

Similarity/dissimilarity between
  tables
  row profiles or column profiles

May involve some preprocessing of tables
  by unifying margins by biproportional fitting (RAS, Iterative Proportional Fitting, matrix raking)
  row profiles (column profiles)
  by weighting tables, profiles into tables, common metric
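The margin-unifying step can be sketched with a few lines of NumPy. This is a minimal illustration of iterative proportional fitting, not the authors' code: the function name `ipf`, the choice of the average table's margins as the common target, and the 3 x 3 subset (ages 15-20, 20-25, 25-30; methods c1, c3, c4 of the suicide tables) are all assumptions made for the example.

```python
import numpy as np

def ipf(table, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Rescale rows and columns of a positive table until its margins match the targets."""
    fitted = np.asarray(table, dtype=float).copy()
    for _ in range(max_iter):
        fitted *= (row_targets / fitted.sum(axis=1))[:, None]   # match row margin
        fitted *= (col_targets / fitted.sum(axis=0))[None, :]   # match column margin
        if np.allclose(fitted.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return fitted

# Illustrative subset of the male and female tables (assumption: any positive subset would do).
male = np.array([[348., 67., 578.], [808., 229., 699.], [789., 243., 648.]])
female = np.array([[353., 11., 81.], [540., 20., 111.], [454., 27., 125.]])

# Give both tables the margins of the average table.
average = 0.5 * (male + female)
male_adj = ipf(male, average.sum(axis=1), average.sum(axis=0))
female_adj = ipf(female, average.sum(axis=1), average.sum(axis=0))
```

Biproportional fitting only multiplies rows and columns by constants, so the odds ratios of each table are left unchanged while the margins are equalized.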

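Poisson trick

The $Y_s^{SAB}$ are independent Poisson counts:

\[
E[Y_s^{SAB}] = \mathrm{var}(Y_s^{SAB}), \qquad
E[Y_s^{SAB}] = m\big(\beta^{AB} + \mathrm{restricted}(\beta_s^{SAB})\big)
\]

Conditionally on the totals of matched cells, $Y_s^{SAB} \mid \sum_{s=1}^{\#S} Y_s^{SAB} = y^{AB}$ is multinomial with
  known parameter: $y^{AB}$
  probabilities:

\[
\frac{m\big(\beta^{AB} + \mathrm{restricted}(\beta_s^{SAB})\big)}{\sum_{k=1}^{\#S} m\big(\beta^{AB} + \mathrm{restricted}(\beta_k^{SAB})\big)}
= \frac{m\big(\beta^{AB} + \mathrm{restricted}(\beta_s^{SAB})\big)}{y^{AB}}
\]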
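The conditioning argument can be checked exactly in the simplest case of two independent Poisson counts. The sketch below uses only the standard library; the means `lam1`, `lam2` and the total `n` are arbitrary illustrative values, and the helper names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability mass function."""
    return exp(-lam) * lam**k / factorial(k)

def conditional_pmf(k, n, lam1, lam2):
    """P(Y2 = k | Y1 + Y2 = n) for independent Poisson Y1 and Y2."""
    joint = poisson_pmf(n - k, lam1) * poisson_pmf(k, lam2)
    total = poisson_pmf(n, lam1 + lam2)   # the sum of independent Poissons is Poisson
    return joint / total

def binomial_pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

lam1, lam2, n = 3.5, 1.5, 6            # illustrative values, not from the slides
p = lam2 / (lam1 + lam2)               # probability as a ratio of Poisson means
max_gap = max(
    abs(conditional_pmf(k, n, lam1, lam2) - binomial_pmf(k, n, p))
    for k in range(n + 1)
)
```

The gap is zero up to rounding: once the total is fixed, only the ratio of the Poisson means matters.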

Poisson trick: Poisson trick for two matched tables

Particular case: two matched tables (#S = 2)

Independent Poisson counts with means $E[Y_{sab}^{SAB}]$ ($s = 1, 2$)
  exponential mean function (log link function): $m = \exp$, $m^{-1} = \log$
  model: all two-way interactions of A, B and S

\[
E[Y_{sab}^{SAB}] = \exp\big(\beta_{ab}^{AB} + \beta_{sa}^{SA} + \beta_{sb}^{SB}\big)
\]

$Y_2^{SAB}$ given $y^{AB}$ is binomial $B(y^{AB}, \pi_2^{AB})$
  model: additivity of effects of A and B (with the parameters of table 1 set to zero for identification)

\[
\mathrm{logit}(\pi_{2,ab}^{AB}) = \beta_{2a}^{SA} + \beta_{2b}^{SB}
\]

This also works with the inclusion of a reduced rank interaction in the predictor.
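A small numerical sketch of this equivalence, under stated assumptions: the log-linear model with all two-way interactions is fitted here by iterative proportional fitting (a standard way to obtain its fitted values, swapped in for whatever software the authors used), on a 3 x 3 subset of the male and female tables (ages 15-20 to 25-30, methods c1, c3, c4). The function name `fit_no_three_way` is hypothetical.

```python
import numpy as np

def fit_no_three_way(y, n_iter=200):
    """IPF fit of the log-linear model with all two-way interactions to a 2 x A x B array."""
    mu = np.ones_like(y, dtype=float)
    for _ in range(n_iter):
        mu *= (y.sum(axis=2) / mu.sum(axis=2))[:, :, None]   # match the SA margin
        mu *= (y.sum(axis=1) / mu.sum(axis=1))[:, None, :]   # match the SB margin
        mu *= (y.sum(axis=0) / mu.sum(axis=0))[None, :, :]   # match the AB margin
    return mu

# Illustrative subset: index 0 is table 1 (male), index 1 is table 2 (female).
male = np.array([[348., 67., 578.], [808., 229., 699.], [789., 243., 648.]])
female = np.array([[353., 11., 81.], [540., 20., 111.], [454., 27., 125.]])
y = np.array([male, female])

mu = fit_no_three_way(y)

# Logit of the fitted probability of table 2 within each matched cell.
logit = np.log(mu[1] / mu[0])

# Double-centering removes additive row and column effects; what is left is the
# A-by-B interaction on the logit scale, which should vanish for this model.
interaction = (logit
               - logit.mean(axis=1, keepdims=True)
               - logit.mean(axis=0, keepdims=True)
               + logit.mean())
```

The fitted AB margin reproduces the matched-cell totals, and the fitted logits are additive in A and B, which is the binomial model stated on the slide.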

Data: Male (rows: age group; columns: method)

Age      c1    c2    c3    c4    c5    c6    c7    c8    c9
10-15     4     0     0   247     1    17     1     6     9
15-20   348     7    67   578    22   179    11    74   175
20-25   808    32   229   699    44   316    35   109   289
25-30   789    26   243   648    52   268    38   109   226
30-35   916    17   257   825    74   291    52   123   281
35-40  1118    27   313  1278    87   293    49   134   268
40-45   926    13   250  1273    89   299    53    78   198
45-50   855     9   203  1381    71   347    68   103   190
50-55   684    14   136  1282    87   229    62    63   146
55-60   502     6    77   972    49   151    46    66    77
60-65   516     5    74  1249    83   162    52    92   122
65-70   513     8    31  1360    75   164    56   115    95
70-75   425     5    21  1268    90   121    44   119    82
75-80   266     4     9   866    63    78    30    79    34
80-85   159     2     2   479    39    18    18    46    19
85-90    70     1     0   259    16    10     9    18    10
90+      18     0     1    76     4     2     4     6     2

Data: Female (rows: age group; columns: method)

Age      c1    c2    c3    c4    c5    c6    c7    c8    c9
10-15    28     0     3    20     0     1     0    10     6
15-20   353     2    11    81     6    15     2    43    47
20-25   540     4    20   111    24     9     9    78    67
25-30   454     6    27   125    33    26     7    86    75
30-35   530     2    29   178    42    14    20    92    78
35-40   688     5    44   272    64    24    14    98   110
40-45   566     4    24   343    76    18    22   103    86
45-50   716     6    24   447    94    13    21    95    88
50-55   942     7    26   691   184    21    37   129   131
55-60   723     3    14   527   163    14    30    92    92
60-65   820     8     8   702   245    11    35   140   114
65-70   740     8     4   785   271     4    38   156    90
70-75   624     6     4   610   244     1    27   129    46
75-80   495     8     1   420   161     1    29   129    35
80-85   292     3     2   223    78     0    10    84    23
85-90   113     4     0    83    14     0     6    34     2
90+      24     1     0    19     4     0     2     7     0

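CA: Two approaches in CA

Peter's trick: ordinary CA of either table $[\,M \;\; F\,]$ or $\begin{bmatrix} M \\ F \end{bmatrix}$

and/or

Michael's trick: ordinary CA of table $\begin{bmatrix} M & F \\ F & M \end{bmatrix}$, equivalent to
  ordinary CA of the average table $\tfrac{1}{2} M + \tfrac{1}{2} F$
  adapted CA of table M (resp. table F) with respect to $\tfrac{1}{2} M + \tfrac{1}{2} F$.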
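The equivalence stated for Michael's trick can be checked numerically. The sketch below assumes the usual CA formulation through the SVD of standardized residuals; the helper `std_residuals` is a hypothetical name, and M and F are a 3 x 3 subset of the male and female tables (ages 15-20 to 25-30, methods c1, c3, c4) chosen only to keep the example small.

```python
import numpy as np

def std_residuals(table):
    """Standardized residuals (P - r c') / sqrt(r c') whose SVD defines ordinary CA."""
    P = table / table.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    expected = np.outer(r, c)
    return (P - expected) / np.sqrt(expected)

M = np.array([[348., 67., 578.], [808., 229., 699.], [789., 243., 648.]])
F = np.array([[353., 11., 81.], [540., 20., 111.], [454., 27., 125.]])

# Ordinary CA of the block table [[M, F], [F, M]].
block = np.block([[M, F], [F, M]])
sv_block = np.linalg.svd(std_residuals(block), compute_uv=False)

# Ordinary CA of the sum table (same CA as the average table: CA ignores the overall scale).
total = M + F
n = total.sum()
r = total.sum(axis=1) / n
c = total.sum(axis=0) / n
sv_sum = np.linalg.svd(std_residuals(total), compute_uv=False)

# "Adapted" analysis of the difference, in the metrics of the sum table.
sv_diff = np.linalg.svd(((M - F) / n) / np.sqrt(np.outer(r, c)), compute_uv=False)

# The block analysis should split into these two parts.
sv_parts = np.sort(np.concatenate([sv_sum, sv_diff]))[::-1]
```

The singular values of the block table are the union of those of the sum analysis (similarity) and of the difference analysis (dissimilarity).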

CA: Two approaches in CA (continued)

Implicit in the first stream of approaches are
  the choice of a log-linear model between C + S×R and R + S×C, where R, C, and S denote the row, column, and matching factors
  ordinary CA of the table formed accordingly

Implicit in the second stream of approaches is
  the metric choice for the rows (the ages) and the columns (the causes): metrics attached to each table M, F, or (smoothed) metrics attached to the average table $\tfrac{1}{2} M + \tfrac{1}{2} F$, or ...?

Metric choice impacts plots and, to a lesser extent, patterns in graphs.

CA: Peter's plot, table [M F] [figure]

CA: Michael's trick, table [[M, F], [F, M]] [figure]

CA: Peter's trick versus Michael's trick, dissimilarity and similarity [figure]

Bird: bird and cage [figure]

Bird: trick [figure]

Bird: bird in cage [figure]

Fish: fish and bowl [figure]

Fish: fish in bowl [figure]

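restricted two-way interaction: Notation for a two-way table

Observed #A × #B two-way table $y^{AB}$ of counts cross-classified by factor A (row) and factor B (column), and margins:

\[
\begin{pmatrix} y^{AB} & y^{A} \\ (y^{B})^\top & y \end{pmatrix}
\]

Weights:

\[
w_{ab}^{AB} = y \, \frac{y_a^{A}}{y} \, \frac{y_b^{B}}{y}
\]

Profiles:

\[
\text{A-profiles: } y_b^{B \mid A=a} = \frac{1}{y_a^{A}} \, y_{ab}^{AB}, \qquad
\text{B-profiles: } y_a^{A \mid B=b} = \frac{1}{y_b^{B}} \, y_{ab}^{AB}
\]

The weights can be generalized into $\gamma \, \gamma_a^{A} \, \gamma_b^{B}$.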
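As a concrete reading of this notation, the margins, profiles and weights can be computed as follows. The table is a 3 x 3 subset of the male suicide table (ages 15-20 to 25-30, methods c1, c3, c4), chosen only for illustration; the variable names are assumptions.

```python
import numpy as np

# Illustrative 3 x 3 subset of the male table.
y_ab = np.array([[348., 67., 578.], [808., 229., 699.], [789., 243., 648.]])

y_a = y_ab.sum(axis=1)      # row margin y^A
y_b = y_ab.sum(axis=0)      # column margin y^B
y = y_ab.sum()              # total count

a_profiles = y_ab / y_a[:, None]    # each row divided by its row total
b_profiles = y_ab / y_b[None, :]    # each column divided by its column total

# Weights w_ab = y * (y_a / y) * (y_b / y): the counts expected under independence.
w_ab = np.outer(y_a, y_b) / y
```

The weights share the margins of the observed table, which is why they serve as the reference in the models that follow.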

restricted two-way interaction: Diet modeling

The $y_{ab}^{AB}$ are observed values of independent r.v. $Y_{ab}^{AB}$ with

  expected value: $E[Y_{ab}^{AB}] = \mu_{ab}^{AB} = m(\eta_{ab}^{AB})$

  bi-linear predictor:

\[
\eta_{ab}^{AB} = \text{offset} + [\beta + \beta_a^{A} + \beta_b^{B} +] \overbrace{\sum_{k=1,\ldots,r} \delta_k \, \beta_{k,a}^{A} \, \beta_{k,b}^{B}}^{\text{reduced rank interaction}}
\]

  with identification constraints for the βs

  variance: $\mathrm{Var}(Y_{ab}^{AB}) = V(\mu_{ab}^{AB})$

⇒ allows most models with rank-restricted interaction to be replicated.
⇒ has consequences for the distribution of the profiles.

restricted two-way interaction: Implementations

Current implementations are Benzécri's CA and Goodman's RC. But non-canonical crossovers are possible.

CA: $\mu^{AB} = w^{AB}(1 + \eta^{AB})$ and $V(\mu^{AB}) = w^{AB}$
    a taste of heteroscedastic Normal distribution with a zest of Poisson
RC: $\mu^{AB} = \exp(\eta^{AB})$ and $V(\mu^{AB}) = \mu^{AB}$
    a definite taste of Poisson distribution
GB: $\mu^{AB} = \exp(\eta^{AB})$ and $V(\mu^{AB}) = w^{AB}$
BG: $\mu^{AB} = \max\{\epsilon, w^{AB}(1 + \eta^{AB})\}$ and $V(\mu^{AB}) = \mu^{AB}$
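The CA entry can be illustrated with the standard CA reconstitution formula, in which the rank-restricted predictor is obtained from an SVD. This is a sketch under that assumption, not the authors' implementation; `ca_fit` is a hypothetical name and the table is a 3 x 3 subset of the male data (ages 15-20 to 25-30, methods c1, c3, c4).

```python
import numpy as np

def ca_fit(y_ab, rank):
    """Fitted values mu = w * (1 + eta), with eta the rank-restricted CA interaction."""
    y = y_ab.sum()
    r = y_ab.sum(axis=1) / y                      # row masses
    c = y_ab.sum(axis=0) / y                      # column masses
    w_ab = y * np.outer(r, c)                     # weights (independence fit)
    s = (y_ab / y - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, delta, vt = np.linalg.svd(s, full_matrices=False)
    xi_a = u / np.sqrt(r)[:, None]                # standard row coordinates
    xi_b = vt.T / np.sqrt(c)[:, None]             # standard column coordinates
    eta = (xi_a[:, :rank] * delta[:rank]) @ xi_b[:, :rank].T
    return w_ab * (1.0 + eta)

y_ab = np.array([[348., 67., 578.], [808., 229., 699.], [789., 243., 648.]])

mu_rank1 = ca_fit(y_ab, rank=1)   # one bilinear term
mu_full = ca_fit(y_ab, rank=3)    # all terms: reproduces the data
```

With all terms the fit reproduces the observed table; with fewer terms it still keeps the observed margins, because the standard coordinates are centered with respect to the masses.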

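restricted two-way interaction: Diet Poisson-Multinomial

The $Y_i$ ($i \in \{1, \ldots, n\}$) are independent r.v. with $E[Y_i] = \mu_i$ and $\mathrm{Var}(Y_i) = \sigma_i^2$.

\[
E\Big[\tfrac{1}{y}[Y_1, \ldots, Y_n] \,\Big|\, Y_1 + \cdots + Y_n = y\Big]
= \tfrac{1}{y}[\mu_1, \ldots, \mu_n]
+ \frac{1}{y \sum_i \sigma_i^2} \, \overbrace{\Big(y - \sum_i \mu_i\Big)}^{0} \, [\sigma_1^2, \ldots, \sigma_n^2]
\]

\[
\mathrm{Var}\Big(\tfrac{1}{y}[Y_1, \ldots, Y_n] \,\Big|\, Y_1 + \cdots + Y_n = y\Big)
= \frac{1}{y^2} \, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)
- \frac{1}{y^2 \sum_i \sigma_i^2} \, [\sigma_1^2, \ldots, \sigma_n^2]^\top [\sigma_1^2, \ldots, \sigma_n^2]
\]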
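In the Poisson case (variance equal to mean) with the total equal to the sum of the means, these two formulas should give back the mean and covariance of multinomial proportions. The following check uses arbitrary illustrative means; nothing in it comes from the slides beyond the two formulas.

```python
import numpy as np

mu = np.array([2.0, 5.0, 3.0])   # illustrative means (assumption)
sigma2 = mu.copy()               # Poisson case: variance equals mean
y = mu.sum()                     # total equal to the sum of the means, so y - sum(mu) = 0

# Conditional mean and covariance of the proportions, as on the slide.
cond_mean = mu / y + (y - mu.sum()) / (y * sigma2.sum()) * sigma2
cond_cov = (np.diag(sigma2) - np.outer(sigma2, sigma2) / sigma2.sum()) / y**2

# Multinomial(y, p) proportions: mean p and covariance (diag(p) - p p') / y.
p = mu / mu.sum()
multinomial_mean = p
multinomial_cov = (np.diag(p) - np.outer(p, p)) / y
```

The agreement shows why only first and second moments are needed for this "diet" version of the trick.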

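Poisson-Multinomial trick for two matched tables

Poisson counts for the three-way table $y^{SAB} = (y_1^{SAB}, y_2^{SAB})$:

\[
\log(\lambda_{2ab}^{SAB}) = \beta + \beta_2^{S} + \beta_a^{A} + \beta_b^{B} + \beta_{2a}^{SA} + \beta_{2b}^{SB} + \beta_{ab}^{AB} + \sum_k \delta_k \, \xi_{ak}^{A} \xi_{bk}^{B}
\]

\[
\log(\lambda_{1ab}^{SAB}) = \beta + \beta_a^{A} + \beta_b^{B} + \beta_{ab}^{AB} + \sum_k (-\delta_k) \, \xi_{ak}^{A} \xi_{bk}^{B}
\]

Binomial model for the two-way table $y_2^{SAB}$ given the table $y^{AB}$ (sum of counts of matched cells):

\[
\mathrm{logit}(\pi_{ab}^{AB}) = \log\!\left(\frac{\lambda_{2ab}^{SAB}}{\lambda_{1ab}^{SAB}}\right)
= \beta_2^{S} + \beta_{2a}^{SA} + \beta_{2b}^{SB} + 2 \sum_k \delta_k \, \xi_{ak}^{A} \xi_{bk}^{B}
\]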
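The cancellation behind the binomial predictor can be verified numerically. All parameter values below are random or arbitrary placeholders introduced for the check (they are not estimates from the suicide data), and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, rank = 4, 3, 2

# Arbitrary parameter values, for illustration only.
beta, beta_s2 = 0.3, -0.2
beta_a, beta_b = rng.normal(size=n_a), rng.normal(size=n_b)
beta_sa2, beta_sb2 = rng.normal(size=n_a), rng.normal(size=n_b)
beta_ab = rng.normal(size=(n_a, n_b))
delta = np.array([0.5, 0.1])
xi_a, xi_b = rng.normal(size=(n_a, rank)), rng.normal(size=(n_b, rank))

bilinear = (xi_a * delta) @ xi_b.T                       # sum_k delta_k xi_ak xi_bk
common = beta + beta_a[:, None] + beta_b[None, :] + beta_ab

# Poisson log-means for table 2 and table 1.
log_lam2 = common + beta_s2 + beta_sa2[:, None] + beta_sb2[None, :] + bilinear
log_lam1 = common - bilinear

# Binomial probability of table 2 within each matched cell, and its logit.
pi2 = np.exp(log_lam2) / (np.exp(log_lam1) + np.exp(log_lam2))
logit = np.log(pi2 / (1.0 - pi2))

# Predictor claimed for the binomial model.
predictor = beta_s2 + beta_sa2[:, None] + beta_sb2[None, :] + 2.0 * bilinear
```

Every term shared by the two tables, including the full AB interaction, drops out of the logit, while the opposite-signed bilinear terms add up to twice the reduced rank interaction.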

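Poisson-Multinomial trick for two matched tables: What if CA is used?

Three-way table $y^{SAB} = (y_1^{SAB}, y_2^{SAB})$ and associated weights

\[
w_{ab}^{AB} = y \, \frac{y_a^{A}}{y} \, \frac{y_b^{B}}{y}
\]

CA of table $y_2^{SAB}$ with respect to table $\tfrac{1}{2} y^{AB}$:

\[
\frac{1}{w_{ab}^{AB}} \, E[Y_{2ab}^{SAB}]
= \overbrace{\frac{1}{w_{ab}^{AB}} \Big(\tfrac{1}{2} \, y_{ab}^{AB}\Big)}^{\text{offset}}
+ \sum_k \delta_k \, \xi_{ak}^{A} \xi_{bk}^{B}
\]

Interpretation for the reduced rank interaction:

\[
4 \, \frac{w_{ab}^{AB}}{y_{ab}^{AB}} \sum_k \delta_k \, \xi_{ak}^{A} \xi_{bk}^{B} \approx \mathrm{logit}(\pi_{2ab}^{SAB})
\]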

Poisson-Multinomial trick for two matched tables: Log-odds [figure]

References

Peter van der Heijden and Jan de Leeuw (1985): Correspondence analysis used complementary to loglinear analysis, Psychometrika, 50(4), 429-447.

Michael Greenacre (2003): Singular value decomposition of matched matrices, Journal of Applied Statistics, 30, 1101-1113.

Simplice Dossou-Gbété (2002): Reduced rank quasi-symmetry and biplots for matched two-way tables, Annales de la Faculté des Sciences, vol. XI (4), 469-483.

Thank you for your attention