Methods of Psychological Research Online 1998, Vol.3, No.2 Internet: © 1998 Pabst Science Publishers Modeling Tur


Methods of Psychological Research Online 1998, Vol.3, No.2
Internet:

Modeling Turn-Over Tables Using the GSK Approach*

Christof Schuster, University of Michigan
Alexander von Eye, Michigan State University

Abstract

The GSK approach is reviewed and it is shown how a large variety of models for turn-over tables can be fitted by specifying functions of the cell probabilities that are zero if the model under analysis fits the data. It is demonstrated that the GSK approach easily handles functions that contain sums of cell probabilities. Such functions usually cannot be dealt with easily in a log-linear modeling framework.

1 Introduction

When a categorical variable is repeatedly observed, the data are typically arranged in what is known as a turn-over table (Hagenaars, 1990 [14]). Consider the simple case where a categorical variable is observed twice. Arranging the categorical outcomes observed at the first occasion as the rows and the categorical outcomes observed at the second occasion as the columns of a square contingency table, we obtain a two-dimensional turn-over table. For a variable with three different outcomes, the turn-over table for the data has the following form:

$$
\begin{array}{l|ccc}
 & T_2(1) & T_2(2) & T_2(3) \\ \hline
T_1(1) & n_{11} & n_{12} & n_{13} \\
T_1(2) & n_{21} & n_{22} & n_{23} \\
T_1(3) & n_{31} & n_{32} & n_{33}
\end{array}
$$

$T_k(l)$ denotes the $l$th category observed at the $k$th occasion, and the $n_{ij}$ in the cells of the table denote the frequencies of observations that fall in category $i$ at the first and in category $j$ at the second occasion. The marginal frequencies of the table are denoted as $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$. The notation for three- and higher-dimensional turn-over tables is a simple generalization of the notation for two-dimensional tables.

Turn-over tables have received much attention in the literature on log-linear models (see, for example, Agresti, 1990; Bishop, Fienberg, & Holland, 1975; Clogg, Eliason, & Grego, 1990; Hagenaars, 1990, and the references therein).
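To make the layout of a turn-over table concrete, the following minimal Python sketch cross-classifies paired observations and computes the marginal frequencies $n_{i+}$ and $n_{+j}$. The function name `turnover_table` and the toy observations are illustrative assumptions and do not come from the article.

```python
def turnover_table(first, second, categories):
    """Cross-classify paired observations: rows = occasion 1, columns = occasion 2."""
    index = {c: k for k, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for a, b in zip(first, second):
        table[index[a]][index[b]] += 1
    return table

# Hypothetical data: one variable with three categories, observed twice.
t1 = ["a", "a", "b", "c", "b", "c", "a"]
t2 = ["a", "b", "b", "c", "c", "c", "a"]
n = turnover_table(t1, t2, ["a", "b", "c"])

row_margins = [sum(row) for row in n]        # n_{i+}
col_margins = [sum(col) for col in zip(*n)]  # n_{+j}
print(n, row_margins, col_margins)
```

The row-major order of the cells used here is the same order in which the probability vector is stacked later in the paper.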
There is a considerable number of standard models of varying complexity available that can be applied to turn-over tables. These are well known under the names of quasi-independence models, quasi-symmetry models, symmetry models, marginal homogeneity models, and so forth. All these models can be applied to two-dimensional turn-over tables but can also be easily extended to examine three- and higher-dimensional contingency tables.

* Address for correspondence: Christof Schuster, Institute for Social Research, Ann Arbor, MI. Alexander von Eye's work on this article was supported in part by NIAAA grant 2R01 AA7065.

Most often these models are discussed in a log-linear model framework¹ when practical aspects such as fitting models, evaluating their goodness of fit, and so forth, become relevant. This makes good sense because numerical fitting routines for log-linear models are widely distributed through their implementation in virtually all standard statistical computer packages. Most of these routines rely on one of two computer algorithms. These are the iterative proportional fitting algorithm (IPF) (Deming & Stephan, 1940a [6], 1940b [7]) and the Newton-Raphson algorithm. Both yield maximum-likelihood estimators of model parameters and hence the resulting estimators have very desirable characteristics. In addition, both algorithms are numerically inexpensive and easy to apply.

However, these algorithms are most useful when analyzing data using hierarchical log-linear models. Substantive questions addressed by these models are typically whether independence, conditional independence, or some kind of homogeneous association holds among several variables. Questions of these types are usually not of interest when analyzing turn-over tables. Instead, the aim is to analyze the association among the repeated observations of a variable. When models for turn-over tables are fitted, both algorithms can usually still be applied, but the ease with which hierarchical log-linear models are fitted is gone. Usually large design matrices have to be set up by hand to employ the Newton-Raphson algorithm. How this is done can be found in von Eye and Spiel (1996) [26].
The IPF algorithm often requires that turn-over tables are amended. For instance, Agresti (1990, p. 382) [1] explains how to fit the quasi-symmetry model using the IPF algorithm: first, transform the $I \times I$ table to an $I \times I \times 2$ table and then draw conclusions concerning the $I \times I$ table by fitting log-linear models to the amended $I \times I \times 2$ table.

Yet, there is another drawback when analyzing turn-over tables using log-linear models. Testing for marginal homogeneity can only be done by comparing the fit of the symmetry and the quasi-symmetry models (see, for instance, Agresti, 1990, p. 359 [1]). However, this assumes that the quasi-symmetry model is 'true' and therefore this test is only of limited use. With the GSK approach a test that is not conditional in this sense can easily be performed, as will be demonstrated later in this article.

It is well known that besides the log-linear model framework there is another framework for analyzing categorical data that relies on a different estimation rationale.² This is the weighted least-squares approach (WLS) suggested by Grizzle, Starmer, and Koch (1969) [13]. To honor the authors of this classic article the approach is generally known in the literature as the GSK approach. This approach is even more flexible for setting up and fitting models for categorical data because it is not limited to modeling the logarithm of cell probabilities or, equivalently, expected cell frequencies, but allows modeling of very general functions of the cell probabilities. Hence, models that can be fitted by the IPF or Newton-Raphson algorithms can also be fitted by the WLS algorithm of the GSK approach, although estimates will differ more or less, depending mainly on sample size.
However, the preferred estimation method in log-linear models has been maximum likelihood because (1) its estimators have generally desirable characteristics, and (2) the large generality of the GSK approach is not needed if only hierarchical log-linear models are considered. In addition, the GSK approach is most naturally used when categorical variables can be subdivided into dependent and independent variables, that is, within a regression framework.

¹ Note that we would like to restrict the meaning of the term log-linear model for the present paper to models for manifest variables, like those models contained in standard textbooks on log-linear models. The range of models that can be considered log-linear is considerably enlarged when mixtures of latent and manifest variables are included in models.

² For yet another very general approach to analyzing categorical data, see Rindskopf (1992) [23].

In this paper we show how the GSK approach to categorical data can be applied in a straightforward way to analyzing data given in turn-over tables. Although the variables in a turn-over table have equal status, that is, are not divided into dependent and independent variables, the GSK approach can be applied in a natural way because it allows specifying null hypotheses that involve several almost arbitrary functions of the cell probabilities. Hence, by showing how models for turn-over tables translate into this kind of hypothesis we can show how the test statistics for these models can be calculated and evaluated. Since the GSK approach is very flexible in the type of functions of the cell probabilities that can be handled, sums of cell probabilities pose no special problems.

In Section 2 we give a brief outline of the GSK approach. In Section 3 we show how several of the models mentioned above, including the model of marginal homogeneity, can be fitted with the GSK approach, giving results similar to those obtained by maximum-likelihood estimation. Section 4 discusses and summarizes the results.

2 The GSK Approach

In this section, we give an outline of the main ideas of the GSK approach without proofs of main results. More detailed discussion of the approach can be found in the relevant literature. Easy presentations are given in Wickens (1989) [27], Christensen (1997) [4], and Agresti (1990) [1], while more advanced presentations are contained in Grizzle et al. (1969) [13], Koch, Landis, Freeman, Freeman, and Lehnen (1977) [16], Koch, Amara, Davis, and Gillings [15], Landis and Koch (1979) [17], and Landis and Miller (1988) [18].
The GSK approach is fully implemented in SAS PROC CATMOD. Stokes, Davis, and Koch (1995) [25] give numerous applications of the GSK approach with both cross-sectional and longitudinal data.

Since we do not use the GSK approach in a regression framework where covariates or, equivalently, several populations are involved, we describe the approach considering only a single population. However, knowing how the approach works for a single population, one is able to extend the approach to multiple populations. Nevertheless, we do not dwell on using the GSK approach as a tool for fitting regression models using categorical data.

Consider the simple case of a two-dimensional contingency table where the first variable has $J$ and the second variable has $K$ categories. Let $\pi' = (\pi_{11}, \pi_{12}, \ldots, \pi_{JK})$. Note that $\sum_{jk} \pi_{jk} = 1$ so that there are only $JK - 1$ different $\pi$'s free to vary. The large generality of the GSK approach stems from the fact that it allows testing research hypotheses that can be expressed as vector-valued functions $F(\pi)$. The test procedure based on weighted least squares allows evaluating whether $F(\pi) = 0$ holds. Written explicitly, $F$ is given as

$$F(\pi) = [F_1(\pi), F_2(\pi), \ldots, F_m(\pi)]',$$

where $m \le JK - 1$.

Before further pursuing technical details, it should be noted that hypotheses for contingency tables are often expressed as functions of $\pi$. Consider the case of a $2 \times 2$ table. Testing for independence of the two categorical variables is equivalent to testing whether the odds ratio $\theta$ is equal to one, that is,

$$H_0: \theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 1.$$

Of course, this is easily tested with the GSK approach since for $m = 1$, that is, when $F$ is real valued, we specify the null hypothesis as

$$H_0: F(\pi) = F_1(\pi) = \log \pi_{11} - \log \pi_{12} - \log \pi_{21} + \log \pi_{22} = 0.$$

Rather than specifying a log-linear model for the cell counts assuming independence between the two variables, we focus on the functions of $\pi$ that are assumed to be zero in the population if independence holds.

Another simple example of recasting hypotheses in terms of $F$ is McNemar's test for determining whether there is systematic change in a dichotomous variable observed at two different occasions. The hypothesis tested by McNemar's test is

$$H_0: \pi_{12} = \pi_{21}.$$

Again, this can easily be rearranged to take the form $F(\pi) = 0$ by expressing $H_0$ as

$$H_0: F(\pi) = F_1(\pi) = \pi_{12} - \pi_{21} = 0.$$

McNemar's test can be equivalently considered a test for symmetry or marginal homogeneity in a $2 \times 2$ table.

Now, we consider testing for symmetry in a $3 \times 3$ table. Here the null hypothesis is given as $H_0: \pi_{jk} = \pi_{kj}$ for all $j < k$. For a $3 \times 3$ table this means that $\pi_{12} = \pi_{21}$, $\pi_{13} = \pi_{31}$, and $\pi_{23} = \pi_{32}$ are simultaneously true if symmetry holds. Expressing this null hypothesis as $F(\pi) = 0$ yields

$$H_0: F(\pi) = \begin{pmatrix} F_1(\pi) \\ F_2(\pi) \\ F_3(\pi) \end{pmatrix} = \begin{pmatrix} \pi_{12} - \pi_{21} \\ \pi_{13} - \pi_{31} \\ \pi_{23} - \pi_{32} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Although the function $F(\pi)$ can in principle be very general (see below for more details on regularity assumptions on $F$), it turns out that the substantively interesting $F(\pi)$ can often be expressed as $F(\pi) = A_1 \pi$ or $F(\pi) = A_2 \log(A_1 \pi)$, where $A_1$ and $A_2$ are suitably chosen matrices. The SAS system allows almost arbitrary functions of $\pi$ that can be composed of matrix multiplications and applications of the functions exp and log. Landis and Koch (1979, p. 260) [17] give an example of a complex function of $\pi$ that can be expressed using only matrix multiplication combined with exp and log.
For suitably chosen matrices $A_1, A_2, A_3, A_4$ the authors express Yule's $Q$ as

$$Q = \exp(A_4 \log(A_3 \exp(A_2 \log(A_1 \pi)))).$$

Having outlined the basics of the GSK approach, we now present the inferential procedure. Since the $\pi$'s are unknown we estimate them using relative frequencies, that is,

$$p' = \hat{\pi}' = \frac{1}{N}(n_{11}, n_{12}, \ldots, n_{JK}),$$

where $N = n_{++} = \sum_{jk} n_{jk}$ denotes the sample size. Standard asymptotic theory shows that $p$ asymptotically follows a multivariate normal distribution with mean $\pi$ and covariance matrix $V(\pi)$.
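The compounded form $F(\pi) = A_2 \log(A_1 \pi)$ can be sketched in a few lines of Python. The helper names `matvec` and `compound` and the $2 \times 2$ counts are hypothetical; they only illustrate how the log odds ratio and the McNemar contrast from above are evaluated at the sample proportions $p$.

```python
import math

def matvec(A, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def compound(p, A1, A2=None):
    """Return A1 p, or A2 log(A1 p) when A2 is given."""
    inner = matvec(A1, p)
    if A2 is None:
        return inner
    return matvec(A2, [math.log(x) for x in inner])

# Hypothetical 2x2 counts in the order (n11, n12, n21, n22).
counts = [30, 10, 20, 40]
N = sum(counts)
p = [n / N for n in counts]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Independence: F(p) = log p11 - log p12 - log p21 + log p22 (the log odds ratio).
log_odds_ratio = compound(p, identity, [[1, -1, -1, 1]])[0]

# McNemar: F(p) = p12 - p21, a linear function A1 p.
mcnemar = compound(p, [[0, 1, -1, 0]])[0]

print(log_odds_ratio, mcnemar)
```

Longer chains such as the one for Yule's $Q$ would alternate further matrix products with `exp` and `log` in the same way.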

Under the distributional assumptions just given, a test statistic for evaluating $F(p)$ can be constructed by using the multivariate version of the so-called delta method if $F$ has partial derivatives up to the second order with respect to the $\pi_{jk}$ (see, for instance, Agresti, 1990 [1]; Bishop et al., 1975 [3]; Sen & Singer, 1993 [24]). Without further derivations we now present the test statistic for evaluating whether $F(\pi) = 0$ is likely to hold. Let $H(p)$ be defined as the matrix of partial derivatives of the $l$th response function with respect to $\pi$ evaluated at $p$, that is,

$$H(p) = \left[ \frac{\partial F_l(\pi)}{\partial \pi_{jk}} \right]_{\pi = p},$$

and let $V(p)$ be given as

$$V(p) = \frac{1}{N}\left[\mathrm{Diag}(p) - pp'\right],$$

where $\mathrm{Diag}(p)$ denotes the diagonal matrix that contains the values of $p$ in its diagonal cells while having a value of zero in all off-diagonal cells. Then we can calculate $S(p)$ as $H(p)V(p)H(p)'$, and the test statistic $X^2$ for testing whether $F(\pi) = 0$ is given as

$$X^2 = F(p)'\,S^{-1}(p)\,F(p).$$

$X^2$ is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number $m$ of real-valued functions in $F(\pi)$. The development can easily be extended to situations where there are more than two response variables.

3 Fitting Models for Turn-Over Tables using the GSK Approach

3.1 Quasi-Symmetry

The case of quasi-symmetry can be handled by noting that the corresponding hypothesis can be expressed in terms of odds ratios. We consider a square two-dimensional contingency table with $I$ rows as well as $I$ columns. For such a table Agresti (1990, p. 355 [1]) gives the following conditions for quasi-symmetry to hold:

$$\frac{\pi_{ij}\,\pi_{II}}{\pi_{iI}\,\pi_{Ij}} = \frac{\pi_{ji}\,\pi_{II}}{\pi_{jI}\,\pi_{Ii}} \quad \text{for } i, j = 1, \ldots, I.$$

Note that $\pi_{II}$ appears on both the left- and the right-hand side of the equation above. Hence, we can omit this term. We obtain

$$\frac{\pi_{ij}}{\pi_{iI}\,\pi_{Ij}} = \frac{\pi_{ji}}{\pi_{jI}\,\pi_{Ii}}.$$

We can use this expression to derive restrictions of the form $F(\pi) = 0$ if quasi-symmetry holds.
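Before the restrictions are derived, it may help to note that $H(p)$ and $S(p)$ from Section 2 take a closed form for the two kinds of response functions used in this paper. The following is a worked sketch in our own notation, not taken from the article: $C$ denotes an $m \times K$ coefficient matrix whose rows hold the coefficients of one response function each, and $\mathbf{1}$ is a vector of ones.

$$
\begin{aligned}
F(\pi) = C \log \pi: \quad & H(p) = C\,\mathrm{Diag}(p)^{-1}, &
S(p) &= \frac{1}{N}\left[ C\,\mathrm{Diag}(p)^{-1} C' - (C\mathbf{1})(C\mathbf{1})' \right], \\
F(\pi) = C \pi: \quad & H(p) = C, &
S(p) &= \frac{1}{N}\left[ C\,\mathrm{Diag}(p)\, C' - (Cp)(Cp)' \right].
\end{aligned}
$$

Both lines follow from $S(p) = H(p)V(p)H(p)'$ with $V(p) = \frac{1}{N}[\mathrm{Diag}(p) - pp']$ and $\mathrm{Diag}(p)^{-1}p = \mathbf{1}$. When every row of $C$ sums to zero, as it does for contrasts of log probabilities, the second term of the first line vanishes. As a check on the linear case, McNemar's hypothesis $F(\pi) = \pi_{12} - \pi_{21}$ gives

$$
X^2 = \frac{N\,(p_{12} - p_{21})^2}{p_{12} + p_{21} - (p_{12} - p_{21})^2}
    = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21} - (n_{12} - n_{21})^2 / N},
$$

which differs from the familiar McNemar statistic $(n_{12} - n_{21})^2 / (n_{12} + n_{21})$ only by the term that vanishes under $H_0$.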
In general, we have to derive as many restrictions on the $\pi$'s as there are degrees of freedom. The quasi-symmetry model has $(I-1)(I-2)/2$ degrees of freedom. Since the above expression has to hold for all $i$ and $j$, it contains redundancy. As can be seen from the following arguments, there are only as many independent restrictions as there are degrees of freedom for the quasi-symmetry model. Consider the case where $i = j$. It can easily be checked that in this case equality holds regardless of the values of $\pi$.

Hence, from the $I^2$ restrictions we already have identified $I$ restrictions that are superfluous. Now assume that either $i = I$ or $j = I$ but $i \neq j$. We assume without loss of generality that $j = I$. Then we have

$$\frac{\pi_{iI}}{\pi_{iI}\,\pi_{II}} = \frac{\pi_{Ii}}{\pi_{II}\,\pi_{Ii}},$$

from which we can readily see that this expression is trivially always true. This means that an additional $2 \times (I-1)$ of the $I^2$ conditions are superfluous. Finally, we note that the restrictions in terms of odds ratios for the index pairs $(i, j)$ and $(j, i)$ are equivalent. Hence, we have another $(I-1)(I-2)/2$ superfluous restrictions. Calculating the number of superfluous restrictions and subtracting this number from $I^2$ results in

$$I^2 - \left( I + 2(I-1) + \frac{(I-1)(I-2)}{2} \right) = \frac{1}{2}(I-2)(I-1),$$

which is exactly equal to the number of degrees of freedom of the quasi-symmetry model. Hence, we can deduce that just the restrictions with $i, j = 1, \ldots, (I-1)$ and $i < j$ are non-redundant. For a $3 \times 3$ table there is only one condition that needs to be tested for quasi-symmetry. For a $4 \times 4$ table there are $\frac{1}{2}(4-2)(4-1) = 3$ conditions that need to be tested. We already know which these conditions are. For quasi-symmetry we have to check for a $4 \times 4$ table whether

$$\frac{\pi_{12}}{\pi_{14}\,\pi_{42}} = \frac{\pi_{21}}{\pi_{24}\,\pi_{41}}
\quad \text{and} \quad
\frac{\pi_{13}}{\pi_{14}\,\pi_{43}} = \frac{\pi_{31}}{\pi_{34}\,\pi_{41}}
\quad \text{and} \quad
\frac{\pi_{23}}{\pi_{24}\,\pi_{43}} = \frac{\pi_{32}}{\pi_{34}\,\pi_{42}}$$

are simultaneously true. Equivalently, we can take the logarithm of both sides of each equation and move all terms to the left-hand side. This yields

$$
\begin{aligned}
\log \pi_{12} - \log \pi_{14} - \log \pi_{42} - \log \pi_{21} + \log \pi_{24} + \log \pi_{41} &= 0 \quad \text{and} \\
\log \pi_{13} - \log \pi_{14} - \log \pi_{43} - \log \pi_{31} + \log \pi_{34} + \log \pi_{41} &= 0 \quad \text{and} \\
\log \pi_{23} - \log \pi_{24} - \log \pi_{43} - \log \pi_{32} + \log \pi_{34} + \log \pi_{42} &= 0.
\end{aligned}
$$

Hence, the function $F(\pi) = 0$ has been found. We now demonstrate the fitting of the quasi-symmetry model to a $4 \times 4$ table. The data are given in Agresti (1990, p. 357 [1]) and were obtained from a sample of 55,981 residences sampled by the U.S. Bureau of the Census, see Table 1.
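The counting argument can be retraced programmatically. The sketch below is our own illustration with hypothetical function names: it builds one coefficient row per pair $i < j < I$ for the log form of the restriction, with cells stacked in row-major order, and checks that the number of rows equals $(I-1)(I-2)/2$.

```python
def cell(i, j, I):
    """Position of cell (i, j) in the row-major probability vector (1-based indices)."""
    return (i - 1) * I + (j - 1)

def quasi_symmetry_contrasts(I):
    """One coefficient row per pair i < j < I for the restriction
    log pi_ij - log pi_iI - log pi_Ij - log pi_ji + log pi_jI + log pi_Ii = 0."""
    rows = []
    for i in range(1, I):
        for j in range(i + 1, I):
            a = [0] * (I * I)
            for (r, c), sign in [((i, j), 1), ((i, I), -1), ((I, j), -1),
                                 ((j, i), -1), ((j, I), 1), ((I, i), 1)]:
                a[cell(r, c, I)] += sign
            rows.append(a)
    return rows

for I in (3, 4, 5):
    # The number of non-redundant restrictions matches the degrees of freedom.
    assert len(quasi_symmetry_contrasts(I)) == (I - 1) * (I - 2) // 2

print(quasi_symmetry_contrasts(4)[0])
```

For $I = 4$ the three rows correspond, in order, to the three log equations above.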
The four categories are the four regions Northeast, Midwest, South, and West of the USA, and interest lies in analyzing the migration that took place between 1980, when the first observation was made, and 1985, the year when the sample was revisited. Fitting the model of quasi-symmetry by maximum likelihood yields a deviance value of $G^2 = 2.99$ based on $df = 3$.

Table 1: Migration of U.S. residence between 1980 and 1985 (rows: residence in 1980; columns: residence in 1985; categories: Northeast, Midwest, South, West).

Now, let us fit the model with the GSK approach. The only problem is implementing the three response functions in a SAS job file. Here is how the model can be specified and fitted in SAS.

    proc catmod data=sasuser.agr10_2;
       weight n;
       response 0 1 0 -1 -1 0 0 1 0 0 0 0 1 -1 0 0,
                0 0 1 -1 0 0 0 0 -1 0 0 1 1 0 -1 0,
                0 0 0 0 0 0 1 -1 0 -1 0 1 0 1 -1 0 log;
       model res80*res85 = / noint;
    run;

The data are assumed to be contained in a file named agr10_2 in the sasuser library. The data file contains three variables: res80 and res85, the indicator variables for geographical region in 1980 and 1985, respectively, where the index for res85 changes fastest, and n, which contains the numbers of residences that fall in the cells of the contingency table. Most important is the response statement. First, we can see that it contains the log keyword at the end. This means that the logarithm of the vector of estimated probabilities $p$ is used for calculations. The previous three lines define three linear functions of the log cell probabilities according to the three single response functions $F_1(\pi)$, $F_2(\pi)$, and $F_3(\pi)$ given above. This can be seen by expressing $\log \pi$ as

$$\log \pi = \log(\pi_{11}, \ldots, \pi_{14}, \pi_{21}, \ldots, \pi_{24}, \pi_{31}, \ldots, \pi_{34}, \pi_{41}, \ldots, \pi_{44}) \qquad (1)$$

and calculating $A' \log \pi$, where $A$ is the $16 \times 3$ matrix given in the response statement of the SAS commands. The model statement merely tells the SAS system to test whether all three response functions are simultaneously equal to zero. These SAS commands yield the following slightly abbreviated output:

    CATMOD PROCEDURE

    Response: RES80*RES85       Response Levels (R)=     16
    Weight Variable: N          Populations     (S)=      1
    Data Set: AGR10_2           Total Frequency (N)=  55981
    Frequency Missing: 0        Observations  (Obs)=     16

    Sample    Response Functions

    ANALYSIS-OF-VARIANCE TABLE
    Source      DF   Chi-Square   Prob
    RESIDUAL     3         2.98

The test statistic yields a value of 2.98 based on 3 degrees of freedom. This is almost exactly the result obtained from the maximum-likelihood analysis given above. In both cases we come to the same substantive conclusion that the migration of US residence between 1980 and 1985 can be nicely explained by a quasi-symmetry model.

For symmetry and quasi-symmetry models, log-linear models and the procedure just outlined in this section are closely related to each other. This relation stems from the fact that testing for $F(\pi) = 0$ is done by calculating linear functions of the logarithm of the cell probabilities, that is, $F(\pi) = A' \log \pi = 0$. Therefore, the following paragraphs apply also to the diagonals-parameter symmetry model of Goodman discussed in a later section of this paper.

The connection between fitting log-linear models and the GSK approach can be seen by using the design-matrix approach to log-linear models (Evers & Namboodiri, 1979 [9]). Let $X$ be the design matrix for fitting a log-linear model. Let $\eta_{jk} = \log m_{jk}$, where the $m_{jk}$ are expected frequencies in a two-dimensional contingency table. Using matrix notation the log-linear model can be written as

$$\eta = X\lambda, \qquad (2)$$

where $X$ is called the design matrix. For explanations of how design matrices for symmetry and quasi-symmetry models are set up, see von Eye and Spiel (1996) [26]. Finding a log-linear model that fits the data well is done by adding parameters to or deleting parameters from the model equation and adding or deleting the corresponding column vectors of the design matrix $X$ or, in a more abstract sense, enlarging or diminishing the column space of $X$.
If $X$ has $q$ rows and $p$ columns and $q \ge p$, model selection of log-linear models can be seen as the problem of selecting a $p$-dimensional subspace in $q$-dimensional space. Positing that $C(X)$ yields a good model fit is equivalent to positing that the orthogonal complement of $C(X)$, that is, $C(X)^{\perp}$, adds nothing to the model. Since the GSK approach tests for $F(\pi) = 0$, this is equivalent to examining whether $C(X)^{\perp}$ can be ignored. Hence, while log-linear modeling focuses on specifying $C(X)$, the GSK approach focuses on finding $C(X)^{\perp}$. More specifically, let $X$ be the design matrix for a quasi-symmetry model and let $A$ be a matrix such that $F(\pi) = A' \log \pi = 0$; then $C(A) = C(X)^{\perp} = C(I - X(X'X)^{-1}X')$. In the example for the quasi-symmetry model from above, $A$ was given by the three response functions that were assumed to be simultaneously equal to zero if the quasi-symmetry model holds, see the matrix given in the response statement of the last SAS command file.
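The quasi-symmetry test of this section can also be retraced outside SAS. The following Python sketch is our own illustration, not the CATMOD implementation: it evaluates $X^2 = F(p)'S^{-1}(p)F(p)$ for log contrasts, using $H(p) = C\,\mathrm{Diag}(p)^{-1}$ with the coefficient rows stored in `C`. The function names are hypothetical, and the cell counts are assumed to be those of the migration table in Agresti (1990); they sum to the 55,981 residences mentioned above but should be checked against the source.

```python
import math

def solve(S, b):
    """Solve S x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(S)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gsk_log_contrasts(counts, C):
    """Wald statistic X^2 for H0: C log(pi) = 0; all cells used must be nonzero."""
    N = sum(counts)
    p = [n / N for n in counts]
    F = [sum(c * math.log(q) for c, q in zip(row, p) if c) for row in C]
    # S = H V H' with H = C Diag(p)^-1 and V = (Diag(p) - p p') / N
    S = [[(sum(a * b / q for a, b, q in zip(r1, r2, p) if a and b)
           - sum(r1) * sum(r2)) / N for r2 in C] for r1 in C]
    return sum(f * x for f, x in zip(F, solve(S, F)))

def contrast(plus, minus, I=4):
    """Coefficient row with +1 / -1 at the listed (row, column) cells, row-major order."""
    a = [0] * (I * I)
    for i, j in plus:
        a[(i - 1) * I + j - 1] += 1
    for i, j in minus:
        a[(i - 1) * I + j - 1] -= 1
    return a

# The three quasi-symmetry response functions F1, F2, F3 for a 4x4 table.
QS = [contrast([(1, 2), (2, 4), (4, 1)], [(1, 4), (4, 2), (2, 1)]),
      contrast([(1, 3), (3, 4), (4, 1)], [(1, 4), (4, 3), (3, 1)]),
      contrast([(2, 3), (3, 4), (4, 2)], [(2, 4), (4, 3), (3, 2)])]

# Assumed counts (rows: 1980 region, columns: 1985 region; NE, MW, S, W).
migration = [11607, 100, 366, 124,
             87, 13677, 515, 302,
             172, 225, 17819, 270,
             63, 176, 286, 10192]

x2 = gsk_log_contrasts(migration, QS)
print(round(x2, 2), "on", len(QS), "degrees of freedom")
```

If the assumed counts are correct, the printed statistic should agree with the residual chi-square of 2.98 reported in the CATMOD output above.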

3.2 Symmetry

Having already mentioned in Section 2 how the function $F(\pi)$ for a symmetry model is set up for a $3 \times 3$ table, we do not further comment on symmetry models for higher-dimensional contingency tables. The setup of $F(\pi)$ for these models can easily be generalized from two-dimensional tables, and the fitting of the models in SAS is very similar to fitting quasi-symmetry models. Mainly the response statement has to be altered. However, from examining $F(\pi)$ it should be clear how this can be done.

3.3 Marginal Homogeneity

Testing marginal homogeneity in log-linear models is difficult because marginal probabilities are given as sums of cell probabilities. As has already been mentioned in the Introduction, marginal homogeneity can be tested using log-linear models if it can be assumed that the quasi-symmetry model holds in the population. Therefore, this test is often called a 'conditional' test. If quasi-symmetry does not hold in the population, then marginal homogeneity can still be tested using the GSK approach. Other unconditional tests for marginal homogeneity are given in Agresti (1990, p. 359) [1].

The hypothesis of marginal homogeneity can be expressed in terms of the cell probabilities. If marginal homogeneity holds, then

$$H_0: \pi_{i+} = \pi_{+i} \quad \text{for } i = 1, \ldots, (I-1)$$

is fulfilled.³ Note that the model of marginal homogeneity has $(I-1)$ degrees of freedom.
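Because the restrictions $\pi_{i+} - \pi_{+i} = 0$ are linear in the cell probabilities, the test statistic of Section 2 can be sketched with $H(p) = C$, where the rows of `C` hold the coefficients of the restrictions. The Python code below is our own illustration with hypothetical function names; the counts are assumed to be those of the migration table in Agresti (1990) and should be checked against the source.

```python
def solve(S, b):
    """Solve S x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(S)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gsk_linear(counts, C):
    """Wald statistic X^2 for H0: C pi = 0."""
    N = sum(counts)
    p = [n / N for n in counts]
    F = [sum(c * q for c, q in zip(row, p)) for row in C]
    # S = C V C' with V = (Diag(p) - p p') / N
    S = [[(sum(a * b * q for a, b, q in zip(r1, r2, p)) - F[l] * F[m]) / N
          for m, r2 in enumerate(C)] for l, r1 in enumerate(C)]
    return sum(f * x for f, x in zip(F, solve(S, F)))

def marginal_homogeneity_contrasts(I):
    """Rows for pi_{i+} - pi_{+i} = 0, i = 1..I-1, cells in row-major order."""
    rows = []
    for i in range(I - 1):
        a = [0] * (I * I)
        for j in range(I):
            a[i * I + j] += 1  # pi_ij enters the row margin
            a[j * I + i] -= 1  # pi_ji enters the column margin
        rows.append(a)
    return rows

# Assumed counts (rows: 1980 region, columns: 1985 region; NE, MW, S, W).
migration = [11607, 100, 366, 124,
             87, 13677, 515, 302,
             172, 225, 17819, 270,
             63, 176, 286, 10192]

MH = marginal_homogeneity_contrasts(4)
x2 = gsk_linear(migration, MH)
print(round(x2, 2), "on", len(MH), "degrees of freedom")
```

The diagonal cells receive coefficient zero because they enter both margins. If the assumed counts are correct, the result should agree with the value of 236.49 reported for the GSK test in this section.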
Assuming again that data are given in a $4 \times 4$ table, this model can be written as

$$\pi_{i1} + \pi_{i2} + \pi_{i3} + \pi_{i4} = \pi_{1i} + \pi_{2i} + \pi_{3i} + \pi_{4i} \quad \text{for } i = 1, \ldots, 3,$$

or, expressed in the form $F_i(\pi) = 0$, we have

$$
\begin{aligned}
F_1(\pi) &= \pi_{12} + \pi_{13} + \pi_{14} - \pi_{21} - \pi_{31} - \pi_{41} = 0 \quad \text{and} \\
F_2(\pi) &= \pi_{21} + \pi_{23} + \pi_{24} - \pi_{12} - \pi_{32} - \pi_{42} = 0 \quad \text{and} \\
F_3(\pi) &= \pi_{31} + \pi_{32} + \pi_{34} - \pi_{13} - \pi_{23} - \pi_{43} = 0.
\end{aligned}
$$

³ Note that this null hypothesis implies $\pi_{I+} = \pi_{+I}$. Therefore, when this hypothesis is expressed in other presentations of marginal homogeneity, $i$ often ranges from 1 to $I$.

Using the same data example as before, the SAS statements for testing marginal homogeneity can be set up as:

    proc catmod data=sasuser.agr10_2;
       weight n;
       response 0 1 1 1 -1 0 0 0 -1 0 0 0 -1 0 0 0,
                0 -1 0 0 1 0 1 1 0 -1 0 0 0 -1 0 0,
                0 0 -1 0 0 0 -1 0 1 1 0 1 0 0 -1 0;
       model res80*res85 = / noint;
    quit;

The SAS statements are almost the same as in the data example for fitting the quasi-symmetry model. Only the setup of the response functions has changed. Running these SAS commands yields the following output:

    CATMOD PROCEDURE

    Response: RES80*RES85       Response Levels (R)=     16
    Weight Variable: N          Populations     (S)=      1
    Data Set: AGR10_2           Total Frequency (N)=  55981
    Frequency Missing: 0        Observations  (Obs)=     16

    Sample    Response Functions

    ANALYSIS-OF-VARIANCE TABLE
    Source      DF   Chi-Square   Prob
    RESIDUAL     3       236.49

Since we already know that the quasi-symmetry model fits the data reasonably well, the conditional test should yield a test statistic very similar to that of the unconditional test presented here. Agresti (1990) [1] reports a deviance of $G^2 = 240.56$ based on $df = 3$ for the conditional test procedure based on log-linear models. The $\chi^2$-value calculated by the GSK approach is 236.49 based on $df = 3$, which corresponds quite closely to the value of the conditional approach. Both tests suggest that marginal homogeneity does not hold for the $4 \times 4$ table. Hence, there is a time trend in the migration patterns, meaning that certain regions in the US have been preferred by the people who moved between 1980 and 1985.

It is an interesting result of the above analysis that the $\chi^2$-value resulting from the GSK approach is identical to the $\chi^2$-value of the unconditional test of Bhapkar (1966) [2] as reported in Agresti (1990, p. 360) [1]. Note that all the reported test statistics for marginal homogeneity are almost identical because quasi-symmetry holds for the contingency table under study. If quasi-symmetry does not hold, results from the conditional test based on log-linear models and the unconditional test based on the GSK approach are not necessarily asymptotically equivalent. If the test statistics differ, the result of the GSK approach should be used to draw substantive conclusions.

3.4 Diagonal-Parameter Symmetry Model

We now present how the diagonal-parameter symmetry model, which is one of the many models Goodman proposed for the analysis of turn-over tables, can be fitted using the GSK approach. For a discussion of the model see Goodman (1979 [11], 1981 [12]).
Using Goodman's notation in this section, the symmetry model can be expressed as

$$F_{ij} = \phi_{ij} \quad \text{for } i \neq j, \text{ with } \phi_{ij} = \phi_{ji},$$

where the $F_{ij}$ now denote the expected cell frequencies in the turn-over table.⁴ The $\phi_{ij}$ are called symmetry parameters. The symmetry model rarely fits the data well. If the variables of the turn-over table have ordinal scale quality, it might be reasonable to examine a model that treats every change by one unit from observation Time 1 to observation Time 2 as equal, regardless of whether it was a change from level 1 to level 2, or from level 2 to level 3, and so forth. Similarly, changes by two scale units are treated as equal, regardless of whether the change was from category 1 to category 3, or from category 2 to category 4, and so forth. The model that expresses this is

$$F_{ij} = \phi_{ij}\,\delta_k \quad \text{for } i \neq j,\; k = j - i.$$

⁴ Readers should not confuse the function $F$ that is evaluated by the GSK approach with the expected cell frequencies.

The diagonal elements are not part of the model; rather, they are completely ignored. We see that, in addition to the symmetry parameters, we have $2(I-1)$ additional $\delta$-parameters, where $I$ denotes the number of rows/columns of the table. To make them uniquely defined we have to impose $I - 1$ constraints on them, that is, three for a $4 \times 4$ table. We choose $\delta_{-1} = \delta_1^{-1}$, $\delta_{-2} = \delta_2^{-1}$, $\delta_{-3} = \delta_3^{-1}, \ldots$. Table 2 illustrates for a $4 \times 4$ table to which cells each of the three different $\delta$-parameters belongs.

$$
\begin{array}{cccc}
 & \delta_1 & \delta_2 & \delta_3 \\
\delta_1^{-1} & & \delta_1 & \delta_2 \\
\delta_2^{-1} & \delta_1^{-1} & & \delta_1 \\
\delta_3^{-1} & \delta_2^{-1} & \delta_1^{-1} &
\end{array}
$$

Table 2: Assignment of $\delta$-parameters to cells in a $4 \times 4$ turn-over table.

As can be seen from this table, each of the upper minor diagonals is assigned a separate $\delta$-parameter and, without loss of generality, the parameters used for the lower minor diagonals are the inverses of the corresponding upper minor diagonal parameters.

The degrees of freedom for this model are calculated as follows. Let $I$ be the number of rows/columns of the table. We have $I^2 - I$ off-diagonal cells and there are as many symmetry parameters as there are cells below (or equivalently above) the main diagonal, that is, $I(I-1)/2$. In addition there are $I - 1$ $\delta$-parameters. Hence, the degrees of freedom are

$$(I^2 - I) - \frac{I(I-1)}{2} - (I-1) = \frac{1}{2}(I-2)(I-1).$$

This is exactly equal to the number of degrees of freedom for the quasi-symmetry model. For a $4 \times 4$ table we have 3 degrees of freedom left. Hence, to use the GSK approach to fit this model we have to find three functions of the cell probabilities that are zero assuming the model holds true. The following arguments for a $4 \times 4$ table show which functions these are. Assuming $j > i$, we see that

$$\frac{F_{ij}}{F_{ji}} = \frac{\phi_{ij}\,\delta_k}{\phi_{ij}\,\delta_k^{-1}} = \delta_k^2.$$

This means that the ratio $F_{ij}/F_{ji}$ is a constant that depends only on $k = j - i$.
Hence, for $k = 1$ we have

$$\frac{F_{12}}{F_{21}} = \frac{F_{23}}{F_{32}} = \frac{F_{34}}{F_{43}},$$

while for $k = 2$ we obtain

$$\frac{F_{13}}{F_{31}} = \frac{F_{24}}{F_{42}}.$$

The first of these two equations defines two equality restrictions and the second defines an additional restriction on the expected cell frequencies or, equivalently, on the cell probabilities.
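Taking logarithms turns each of these equalities into a linear contrast of log cell probabilities, so the restrictions can be generated for any $I$ and tested with the statistic of Section 2. The Python sketch below is our own illustration with hypothetical function names. The counts are assumed to be those of the unaided-vision table (7,477 women, right eye grade in rows, left eye grade in columns) analyzed in Grizzle et al. (1969) and Goodman (1979); they should be checked against those sources.

```python
import math

def solve(S, b):
    """Solve S x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(S)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gsk_log_contrasts(counts, C):
    """Wald statistic X^2 for H0: C log(pi) = 0; all cells used must be nonzero."""
    N = sum(counts)
    p = [n / N for n in counts]
    F = [sum(c * math.log(q) for c, q in zip(row, p) if c) for row in C]
    # S = H V H' with H = C Diag(p)^-1 and V = (Diag(p) - p p') / N
    S = [[(sum(a * b / q for a, b, q in zip(r1, r2, p) if a and b)
           - sum(r1) * sum(r2)) / N for r2 in C] for r1 in C]
    return sum(f * x for f, x in zip(F, solve(S, F)))

def dps_contrasts(I):
    """Rows for log(F_ij / F_ji) = log(F_i+1,j+1 / F_j+1,i+1) with j - i = k fixed."""
    rows = []
    for k in range(1, I - 1):
        for i in range(1, I - k):
            a = [0] * (I * I)
            for (r, c), sign in [((i, i + k), 1), ((i + k, i), -1),
                                 ((i + 1, i + 1 + k), -1), ((i + 1 + k, i + 1), 1)]:
                a[(r - 1) * I + (c - 1)] += sign
            rows.append(a)
    return rows

# Assumed counts (rows: right eye grade, columns: left eye grade; best to worst).
vision = [1520, 266, 124, 66,
          234, 1512, 432, 78,
          117, 362, 1772, 205,
          36, 82, 179, 492]

DPS = dps_contrasts(4)
x2 = gsk_log_contrasts(vision, DPS)
print(round(x2, 2), "on", len(DPS), "degrees of freedom")
```

For general $I$ the generator returns $(I-1)(I-2)/2$ rows, in line with the degrees of freedom derived for this model.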

Table 3: Turn-over table of women according to right and left eye grade with respect to unaided distance vision (rows: right eye grade; columns: left eye grade; categories: best, second, third, worst).

Taking logarithms and expressing the three restrictions using cell probabilities we obtain the function $F$ as

$$
\begin{aligned}
F_1(\pi) &= \log \pi_{12} - \log \pi_{21} - \log \pi_{23} + \log \pi_{32} = 0 \quad \text{and} \\
F_2(\pi) &= \log \pi_{23} - \log \pi_{32} - \log \pi_{34} + \log \pi_{43} = 0 \quad \text{and} \\
F_3(\pi) &= \log \pi_{13} - \log \pi_{24} - \log \pi_{31} + \log \pi_{42} = 0.
\end{aligned}
$$

We now demonstrate the calculations using a data example. The data have been repeatedly analyzed in the literature, and are given, for instance, in Grizzle et al. (1969) [13] or Goodman (1979 [11], 1981 [12]). Table 3 contains the data on unaided vision. The vector of probabilities is set up exactly as in Equation (1), which includes the probabilities for the diagonal elements although these are not part of the model. This is done to facilitate the presentation of the response statement in the following SAS job file, which uses SAS CATMOD for evaluating the functions in $F$ simultaneously.

    proc catmod data=sasuser.eyes;
       weight n;
       response 0 1 0 0 -1 0 -1 0 0 1 0 0 0 0 0 0,
                0 0 0 0 0 0 1 0 0 -1 0 -1 0 0 1 0,
                0 0 1 0 0 0 0 -1 -1 0 0 0 0 1 0 0 log;
       model A*B = / noint;
    run;

Except for the response statement this job is almost identical to the ones given earlier. The only differences are that the SAS file containing the data has a different name and the variables are now labeled A and B. The slightly abbreviated SAS output is

    CATMOD PROCEDURE

    Response: A*B               Response Levels (R)=     16
    Weight Variable: N          Populations     (S)=      1
    Data Set: EYES              Total Frequency (N)=   7477
    Frequency Missing: 0        Observations  (Obs)=     16

    Sample    Response Functions

    ANALYSIS-OF-VARIANCE TABLE
    Source      DF   Chi-Square   Prob
    RESIDUAL     3         0.50

The value of the test statistic is 0.50 based on 3 degrees of freedom. This is exactly the value reported by Goodman (1979) [11] for the diagonal-parameter symmetry model. This is a very small value, which shows that this model fits the data almost perfectly. The differences between the test statistics based on maximum likelihood and weighted least squares are within rounding error.

4 Discussion

The GSK approach requires almost no distributional assumptions about the observed random variables. Only the conditions that are needed for the multivariate central limit theorem to hold must be fulfilled, for instance, finite variance and independence of observations. Therefore, the GSK approach can almost universally be applied if the sample size is large. However, the GSK approach is not a likelihood-based approach. From a theoretical point of view this can be criticized. From an applied point of view, however, the GSK approach is an extremely versatile technique. Its generality makes it well suited for fitting models that have been developed in other frameworks. In applied research the latest software is often not available, or other practical problems make fitting the model by the most desirable estimation procedure difficult or impossible. Then the GSK approach might help to obtain a reasonable approximation to the more advanced methods. Although this was not demonstrated in this paper, so-called 'marginal models' (see Liang, Zeger, & Qaqish, 1992 [20]; Diggle, Liang, & Zeger, 1994 [8]) that are nowadays usually fitted using GEE methodology (Liang & Zeger, 1986 [19]; Zeger & Liang, 1986 [28]) can be analyzed using the GSK approach if the covariates are categorical. Another example is McCullagh's proportional odds model (McCullagh, 1980 [21]).
Since this model treats functions of cumulative probabilities as the response to be modeled, it cannot easily be rewritten as a log-linear model (Fienberg, 1980, p. 111 [10]). If a software package does not include a procedure, or an option to a procedure as in `PROC LOGISTIC' of the SAS program, that can handle this model, fitting the proportional odds model requires programming one's own fitting routine. However, this model can also be fitted easily using the GSK approach.

In addition, the GSK approach can be very helpful when new models are being developed that can be formulated by specifying almost arbitrary functions in the π's. With the GSK approach such models can be fitted to the data, and parameter estimates can be obtained to evaluate whether a new model fits the data reasonably well.

The results presented in Section 3.1 suggested that migration patterns exist. A look at Table 1 suggests that the South of the U.S. may be the prime target for relocations. Meiser, von Eye, and Spiel (1997) [22] presented log-linear methods for the analysis of such trends.

One may wonder why, in the light of the clear advantages of the GSK approach, development of log-linear methods progresses at a faster pace than development of methods based on the GSK approach. One answer was implicitly given above. There is only one full implementation of the GSK approach in general purpose statistical software packages, namely the SAS CATMOD module. In addition, the manual for this module is perceived by many as hard to comprehend.

Thus, there is less stimulation for the use and development of the GSK approach than for log-linear models, for which many very user-friendly programs exist (SPSS, SYSTAT).

Yet, there is a second answer to the question why development and application seem to emphasize log-linear models. In many textbooks on the analysis of categorical data, the GSK approach is presented only as an aside (for example, Agresti, 1990 [1]), while the main emphasis is given to log-linear models. Other textbooks play down the usefulness of the GSK approach by emphasizing that it is justified by its large-sample optimality. Christensen (1997, p. 376 [4]) concludes that "there is no apparent reason to use GSK for small samples." One problem with these arguments is that, while they are correct, they apply, at least in part, also to the log-linear approach.

We thus conclude that methods for log-linear modeling have experienced more rapid and further-reaching development than methods within the GSK approach. There can be no doubt that the GSK approach allows one to test hypotheses that the log-linear approach can test only via awkward detours or not at all. Therefore, we propose that further development of the GSK approach is well worth our while.

References

[1] Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
[2] Bhapkar, V. P. (1966). A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association, 61.
[3] Bishop, Y. M. M., Fienberg, S. E., & Holland, P. (1975). Discrete multivariate analysis. Theory and practice. Cambridge, MA: MIT Press.
[4] Christensen, R. R. (1997). Log-linear models (2nd ed.). New York: Springer-Verlag.
[5] Clogg, C. C., Eliason, S. R., & Grego, J. M. (1990). Models for the analysis of change in discrete variables. In A. von Eye (Ed.), Statistical methods in longitudinal research, Vol. II. New York: Academic Press.
[6] Deming, W. E., & Stephan, F. F. (1940a). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11.
[7] Deming, W. E., & Stephan, F. F. (1940b). The sampling procedure of the 1940 population census. Journal of the American Statistical Association, 35.
[8] Diggle, P. J., Liang, K., & Zeger, S. L. (1994). Analysis of longitudinal data. Oxford: Clarendon Press.
[9] Evers, M., & Namboodiri, N. K. (1979). On the design matrix strategy in the analysis of categorical data. In K. F. Schuessler (Ed.), Sociological methodology 1979. San Francisco: Jossey-Bass.
[10] Fienberg, S. E. (1980). The analysis of cross-classified data (2nd ed.). Cambridge: MIT Press.
[11] Goodman, L. A. (1979). Multiplicative models for square contingency tables with ordered categories. Biometrika, 66.
[12] Goodman, L. A. (1981). Three elementary views of log-linear models for the analysis of cross-classifications having ordered categories. Sociological Methodology.
[13] Grizzle, J. E., Starmer, C. F., & Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics, 25.
[14] Hagenaars, J. A. (1990). Categorical longitudinal data. Newbury Park, CA: Sage.
[15] Koch, G. G., Amara, I. A., Davis, G. W., & Gillings, D. B. (1982). A review of some statistical methods for covariance analysis of categorical data. Biometrics, 38.

[16] Koch, G. G., Landis, J. R., Freeman, J. L., Freeman, D. H., & Lehnen, R. G. (1977). A general method for the analysis of experiments with repeated measurement of categorical data. Biometrics, 33.
[17] Landis, J. R., & Koch, G. G. (1979). The analysis of categorical data in longitudinal studies of behavioral development. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal methodology in the study of behavior and development. New York: Academic Press.
[18] Landis, J. R., & Miller, M. E. (1988). Some general methods for the analysis of categorical data in longitudinal studies. Statistics in Medicine, 7.
[19] Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73.
[20] Liang, K. Y., Zeger, S. L., & Qaqish, B. (1992). Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society, B, 54.
[21] McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, B, 42.
[22] Meiser, T., von Eye, A., & Spiel, C. (1997). Loglinear symmetry and quasi-symmetry models for the analysis of change. Biometrical Journal, 39(3).
[23] Rindskopf, D. (1992). A general approach to categorical data analysis with missing data, using generalized linear models with composite links. Psychometrika, 57(1).
[24] Sen, P. K., & Singer, J. M. (1993). Large sample methods in statistics. London: Chapman and Hall.
[25] Stokes, M. E., Davis, C. S., & Koch, G. G. (1995). Categorical data analysis using the SAS system. Cary, NC: SAS Institute Inc.
[26] von Eye, A., & Spiel, C. (1996). Standard and non-standard log-linear symmetry models for measuring change in categorical variables. The American Statistician, 50.
[27] Wickens, T. (1989). Multiway contingency tables for the social sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
[28] Zeger, S. L., & Liang, K. Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42.
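The diagonal-parameter symmetry test of the data example can be cross-checked outside SAS. The following is a minimal sketch in Python with NumPy, not the authors' CATMOD job: it evaluates the three functions F_1, F_2, F_3 at the observed proportions and forms the usual GSK Wald statistic F'V^{-1}F, with V obtained by the delta method from the multinomial covariance matrix. Several things are assumptions made for illustration: the cell counts (the values commonly reported for the unaided vision data, entered here because the body of Table 3 is not reproduced above; they do sum to the total frequency of 7477 shown in the SAS output), the row-by-row ordering of the probability vector, and the helper names `contrast_row` and `gsk_wald`.

```python
import numpy as np

# Assumed cell counts (rows: right eye grade, columns: left eye grade,
# categories best, second, third, worst). These are the values commonly
# reported for this data set, not values taken from the text above.
counts = np.array([
    [1520,  266,  124,  66],
    [ 234, 1512,  432,  78],
    [ 117,  362, 1772, 205],
    [  36,   82,  179, 492],
], dtype=float)


def contrast_row(plus, minus, size=4):
    """Build one row of A: +1 for cells in `plus`, -1 for cells in `minus`.

    Cells are 1-based (i, j) pairs; the probability vector is assumed to be
    stored row by row (pi_11, pi_12, ..., pi_44).
    """
    row = np.zeros(size * size)
    for i, j in plus:
        row[(i - 1) * size + (j - 1)] = 1.0
    for i, j in minus:
        row[(i - 1) * size + (j - 1)] = -1.0
    return row


def gsk_wald(counts, A):
    """Wald statistic for H0: A log(pi) = 0 under multinomial sampling."""
    n = counts.sum()
    p = counts.ravel() / n
    # Multinomial covariance matrix of the sample proportions.
    cov_p = (np.diag(p) - np.outer(p, p)) / n
    # Delta method: the Jacobian of A log(p) with respect to p is A diag(1/p).
    jac = A @ np.diag(1.0 / p)
    f = A @ np.log(p)
    cov_f = jac @ cov_p @ jac.T
    stat = float(f @ np.linalg.solve(cov_f, f))
    return f, stat, A.shape[0]


A = np.vstack([
    contrast_row([(1, 2), (3, 2)], [(2, 1), (2, 3)]),  # F1
    contrast_row([(2, 3), (4, 3)], [(3, 2), (3, 4)]),  # F2
    contrast_row([(1, 3), (4, 2)], [(2, 4), (3, 1)]),  # F3
])

f, stat, df = gsk_wald(counts, A)
print("estimated functions:", np.round(f, 4))
print(f"Wald chi-square = {stat:.2f} on {df} df")
```

If the assumed counts are right, the statistic should come out close to the 0.50 on 3 degrees of freedom reported in the text. Because every row of A sums to zero, the covariance matrix of the functions reduces to A diag(1/n_ij) A', which gives a quick hand check of the computation.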


More information

Obtaining the Maximum Likelihood Estimates in Incomplete R C Contingency Tables Using a Poisson Generalized Linear Model

Obtaining the Maximum Likelihood Estimates in Incomplete R C Contingency Tables Using a Poisson Generalized Linear Model Obtaining the Maximum Likelihood Estimates in Incomplete R C Contingency Tables Using a Poisson Generalized Linear Model Stuart R. LIPSITZ, Michael PARZEN, and Geert MOLENBERGHS This article describes

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA ABSTRACT Regression analysis is one of the most used statistical methodologies. It can be used to describe or predict causal

More information

Correspondence Analysis

Correspondence Analysis Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Longitudinal Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

Longitudinal Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago Longitudinal Analysis Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago Course description: Longitudinal analysis is the study of short series of observations

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model Applied and Computational Mathematics 2014; 3(5): 268-272 Published online November 10, 2014 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20140305.22 ISSN: 2328-5605 (Print); ISSN:

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS M. Gasparini and J. Eisele 2 Politecnico di Torino, Torino, Italy; mauro.gasparini@polito.it

More information

pairs. Such a system is a reinforcement learning system. In this paper we consider the case where we have a distribution of rewarded pairs of input an

pairs. Such a system is a reinforcement learning system. In this paper we consider the case where we have a distribution of rewarded pairs of input an Learning Canonical Correlations Hans Knutsson Magnus Borga Tomas Landelius knutte@isy.liu.se magnus@isy.liu.se tc@isy.liu.se Computer Vision Laboratory Department of Electrical Engineering Linkíoping University,

More information

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

Ron Heck, Fall Week 3: Notes Building a Two-Level Model Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level

More information

Mixed- Model Analysis of Variance. Sohad Murrar & Markus Brauer. University of Wisconsin- Madison. Target Word Count: Actual Word Count: 2755

Mixed- Model Analysis of Variance. Sohad Murrar & Markus Brauer. University of Wisconsin- Madison. Target Word Count: Actual Word Count: 2755 Mixed- Model Analysis of Variance Sohad Murrar & Markus Brauer University of Wisconsin- Madison The SAGE Encyclopedia of Educational Research, Measurement and Evaluation Target Word Count: 3000 - Actual

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models EPSY 905: Multivariate Analysis Spring 2016 Lecture #12 April 20, 2016 EPSY 905: RM ANOVA, MANOVA, and Mixed Models

More information

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research Quality & Quantity 34: 137 152, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 137 Estimated Precision for Predictions from Generalized Linear Models in Sociological Research TIM FUTING

More information