Quasi-Symmetric Graphical Log-Linear Models


Scandinavian Journal of Statistics, Vol. 38: 447–465, 2011. doi: 10.1111/j.1467-9469.2010.00713.x. Published by Blackwell Publishing Ltd.

Quasi-Symmetric Graphical Log-Linear Models

ANNA GOTTARD and GIOVANNI MARIA MARCHETTI, Department of Statistics, University of Florence
ALAN AGRESTI, Department of Statistics, University of Florida

ABSTRACT. We propose an extension of graphical log-linear models to allow for symmetry constraints on some interaction parameters that represent homologous factors. The conditional independence structure of such quasi-symmetric (QS) graphical models is described by an undirected graph with coloured edges, in which a particular colour corresponds to a set of equality constraints on a set of parameters. Unlike standard QS models, the proposed models apply with contingency tables for which only some variables or sets of the variables have the same categories. We study the graphical properties of such models, including conditions for decomposition of model parameters and of maximum likelihood estimates.

Key words: conditional independence, decomposition, exchangeability, graphical models, homologous variables

1. Introduction

Log-linear models are useful in many fields, such as for describing multivariate association structure among response variables measured in sample surveys for social science research. Graphical log-linear models, described by Darroch et al. (1980), are a subclass of hierarchical log-linear models. Each such model corresponds to an undirected graph in which each variable is represented by a node and the absence of an edge connecting nodes represents conditional independence. In the log-linear model formula, interaction terms are set to zero according to the edges that are missing in the graph. In some applications, many or all the response variables are measured with the same categorical scale; that is, they are homologous.
This is common in medical longitudinal studies that repeatedly observe whether some condition is present at various times, or attitudinal research that observes subjects' opinions about some issue under a variety of conditions. In such cases, the contingency table that cross-classifies the response variables has specialized structure, and certain models can be useful when we expect the joint distribution to exhibit a structure that is exchangeable in certain aspects. An example is the standard quasi-symmetric (QS) structure by which the single-factor marginal distributions may differ but the two-way and higher-order terms in the log-linear model have a symmetric form (Caussinus, 1966; Bishop et al., 1975).

To illustrate, we present and later analyse two data sets from the US General Social Survey. The first presents attitudes about legalized abortion and about the death penalty. The data, displayed in Table 1, consist of four binary variables observed in 2002, 2004 and 2006. Three variables provide responses to whether abortions should be legal: when there is a strong chance of a serious defect in the baby (D), when the woman's health is seriously endangered (H) and when the pregnancy is the result of a rape (R). The fourth variable is whether a subject favours the death penalty for people convicted of murder (P). All variables use the same scale, (yes, no). It is plausible that exchangeability could occur among D, H and R, but there is no reason to expect exchangeability of these three items with P except for the fact that this item also deals with potential taking of a human life.

Table 1. Data on attitudes about legalized abortion in three cases (D: defect in the baby, H: woman's health problems, R: rape) and whether the subject favours the death penalty (P)

                 P = Yes             P = No
D     H      R = Yes  R = No    R = Yes  R = No
Yes   Yes      1590     105       634      60
Yes   No         12       9        10      10
No    Yes       159     128        66      93
No    No         30     172         9     131

The second data set relates to the degree of satisfaction with some aspects of US government policy in 2006. The data, shown in Table 2, contain homologous variables on how successful US government policy has been on protecting the environment (E), fighting unemployment (U) and providing a decent standard of living for the old (O), and a non-homologous variable, gender (G). The three homologous variables have levels s, successful; n, neither successful nor unsuccessful; and u, unsuccessful.

For such data sets, questions of interest include the following: Do subsets of the variables have exchangeable structure for interactions? When certain sets of variables form cliques that have interaction structure simpler than the usual one for graphical models, do standard results from graphical models still apply about properties such as decomposability and collapsibility?

In this article, we introduce a new class of graphical log-linear models, called QS graphical models, in which certain interaction parameters are restricted to be identical for sets of variables. Different sets of constraints have different graphical configurations, with the standard QS model being a particular case. To depict the QS constraints in the graph, we use coloured edges, adapting the idea of Højsgaard & Lauritzen (2008) of using colours to represent symmetries in the association form. QS graphical models extend ordinary QS models and other types of generalized symmetry models, such as a subclass for hypercubic contingency tables proposed by Lovison (2000).
After introducing graphical QS models, we study their graphical properties. In particular, we investigate the consequences on decomposition and collapsibility of imposing symmetry constraints and equality constraints on certain interaction terms. Such properties are useful for reducing the complexity of a model and for facilitating interpretation. We will see that, as in ordinary models, separation implies parametric collapsibility, in terms of certain parameters remaining constant when tables are collapsed in certain ways. However, a new, coloured, version of decomposition implies collapsibility of the estimates from maximum likelihood (ML) inference.

Table 2. Data on degree of satisfaction with government policy

               O = s             O = n             O = u
E   G     U=s  U=n  U=u     U=s  U=n  U=u     U=s  U=n  U=u
s   m      58   22   13      26   30    7      39   13   19
s   f      52   21   17      21   31    7      28   29   32
n   m      18   12    8      27   31   14      10   22   34
n   f       9   13   11      26   55   23      13   39   39
u   m      13   12   12      14   25   33      17   29  112
u   f      14   12   10      12   32   24      21   38  112

The article is organized as follows. Section 2 defines QS graphical models. Section 3 discusses model fitting and special issues that arise in the comparison of coloured graphical models in model selection. In section 4, we describe characteristics and properties of coloured graphical models. Section 5 uses coloured graphical models to analyse the data introduced before. Section 6 discusses possible extensions and makes some concluding remarks.

2. Graphical QS models

2.1. Graphical log-linear models

Let $X_1, X_2, \dots, X_d$ be discrete random variables with $X_v$ taking values in $[I_v] = \{1, \dots, I_v\}$, for $v \in V = \{1, \dots, d\}$. Let $\mathcal{I} = \times_{v \in V} [I_v]$ be the $d$-dimensional table of cells from cross-classifying the variables, with $\iota = (i_1, \dots, i_d)$ denoting a generic cell in the table. The probabilities $p(\iota) = P(X_1 = i_1, \dots, X_d = i_d)$, $\iota = (i_1, \dots, i_d) \in \mathcal{I}$, specify the joint distribution of $(X_1, X_2, \dots, X_d)$. We shall assume that each $p(\iota)$ is strictly positive. Under multinomial sampling with $n$ observations, let $n(\iota)$ be the observed count in cell $\iota$, with $\mu(\iota) = n\,p(\iota)$ its expected value. For each subset $a$ of $V$, let $\mathcal{I}_a = \times_{v \in a} [I_v]$ be a marginal table with $\iota_a$ denoting a corresponding marginal cell. The joint probability table admits a log-linear expansion
$$\log \mu(\iota) = \sum_{a \subseteq V} \lambda_a(\iota_a), \tag{1}$$
where $\lambda_a(\iota_a)$ is a function, defining the log-linear parameters, indexed by the subset $a$ of $V$. There is a one-to-one and smooth transformation between the joint probabilities $p$ and the log-linear parameters $\lambda$, and model (1) is the saturated log-linear model. The non-zero parameters $\lambda_a(\iota_a)$ are commonly called log-linear interactions of order $|a|$. To ensure identifiability we shall adopt sum-to-zero constraints, whereby the sum over values of any index for a $\lambda$ term equals 0.
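
The one-to-one correspondence between expected counts and sum-to-zero parameters in expansion (1) can be made concrete for a two-way table. The following Python sketch is illustrative only: the function name `sum_to_zero_terms` and the expected counts are hypothetical, and the decomposition shown is the usual one in which each term is a mean of $\log \mu$ over the omitted indices, corrected for the lower-order terms.

```python
import math

def sum_to_zero_terms(mu):
    """Decompose log(mu) of a two-way table into sum-to-zero log-linear terms.

    Returns (lam, lam1, lam2, lam12) such that
    log mu[i][j] = lam + lam1[i] + lam2[j] + lam12[i][j],
    with every term summing to zero over each of its indices.
    """
    rows, cols = len(mu), len(mu[0])
    log_mu = [[math.log(x) for x in row] for row in mu]
    lam = sum(sum(row) for row in log_mu) / (rows * cols)
    lam1 = [sum(log_mu[i]) / cols - lam for i in range(rows)]
    lam2 = [sum(log_mu[i][j] for i in range(rows)) / rows - lam
            for j in range(cols)]
    lam12 = [[log_mu[i][j] - lam - lam1[i] - lam2[j] for j in range(cols)]
             for i in range(rows)]
    return lam, lam1, lam2, lam12

# Hypothetical expected counts, used only to exercise the function.
mu = [[20.0, 10.0], [5.0, 40.0]]
lam, lam1, lam2, lam12 = sum_to_zero_terms(mu)
```

Adding the four returned terms cell by cell recovers $\log \mu$, which is the sense in which the saturated model places no restriction on the table.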
For many models, one could equivalently use reference-level coding, such as $\lambda_a(\iota_a) = 0$ whenever at least one index in $\iota_a$ is equal to 1. However, for certain models in the class discussed in this article, the use of such coding results in a non-equivalent model having unnatural constraints in patterns of association. We discuss this further in section 2.4.

Submodels of the saturated log-linear model are obtained by deleting some $\lambda_a(\iota_a)$ or imposing equality constraints on them. An especially useful class is that of graphical log-linear models, in which the terms deleted are dictated by the structure of an undirected graph $G = (V, E)$, defined over the set $V$ of nodes and by a set of edges $E$. The appendix presents a short summary of graph theory. In the theory of graphical models, the nodes index the random vector $X = (X_v \mid v \in V)$ while absence of an edge $vw = (v, w)$ in $E$ implies conditional independence between $X_v$ and $X_w$, given the other variables. We summarize graphical log-linear models briefly here, referring to Lauritzen (1996) for further details. Given an undirected graph $G = (V, E)$, a graphical log-linear model associated with $G$ is defined by the set of strictly positive discrete probability distributions with log-linear expansion (1) having constraints
$$\lambda_a(\iota_a) = 0 \ \text{ whenever } a \text{ is a subset of nodes of } G \text{ that is not complete}, \tag{2}$$
in the sense that not all pairs of nodes in $a$ are joined by an edge. It can be proved (Darroch et al., 1980; Lauritzen, 1996) that if $p$ has the form given by (1) and (2), then $p$ satisfies

$$X_v \perp\!\!\!\perp X_w \mid X_{V \setminus \{v, w\}}, \quad \text{for all } vw \notin E. \tag{3}$$
This is called the pairwise Markov property. In words, the variables corresponding to each missing edge in the graph are conditionally independent given the remaining variables. Identifying the generating class of a graphical log-linear model $M(G)$ defined by constraints (2) by the list of the higher-order interaction parameters, it can be verified that this class is formed by the cliques of the graph $G$. We next present a simple example of a graphical log-linear model.

Example 1. For a three-way contingency table with a generic cell having indices $(i, j, k)$, the saturated log-linear model (using superscripts to identify the variables) is:
$$\log \mu_{ijk} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{12}_{ij} + \lambda^{13}_{ik} + \lambda^{23}_{jk} + \lambda^{123}_{ijk}.$$
The saturated model corresponds to a complete graph with three nodes. For the graph obtained by deleting the edge 13 from the complete graph (see the graph later in this article in Fig. 3A), the associated graphical log-linear model satisfies $\lambda^{123}_{ijk} = 0$ and $\lambda^{13}_{ik} = 0$, for all $i, j, k$. The resulting model is equivalent to the conditional independence model, $X_1 \perp\!\!\!\perp X_3 \mid X_2$.

2.2. QS interaction structure

When some variables in vector $X$ are homologous, log-linear models can add appropriate constraints for parameters, such as exchangeability for associations and interactions. This results in models that combine conditional independence between some pairs of variables with symmetry restrictions for the interaction terms within other sets of variables. QS models are a subclass of log-linear models, specified for contingency tables with homologous variables, defined by equality constraints on higher-order interaction parameters. This class of models has been defined by Caussinus (1966), also by Bishop et al. (1975) for up to three-dimensional tables, and by Bhapkar & Darroch (1990) for tables of general order. With two homologous variables, the model has a symmetric interaction term.
With three homologous variables $(X_1, X_2, X_3)$ forming an $I \times I \times I$ contingency table, the ordinary QS model is:
$$\log \mu_{ijk} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{12}_{ij} + \lambda^{13}_{ik} + \lambda^{23}_{jk} + \lambda^{123}_{ijk}, \tag{4a}$$
$$\lambda^{12}_{ij} = \lambda^{12}_{ji} = \lambda^{13}_{ij} = \lambda^{13}_{ji} = \lambda^{23}_{ij} = \lambda^{23}_{ji}, \quad \text{for all } i, j = 1, \dots, I, \tag{4b}$$
$$\lambda^{123}_{ijk} = \lambda^{123}_{\mathrm{perm}(ijk)}, \quad \text{for all } i, j, k = 1, \dots, I, \tag{4c}$$
where $\mathrm{perm}(ijk)$ denotes any permutation of the set of indices in the argument. Unlike the complete symmetry model, this model does not impose restrictions on the single-factor terms, the consequence being that the observed and fitted values are identical in the one-dimensional margins of the table.

Restrictions such as (4b) and (4c) treat all variables alike, and do not allow only a subset of variables to be conditionally independent. In this article, we generalize this by taking a graphical log-linear model and then imposing QS restrictions on selected non-zero parameters involving homologous variables. The following examples present possible generalizations of the QS model. Each proposed model is represented by a graph, whose characteristics will be described in the next section.
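
To make restrictions (4b) and (4c) tangible, the following Python sketch checks whether a given set of two-factor and three-factor arrays satisfies them. It is an illustration under stated assumptions, not part of the original analysis: the function name `satisfies_qs` and the numerical values are hypothetical, and the check covers only the equality constraints, not the sum-to-zero identifiability constraints.

```python
from itertools import permutations, product

def satisfies_qs(lam12, lam13, lam23, lam123, tol=1e-12):
    """Check the ordinary QS restrictions (4b) and (4c) for I x I x I arrays."""
    n_levels = len(lam12)
    # (4b): the three two-factor terms are symmetric and equal to each other.
    for i, j in product(range(n_levels), repeat=2):
        values = [lam12[i][j], lam12[j][i], lam13[i][j],
                  lam13[j][i], lam23[i][j], lam23[j][i]]
        if max(values) - min(values) > tol:
            return False
    # (4c): the three-factor term is invariant under permutations of (i, j, k).
    for i, j, k in product(range(n_levels), repeat=3):
        for p, q, r in permutations((i, j, k)):
            if abs(lam123[i][j][k] - lam123[p][q][r]) > tol:
                return False
    return True

# Hypothetical symmetric two-factor term with rows summing to zero (I = 3).
sym = [[0.5, -0.2, -0.3],
       [-0.2, 0.1, 0.1],
       [-0.3, 0.1, 0.2]]
zero3 = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]

# Breaking the symmetry of one term violates (4b).
asym = [row[:] for row in sym]
asym[0][1] = 0.4

qs_holds = satisfies_qs(sym, sym, sym, zero3)
qs_fails = satisfies_qs(sym, asym, sym, zero3)
```

The generalizations in the examples that follow relax this check in different ways, for instance by requiring symmetry within each two-factor term without requiring the terms to be equal to each other.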

Example 2. Suppose $X_1$, $X_2$ and $X_3$ are homologous variables and that $X_1 \perp\!\!\!\perp X_3 \mid X_2$. Then, we could consider an extended QS model
$$\log \mu_{ijk} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{12}_{ij} + \lambda^{23}_{jk}, \tag{5a}$$
$$\lambda^{12}_{ij} = \lambda^{12}_{ji} = \lambda^{23}_{ij} = \lambda^{23}_{ji}, \quad \text{for all } i, j = 1, \dots, I. \tag{5b}$$
This model is represented by Fig. 1A, where the QS constraints are represented by colours in the edges.

Example 3. Consider four variables, $X_i$, $i = 1, \dots, 4$, such that the first three variables are homologous with $I$ levels and $X_4$ is non-homologous with $J$ levels. Consider the conditional independence model, $X_4 \perp\!\!\!\perp (X_1, X_2) \mid X_3$, specified by the graphical log-linear model
$$\log \mu_{ijkl} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{4}_{l} + \lambda^{12}_{ij} + \lambda^{13}_{ik} + \lambda^{23}_{jk} + \lambda^{34}_{kl} + \lambda^{123}_{ijk}, \tag{6}$$
which has two cliques, [123][34]. A simplification of this model uses QS structure for the three homologous variables, with the constraints defined by (4b) and (4c). This model is represented by Fig. 1B.

Sometimes it is convenient to relax some of the restrictions of the standard QS model by removing the assumption of homogeneity of the interactions of the same order. The following example presents such a model.

Example 4. When the data are not compatible with the standard QS restrictions (4b) and (4c), less severe restrictions impose symmetry on the array of three-factor interactions and on the separate two-factor interactions, but without imposing homogeneity for the two-factor terms,
$$\lambda^{12}_{ij} = \lambda^{12}_{ji}, \quad \lambda^{13}_{ij} = \lambda^{13}_{ji}, \quad \lambda^{23}_{ij} = \lambda^{23}_{ji}, \quad \text{for all } i, j = 1, \dots, I, \tag{7a}$$
$$\lambda^{123}_{ijk} = \lambda^{123}_{\mathrm{perm}(ijk)}, \quad \text{for all } i, j, k = 1, \dots, I. \tag{7b}$$
Then, each pair of variables is conditionally QS of different forms. This model is represented by Fig. 1C, in the context of another special case of model (6), but one that is more general than the model represented by Fig. 1B.

Fig. 1. Edge-coloured graphs for: (A) example 2, (B) example 3, (C) example 4 and (D) example 5.

Some applications have more than one subgroup of homologous variables. For such cases, there are other possible extensions of QS models.

Example 5. Consider an $I^2 \times J^2$ table for two pairs of homologous variables, $X_1$ and $X_2$ with $I$ levels, and $X_3$ and $X_4$ with $J$ levels. One possible log-linear graphical model is:
$$\log \mu_{ijkl} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{4}_{l} + \lambda^{12}_{ij} + \lambda^{23}_{jk} + \lambda^{34}_{kl}, \tag{8a}$$
with generators [12][23][34], specifying the conditional independencies
$$(X_1, X_2) \perp\!\!\!\perp X_4 \mid X_3, \qquad (X_3, X_4) \perp\!\!\!\perp X_1 \mid X_2.$$
If each pair of homologous variables is QS, we can impose the restrictions
$$\lambda^{12}_{ij} = \lambda^{12}_{ji}, \quad \lambda^{34}_{kl} = \lambda^{34}_{lk}, \quad \text{for all } i, j = 1, \dots, I, \ k, l = 1, \dots, J. \tag{8b}$$
This model is represented by Fig. 1D.

The models in these examples belong to a special kind of graphical log-linear model defined by imposing both conditional independence constraints and equality and quasi-symmetry constraints on interaction parameters. Each model has a related undirected graph encoding the conditional independencies, but without some modification the usual graph does not portray all the details of the model. In fact, as we shall see, the presence of the equality constraints partly modifies the statistical properties of the graphical model. It is therefore convenient to enrich the interpretation by giving a special graphical code to edges affected by the QS constraints. We propose here to use the class of coloured graphs recently utilized by Højsgaard & Lauritzen (2008) for Gaussian undirected graph models with constraints on the parameters. The following section will give the general definition of this new class of graphical models and of their graphical representation.

2.3. QS graphical models

We define next a class of discrete graphical models with equality constraints on particular subsets of interaction parameters. The models apply to situations in which the variables can be partitioned into subsets of homologous variables and possibly a subset of non-homologous variables.
This class of models is defined by a graph with coloured edges, where edges with the same colour correspond to interaction parameters constrained to be equal. In this article, by a coloured graph, we mean a triplet $G = (V, E, \mathcal{E})$ where $V$ is a set of nodes, $E$ is a set of unordered pairs of nodes, $vw = (v, w)$, $v, w \in V$, and $\mathcal{E}$ is a partition of $E$ into disjoint colour classes $E_0, \dots, E_s$, each collecting edges with the same colour. Conventionally, we assign black edges to $E_0$. The coloured edges could also be marked by specific symbols (such as the first letter of a colour name) to facilitate the reading in black-and-white prints. We define the skeleton graph of a coloured graph $G$ to be the undirected graph obtained by replacing all colours by black. Moreover, we call the induced coloured subgraph $G^{col}_a$ of a coloured graph $G = (V, E, \mathcal{E})$, $a \subseteq V$, the subgraph induced by $a$ after deleting all the black edges. We denote by $\mathrm{col}(a)$ the set of colours of the edges in $G^{col}_a$. Note that black is not included in the set of colours. Therefore, $\mathrm{col}(V)$ is the set of colours which are present in the graph $G$, whose cardinality is the number of colour classes minus 1.

Definition 1. A QS graphical model with associated coloured graph $G = (V, E, \mathcal{E})$ is a log-linear model for the discrete joint probability distribution $p(\iota)$, with parameters $\lambda_a(\iota_a)$ for $a \subseteq V$, such that:

(i) the model satisfies the conditional independence constraints of the graphical model associated with the skeleton graph of $G$, that is, $\lambda_a(\iota_a) = 0$ for all subsets $a$ not complete in the skeleton graph;
(ii) for each edge $vw$ in colour class $E_k$ with $k \neq 0$, $\lambda^{vw}_{ij} = \lambda^{vw}_{ji}$, for all $i, j = 1, \dots, I$, and all non-zero higher-factor parameters $\lambda_a(\iota_a)$ involving both $v$ and $w$, with $b = a \setminus \{v, w\}$, satisfy the QS equality constraints $\lambda_a(i_v, i_w, i_b) = \lambda_a(i_w, i_v, i_b)$;
(iii) for all complete subsets $C_1, \dots, C_{k(s)}$ in $s \geq 2$ nodes with the same edge colour, the interaction parameters of order $s$ satisfy the equality constraints $\lambda_{C_1}(\iota_{C_1}) = \dots = \lambda_{C_{k(s)}}(\iota_{C_{k(s)}})$.

Therefore, equality constraints are imposed for all interactions associated with complete subsets with the same colour in $s = 2, 3, \dots$ nodes. To achieve identifiability, model parameters $\lambda_a(\iota_a)$ are constrained to sum to 0 when summed over any index.

Variables associated with a coloured edge should be homologous, to have sensible QS constraints. Thus, if the set of nodes $V$ is partitioned into subsets $V_0, V_1, \dots, V_m$, where $V_1, \dots, V_m$ refer each to a set of homologous variables, and $V_0$ refers to non-homologous variables, an edge $vw$ may be coloured (i.e. not black) only if $v, w$ are in the same class $V_\ell$ for some $\ell > 0$. In addition, note that a black edge in a coloured graph denotes a set of interaction parameters with no constraints. For this reason, we separate the class $E_0$ of the black edges from the other colour classes and do not consider black as a colour.

Notice that in QS graphical models the colour of edges has a double meaning, representing both internal and external constraints. The internal constraints are symmetry constraints within an interaction parameter [see definition 1(ii)]. The external constraints are equality constraints, among sets of interaction parameters [see definition 1(iii)].
In Højsgaard & Lauritzen (2008), instead, colours of edges imply exclusively external constraints, that is, equality between parameters, and each colour has to be shared by at least two edges. This difference of semantics reflects the fact that association parameters in Gaussian graphical models cannot be asymmetric. We next illustrate definition 1 by examples, some of which continue those introduced in the previous section.

Example 6 (example 2 continued). Consider the graphical QS model associated with the graph of Fig. 1A, with set of edges $E = \{12, 23\}$. By definition 1(i), it satisfies the conditional independence $X_1 \perp\!\!\!\perp X_3 \mid X_2$, because of the missing edge 13. The two coloured edges 12 and 23 imply, by definition 1(ii), separate QS constraints on the two-factor interaction terms: $\lambda^{12}_{ij} = \lambda^{12}_{ji}$ and $\lambda^{23}_{ij} = \lambda^{23}_{ji}$. Moreover, the fact that the two edges have the same colour implies, by definition 1(iii), that such interactions are equal, thus giving the constraints in (5b).

Example 7. Consider an edge-coloured graph corresponding to a QS graphical model, having a complete subgraph in three nodes with edges 12 and 23 coloured (i.e. not black). Then, the associated QS graphical model has the constraints $\lambda^{12}_{ij} = \lambda^{12}_{ji}$ and $\lambda^{23}_{ij} = \lambda^{23}_{ji}$. By the second part of definition 1(ii), the three-factor terms must then satisfy

$$\lambda^{123}_{ijk} = \lambda^{123}_{jik} \quad \text{and} \quad \lambda^{123}_{ijk} = \lambda^{123}_{ikj}.$$
However, these two sets of constraints imply also $\lambda^{123}_{ijk} = \lambda^{123}_{jki}$ and $\lambda^{123}_{ijk} = \lambda^{123}_{kij}$. Thus, the array $\lambda^{123}_{ijk}$ is symmetric with respect to all possible permutations: $\lambda^{123}_{ijk} = \lambda^{123}_{\mathrm{perm}(ijk)}$. This result suggests proposition 1.

Proposition 1. Let $M_{QS}(G)$ be a QS graphical model associated with a coloured graph $G$. Let $G_c$ be a complete subgraph of $G$, with $c \subseteq V$ and $|c| > 1$. Let $G^{col}_c$ be the induced coloured subgraph of $G_c$ obtained by deleting all the black edges. Then, if $G^{col}_c$ is connected, the QS graphical model $M_{QS}(G)$ contains the parameters $\lambda_c(\iota_c)$ and this term is fully symmetric with respect to all the indices, that is, $\lambda_c(\iota_c) = \lambda_c(\mathrm{perm}(\iota_c))$.

Proof. Because of definition 1(ii), every QS constraint corresponds to a transposition of the indices of the interaction term $\lambda_c(\iota_c)$. As $G^{col}_c$ is connected, every node in $c$ is connected to at least another node in $c$ by a coloured edge, so that there is at least one QS constraint imposed on $\lambda_c(\iota_c)$ for each index in $\iota_c$. Thus, each possible permutation of the set $\iota_c$ can be generated by applying at least one sequence of transpositions to the indices. Therefore, the entire set of QS constraints is $\lambda_c(\iota_c) = \lambda_c(\mathrm{perm}(\iota_c))$.

Notice that the equality $\lambda_c(\iota_c) = \lambda_c(\mathrm{perm}(\iota_c))$ is trivially true whenever $G_c$ is not complete, as all the interaction parameters vanish.

Example 8 (example 3 continued). The QS graphical model for this example is associated with the graph portrayed in Fig. 1B. Here the set of nodes is $V = \{1, 2, 3, 4\}$, and, in the corresponding skeleton graph, node 3 separates node 4 from nodes 1 and 2. As a consequence, by definition 1(i), $X_4 \perp\!\!\!\perp (X_1, X_2) \mid X_3$. The set of edges $E$ is partitioned into two colour classes, $\{E_0, E_1\}$, with $E_0 = \{34\}$ and $E_1 = \{12, 13, 23\}$. According to definition 1(ii), QS constraints are imposed on the two-factor interaction terms, while by definition 1(iii), $\lambda^{12}_{ij} = \lambda^{12}_{ji} = \lambda^{13}_{ij} = \lambda^{13}_{ji} = \lambda^{23}_{ij} = \lambda^{23}_{ji}$. Moreover, proposition 1 implies that $\lambda^{123}_{ijk} = \lambda^{123}_{\mathrm{perm}(ijk)}$, as in (7b).
The black edge 34 indicates that the corresponding interaction terms are not involved in any QS constraint and are set free.

Example 9 (example 4 continued). The QS graphical model illustrated in this example, in Fig. 1C, has the same skeleton graph as the previous model. Consequently by definition 1(i) the two models have the same conditional independence structure. The set of edges $E$ is partitioned into four colour classes, with $E_0 = \{34\}$, $E_1 = \{12\}$, $E_2 = \{13\}$ and $E_3 = \{23\}$, so that each edge has a different colour. By definition 1(ii), QS constraints are separately imposed on the two-factor interaction terms $\lambda^{a}_{ij} = \lambda^{a}_{ji}$, with $a = 12, 13, 23$. The colours on these three edges act also on the three-factor terms, which satisfy the constraints in (4c) because of proposition 1. Note that this situation, with colour classes containing only single edges, is effective only when $I > 2$. This agrees with ordinary QS models being of interest for $I \times I$ tables only when $I > 2$ (Bishop et al., 1975, p. 281).

Example 10 (example 5 continued). The model presented for this case is represented by the graph in Fig. 1D. This graph has $\mathcal{E} = \{E_0, E_1, E_2\}$, with $E_0 = \{23\}$, $E_1 = \{12\}$ and $E_2 = \{34\}$, corresponding to the constraints in (8b).
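
The coloured-graph notions of this section, and the argument behind proposition 1, can be explored with a small Python sketch. Everything named here is hypothetical and chosen for illustration: a coloured graph is stored as a dictionary from edges to colour indices (with 0 for black), `col` mirrors $\mathrm{col}(a)$, and `generated_permutations` closes the identity under the index transpositions that definition 1(ii) attaches to coloured edges. It is a sketch of the reasoning, not a replacement for the proof.

```python
from itertools import combinations

BLACK = 0  # colour class E_0: black edges carry no QS constraint

def colour(edge_colour, v, w):
    """Colour index of edge vw, or None when the edge is missing."""
    return edge_colour.get((min(v, w), max(v, w)))

def is_complete(edge_colour, nodes):
    """True when every pair of nodes is joined by an edge of any colour."""
    return all(colour(edge_colour, v, w) is not None
               for v, w in combinations(nodes, 2))

def col(edge_colour, nodes):
    """Set of non-black colours on the edges among `nodes`."""
    colours = {colour(edge_colour, v, w) for v, w in combinations(nodes, 2)}
    return colours - {None, BLACK}

def generated_permutations(edge_colour, nodes):
    """Index permutations generated by the coloured-edge transpositions."""
    nodes = tuple(nodes)
    swaps = [(nodes.index(v), nodes.index(w))
             for v, w in combinations(nodes, 2)
             if colour(edge_colour, v, w) not in (None, BLACK)]
    identity = tuple(range(len(nodes)))
    seen, frontier = {identity}, [identity]
    while frontier:
        current = frontier.pop()
        for a, b in swaps:
            nxt = list(current)
            nxt[a], nxt[b] = nxt[b], nxt[a]
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Fig. 1B as described in example 8: E_0 = {34}, E_1 = {12, 13, 23}.
fig_1b = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (3, 4): BLACK}
# Example 7: a complete triple in which only edges 12 and 23 are coloured.
example_7 = {(1, 2): 1, (2, 3): 1, (1, 3): BLACK}

n_perms_fig_1b = len(generated_permutations(fig_1b, (1, 2, 3)))
n_perms_example_7 = len(generated_permutations(example_7, (1, 2, 3)))
```

In both graphs the coloured edges among nodes 1, 2 and 3 form a connected subgraph, and the closure contains all six permutations of the three indices, in line with proposition 1. If only edge 12 were coloured, the closure would contain just the identity and one transposition.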

2.4. Parameter coding for identifiability and sensible models

In fitting standard log-linear models (i.e. without equality constraints), to ensure identifiability it is common in the model matrix to use sum-to-zero constraints on the parameters or else reference-level coding by which a particular level of each factor (usually, either the first level or the last level) is the reference category and has value 0 for parameters at that index level. The choice is unimportant, because the model-fitted values and the estimates of relevant parameters (such as odds ratios) are the same for each choice. For QS graphical models, however, in definition 1 we specified that the parameters satisfy sum-to-zero constraints. This is because for certain models, different forms of reference coding do not satisfy a certain invariance property and do not yield equivalent models to those with sum-to-zero constraints but rather can result in non-sensible models.

Specifically, by the nature of the equality constraints on the parameters in QS graphical models and the nature of the sum-to-zero constraints for identifiability, any QS graphical model is invariant to permutations of categories for all variables that are homologous. This is a basic property of any sensible model for nominal-scale variables (regardless of whether variables are homologous), and it applies for any of the standard coding methods for ordinary log-linear models and for standard QS models for a set of variables in one clique for which all edges have the same colour. However, for certain models having the same colour in different cliques or a common colour for only some of the variables in a clique, this property does not hold with reference-level coding. For this reason, definition 1 for QS graphical models does not permit such coding, as in some cases the entire meaning of the model would then depend on the parameter coding.
To illustrate, consider the model for a four-way cross-classification of binary variables,
$$\log \mu_{hijk} = \lambda + \lambda^{1}_{h} + \lambda^{2}_{i} + \lambda^{3}_{j} + \lambda^{4}_{k} + \lambda^{12}_{hi} + \lambda^{13}_{hj} + \lambda^{23}_{ij} + \lambda^{123}_{hij} + \lambda^{34}_{jk},$$
in which all edges have the same colour. (See model 3 in Table 3 for the graph of this case, considered in an example discussed later.) In particular, $\lambda^{12}_{\mathrm{perm}(ab)} = \lambda^{13}_{\mathrm{perm}(ab)} = \lambda^{23}_{\mathrm{perm}(ab)} = \lambda^{34}_{\mathrm{perm}(ab)}$, and we denote its common value by $\lambda_{\min(a, b), \max(a, b)}$. The conditional log odds ratio between variables 3 and 4 at levels $h$ and $i$ of variables 1 and 2 equals $(\lambda_{11} + \lambda_{22} - 2\lambda_{12})$. This equals $4\lambda_{11}$ with zero-sum coding and $\lambda_{11}$ with (1, 0) reference-level coding that sets parameters equal to 0 at the second level of a variable. We now compare this with the conditional log odds ratio between variables 1 and 2 at levels $j$ and $k$ of variables 3 and 4, which equals
$$\log\left[\frac{\mu_{11jk}\,\mu_{22jk}}{\mu_{12jk}\,\mu_{21jk}}\right] = (\lambda_{11} + \lambda_{22} - 2\lambda_{12}) + (\lambda_{11j} - 2\lambda_{12j} + \lambda_{22j}).$$
With zero-sum coding, this equals $4(\lambda_{11} + \lambda_{11j})$, which is $4(\lambda_{11} + \lambda_{111})$ when $j = 1$ and $4(\lambda_{11} - \lambda_{111})$ when $j = 2$. In contrast, with (1, 0) reference-level coding, this equals $(\lambda_{11} + \lambda_{111})$ when $j = 1$ and $\lambda_{11}$ when $j = 2$. That is, this coding constrains the conditional log odds ratio between variables 1 and 2 to equal the log odds ratio between variables 3 and 4 when $j = 2$ but not when $j = 1$. Similarly, the (0, 1) reference-level coding constrains the conditional log odds ratio between variables 1 and 2 to equal the log odds ratio between variables 3 and 4 when $j = 1$ but not when $j = 2$. Because of this, the fit and statistical significance of various terms then depends on the coding used and is not invariant to a like permutation of categories for all the variables. In practice, the use of a common colour with different cliques would normally be considered mainly when cliques have the same case, such as the chain graph in Fig. 1A.
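
The odds-ratio comparison above can be reproduced numerically. The Python sketch below is an illustration under stated assumptions: the function names, the coding labels and the values `t` (playing the role of $\lambda_{11}$) and `u` (playing the role of $\lambda_{111}$) are hypothetical, and the intercept and main effects are set to zero because they cancel in every odds ratio. Levels 1 and 2 in the text correspond to indices 0 and 1 in the code.

```python
def build_terms(coding, t, u):
    """Common two-factor and three-factor terms for binary variables."""
    if coding == "sum-to-zero":
        sign = (1.0, -1.0)
        two = [[t * sign[a] * sign[b] for b in range(2)] for a in range(2)]
        three = [[[u * sign[a] * sign[b] * sign[c] for c in range(2)]
                  for b in range(2)] for a in range(2)]
    else:
        # "reference (1, 0)": parameters are zero at the second level;
        # "reference (0, 1)": parameters are zero at the first level.
        free = 0 if coding == "reference (1, 0)" else 1
        two = [[t if a == b == free else 0.0 for b in range(2)]
               for a in range(2)]
        three = [[[u if a == b == c == free else 0.0 for c in range(2)]
                  for b in range(2)] for a in range(2)]
    return two, three

def log_mu(two, three, h, i, j, k):
    """log mu_hijk with terms 12, 13, 23, 123 and 34 (main effects omitted)."""
    return two[h][i] + two[h][j] + two[i][j] + three[h][i][j] + two[j][k]

def log_or_34(two, three, h, i):
    """Conditional log odds ratio between variables 3 and 4 given (h, i)."""
    f = lambda j, k: log_mu(two, three, h, i, j, k)
    return f(0, 0) + f(1, 1) - f(0, 1) - f(1, 0)

def log_or_12(two, three, j, k):
    """Conditional log odds ratio between variables 1 and 2 given (j, k)."""
    f = lambda h, i: log_mu(two, three, h, i, j, k)
    return f(0, 0) + f(1, 1) - f(0, 1) - f(1, 0)

t, u = 0.3, 0.1  # hypothetical parameter values
summary = {}
for coding in ("sum-to-zero", "reference (1, 0)", "reference (0, 1)"):
    two, three = build_terms(coding, t, u)
    summary[coding] = (log_or_34(two, three, 0, 0),
                       log_or_12(two, three, 0, 0),   # j = 1 in the text
                       log_or_12(two, three, 1, 0))   # j = 2 in the text
```

With these values, the two reference codings each force the 1-2 log odds ratio to coincide with the 3-4 log odds ratio at one level of variable 3 only, and they disagree about which level, whereas zero-sum coding treats the two levels symmetrically around $4\lambda_{11}$.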
456 A. Gottard et al. Scand J Statist 38

However, this example shows that in more complex models, using reference-level coding can impose constraints under which the model would not be considered scientifically sensible. Another such example is the model with generators [123][234] and with all edges having the same colour, in which the 23 relationship is involved in two higher-order terms. Using reference-level coding constrains some of the 23 conditional odds ratios to equal 12 and 34 odds ratios and some to differ, and the ones that are equal differ according to which category is the reference category. In contrast, in each of these models the sum-to-zero constraints impose a sensible relationship among the odds ratios, and the fit is invariant when categories of all variables are permuted in the same way. It is an interesting question for future research to determine the class of QS graphical models that, when instead used with reference-level coding, would have identical fit to the QS graphical model as we have defined it with sum-to-zero constraints. One case in which it does seem to happen is for a restricted class of such models in which all the a-factor interactions with a > 2 equal zero.

3. Fitting and selecting QS graphical models

3.1. Model fitting

In general, no closed-form expression of the ML estimates can be found for QS graphical models. Nevertheless, ML estimates can be easily computed by any package for generalized linear models, by using a log link function, assuming independent Poisson counts, and modifying the model matrix to represent the assumed parameter constraints. Both analyses in section 5 were performed using the software R (R Development Core Team, 2009). In practice, the model matrix for a certain QS graphical model $M^{QS}(G)$ can be constructed by suitably simplifying the model matrix for the saturated unconstrained model, to satisfy the elements of definition 1. The saturated model is an ordinary log-linear model associated with a skeleton (black) complete graph and can be written as $\log \mu(\iota) = Z\lambda$, where $Z$ is the model matrix and $\lambda$ is the set of log-linear parameters. Columns corresponding to parameters that are zero because of missing edges in the skeleton graph of $G$ must be deleted [definition 1(i)].
Columns corresponding to parameters constrained to be equal, which correspond to coloured edges in $G$, must be summed to obtain new columns [definition 1(ii)]. Columns corresponding to parameters that refer to edges sharing the same colour must be summed to obtain new columns [definition 1(iii)]. The resulting model can then be written as $\log \mu(\iota) = X\lambda = ZQ\lambda$, in which $Q$ is a matrix operator summing up columns of $Z$ according to the parameter constraints and $\lambda$ now collects the distinct parameters. The rank of the resulting model matrix $X$ corresponds to the number of distinct parameters in the model. As for ordinary log-linear models, the likelihood equations equate the observed counts to the expected counts, through $\mu^{\top} X = n^{\top} X$, or, equivalently, $\mu^{\top} ZQ = n^{\top} ZQ$. The deviance has a large-sample approximate chi-squared distribution with degrees of freedom equal to the difference between the rank of the saturated model matrix $Z$ and the rank of the reduced model matrix $X$.

3.2. Comparing models and model selection

As in other contexts, any particular data set may be described well by several graphical log-linear models. In some cases, existing theory suggests a particular model, with the research study perhaps focusing on whether the model needs a certain interaction or conditional dependence term. In exploratory studies, it is often useful to conduct a model selection procedure to find a simple model that adequately describes the data and to suggest questions for future research. As in ordinary graphical log-linear modelling, in either case such analyses may compare candidate models that differ by the presence or absence of some edges.
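The column operations of section 3.1 can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' R code: it takes the simplest case of two homologous variables with $I = 3$ categories joined by one coloured edge (classical quasi-symmetry), builds the saturated matrix $Z$ with sum-to-zero coding, builds a summing operator $Q$ that merges the interaction columns $(a, b)$ and $(b, a)$, and obtains the degrees of freedom as $\mathrm{rank}(Z) - \mathrm{rank}(X)$. No column deletion is needed here because the two-node graph has no missing edge. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

I = 3  # categories per variable (illustrative choice)

def effect(level, a):
    """Sum-to-zero coding: value of contrast column `a` at factor level `level`."""
    if level == a:
        return 1
    if level == I - 1:      # the last level carries minus the sum of the others
        return -1
    return 0

cells = list(product(range(I), repeat=2))       # cells (i, j) of the I x I table
pairs = list(product(range(I - 1), repeat=2))   # free interaction indices (a, b)

# Saturated model matrix Z: intercept, two main effects, all interaction columns.
Z = []
for i, j in cells:
    row = [1]
    row += [effect(i, a) for a in range(I - 1)]
    row += [effect(j, b) for b in range(I - 1)]
    row += [effect(i, a) * effect(j, b) for a, b in pairs]
    Z.append(row)

# Q keeps the intercept and main-effect columns and sums interaction columns
# (a, b) and (b, a), which imposes lambda_ab = lambda_ba.
n_main = 1 + 2 * (I - 1)
sym_pairs = [(a, b) for a, b in pairs if a <= b]
Q = [[0] * (n_main + len(sym_pairs)) for _ in range(len(Z[0]))]
for c in range(n_main):
    Q[c][c] = 1
for k, (a, b) in enumerate(pairs):
    Q[n_main + k][n_main + sym_pairs.index((min(a, b), max(a, b)))] = 1

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*B)] for r in A]

def rank(M):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((k for k in range(r, len(M)) if M[k][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for k in range(r + 1, len(M)):
            f = M[k][c] / M[r][c]
            M[k] = [x - f * y for x, y in zip(M[k], M[r])]
        r += 1
        if r == len(M):
            break
    return r

X = matmul(Z, Q)                 # reduced model matrix X = ZQ
df = rank(Z) - rank(X)           # degrees of freedom of the deviance
print(rank(Z), rank(X), df)
```

For this two-way case the result agrees with the classical degrees of freedom $(I-1)(I-2)/2$ for quasi-symmetry. In an actual analysis, $X$ would be passed to a Poisson log-link GLM routine to obtain the ML estimates.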

In either type of study, once a tentative working model of standard graphical form is selected, one can analyse whether terms involving homologous variables can be simplified to a relevant QS graphical model. As with ordinary graphical log-linear models, there may be more than one acceptable model. Interpretation is often simplified, as shown in the first example in section 5, if we can remove at least one of the higher-order interactions, even if the resulting model is no longer graphical.

In a model selection process, typically the likelihood-ratio test statistic is used to compare a model $M(G_1)$ with a reduced model $M(G_2)$, where $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ are such that $E_2 \subset E_1$. In comparing QS graphical models, however, careful consideration must be given to whether two models are appropriately nested (one a special case of the other), because one model may result from two kinds of linear constraints on the other model: (i) a zero constraint, when an edge is removed; and (ii) an equality constraint, when an edge is taken to be coloured or certain edges are constrained to have the same colour. A reduced model results whenever one of the following actions is performed on a model:

(a) a black edge is removed;
(b) a black edge is coloured;
(c) all the edges of a same colour are removed;
(d) two or more colour classes are joined.

Case (a) corresponds to imposing a zero constraint on a set of parameters; case (b) corresponds to imposing equality constraints on a set of parameters; case (c) corresponds to imposing a zero constraint on a set of parameters that were already constrained to be equal. Finally, in case (d) two or more sets of parameters, already equal inside the sets, are fixed to be equal. For instance, in Fig. 2, graph B corresponds to a reduced model of graph A but not of graph C. Indeed, the models associated with graphs B and C have the same number of parameters.

[Fig. 2. Examples of undirected graphs. Panels A, B and C, each on nodes 1, 2, 3.]

4. Properties of QS graphical models

The interpretation of QS graphical models is aided by the fact that they satisfy the Markov properties of the skeleton graph, according to definition 1(i). However, their behaviour regarding decomposability and collapsibility requires consideration. In ordinary graphical log-linear models, it has been shown (see Asmussen & Edwards, 1983) that, given a decomposition $a$ and $b$ of a graph $G$ (as defined in the Appendix),

$$\mu(\iota) = \frac{\mu(\iota_a)\,\mu(\iota_b)}{\mu(\iota_{a \cap b})}, \tag{9a}$$

where $\mu(\iota_a)$ is the expected count in cell $\iota_a$ of the marginal table $I_a$. Equivalently, $\log \mu(\iota) = \log \mu(\iota_a) + \log \mu(\iota_b) - \log \mu(\iota_{a \cap b})$. Moreover, the ML estimates, the deviance and the degrees of freedom also admit a similar decomposition, so that inference for a graphical log-linear model $M(G)$ can be based on the three lower-dimensional models on the subgraphs induced by the decomposition: $M_a(G_a)$, $M_b(G_b)$ and $M_{a \cap b}(G_{a \cap b})$. In terms of estimated expected counts,

$$\hat{\mu}(\iota) = \frac{\hat{\mu}_a(\iota_a)\,\hat{\mu}_b(\iota_b)}{\hat{\mu}_{a \cap b}(\iota_{a \cap b})}, \tag{9b}$$

where $\hat{\mu}_a(\iota_a)$ is the ML estimate of $\mu(\iota_a)$ obtained by fitting the lower-dimensional model $M_a(G_a)$ to the marginal table $I_a$. Such a decomposition of the ML estimates happens with QS graphical models only in some special cases. Because of the equality constraints, the factorization of the joint distribution with respect to the variables in a model does not always correspond to a consequent parameter-based factorization of the likelihood function (Cox & Wermuth, 1999), and the ML estimates cannot be obtained by fitting separate lower-dimensional models.

4.1. Coloured decomposition for QS graphical models

We introduce next a definition of decomposition for QS graphical models.

Definition 2. Two subsets $a$ and $b$ of an edge-coloured graph $G = (V, E, \mathcal{E})$ form a coloured decomposition of $V$ relative to a QS graphical model $M(G)$ if (i) $a$ and $b$ form a decomposition for the skeleton graph of $G$, and (ii) $\mathrm{col}(a) \cap \mathrm{col}(b) = \emptyset$.

This definition adds a further condition to the common definition of decomposition, to avoid the case of QS constraints coupling parameters from different parts of the decomposition. Notice that, because of this added condition, the subgraph induced by $a \cap b$ can have only black edges. For QS graphical models satisfying this new condition for decomposition, proposition 2 holds.

Proposition 2. Suppose two subsets $a$ and $b$ of an edge-coloured graph $G = (V, E, \mathcal{E})$ form a coloured decomposition of $V$ relative to a QS graphical model $M^{QS}(G)$. Then

$$\log \hat{\mu}(\iota) = \log \hat{\mu}_a(\iota_a) + \log \hat{\mu}_b(\iota_b) - \log \hat{\mu}_{a \cap b}(\iota_{a \cap b})$$

and $\log \hat{\mu}(\iota_a) = \log \hat{\mu}_a(\iota_a)$, where $\hat{\mu}_a(\iota_a)$ denotes the ML estimates of $\mu(\iota_a)$ obtained for the marginal model $M^{QS}_a(G_a)$.

Proof. As $a$ and $b$ form a coloured decomposition of $V$ in the coloured graph $G$, they also form a decomposition in the skeleton graph of $G$, which corresponds to the ordinary graphical log-linear model on that skeleton, that is, the unconstrained version of $M^{QS}(G)$.
Denoting $c = a \cap b$, according to Asmussen & Edwards (1983), the ML estimates for the unconstrained model on the skeleton graph can be achieved by maximizing separately the three terms of the decomposition, $M_a(G_a)$, $M_b(G_b)$ and $M_c(G_c)$, each term depending on the observed counts of separate marginal tables. The QS model $M^{QS}(G)$ is obtained by adding constraints according to the colours of the edges in the coloured graph $G$. As $a$ and $b$ form a coloured decomposition, colours are added separately to the subgraphs $G_{a \setminus c}$ and $G_{b \setminus c}$. No coloured edge in $a$ (excluding the black edges) has the same colour as an edge in $b$. Consequently, no equality constraints couple parameters regarding $a$ with those in $b$. Therefore, the sufficient statistics for the constrained parameters concerning a part of the decomposition, say $G_a$, depend only on the sufficient statistics of the corresponding unconstrained model, say $M_a(G_a)$. The subgraph $G_c$ is complete and not coloured, so the model equates the fitted marginal table to the corresponding observed marginal table. Then, the ML estimates can be obtained by separately maximizing the three terms of the decomposition, $M_a(G_a)$, $M_b(G_b)$ and $M_c(G_c)$, as each term depends only on the observed counts of the separate marginal tables, as in the unconstrained model.
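The two conditions of definition 2 are easy to check mechanically. The sketch below is only an illustration with hypothetical names: edges are stored as `frozenset` pairs mapped to a colour label, with `None` standing for black, and the decomposition condition is taken to mean that $a \cup b = V$, that $a \cap b$ is complete, and that no edge joins $a \setminus b$ to $b \setminus a$ (an assumption standing in for the Appendix definition, which is not reproduced here). The colour labels follow example 13 for Fig. 3C; the single shared colour for Fig. 3B is an arbitrary label.

```python
from itertools import combinations

def col(colours, nodes):
    """Colours (black, coded None, excluded) of the edges inside the induced subgraph."""
    nodes = set(nodes)
    return {c for e, c in colours.items() if c is not None and e <= nodes}

def is_decomposition(vertices, colours, a, b):
    """Skeleton check: a and b cover V, a & b is complete and separates the rest."""
    a, b = set(a), set(b)
    edges = set(colours)
    covers = (a | b) == set(vertices)
    complete = all(frozenset(p) in edges for p in combinations(a & b, 2))
    separated = not any(frozenset((u, v)) in edges for u in a - b for v in b - a)
    return covers and complete and separated

def is_coloured_decomposition(vertices, colours, a, b):
    """Definition 2: a skeleton decomposition with no colour shared by a and b."""
    return (is_decomposition(vertices, colours, a, b)
            and not (col(colours, a) & col(colours, b)))

V = {1, 2, 3}
fig3c = {frozenset((1, 2)): "red", frozenset((2, 3)): "blue"}   # different colours
fig3b = {frozenset((1, 2)): "red", frozenset((2, 3)): "red"}    # one common colour

print(is_coloured_decomposition(V, fig3c, {1, 2}, {2, 3}))
print(is_coloured_decomposition(V, fig3b, {1, 2}, {2, 3}))
```

Both graphs pass the skeleton check, and only the colour condition separates them, which is the distinction drawn between examples 11 and 12.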

Coloured decompositions are useful for QS graphical models, as they imply that the model and its fit can be obtained from separate lower-dimensional models. Example 11 illustrates the use of decomposition in fitting.

Example 11. Consider Fig. 3C; the graph has $V = \{1, 2, 3\}$ and $E = \{12, 23\}$. The two edges are coloured by different colours, so that the corresponding model is

$$\log \mu_{ijk} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{12}_{ij} + \lambda^{23}_{jk},$$

with QS constraints $\lambda^{12}_{ij} = \lambda^{12}_{ji}$ and $\lambda^{23}_{jk} = \lambda^{23}_{kj}$. Its skeleton graph, in Fig. 3A, is decomposable into its cliques [12][23]. Thus, in the ordinary model represented by the skeleton graph,

$$\log \mu_{ijk} = \log \mu_{ij\cdot} + \log \mu_{\cdot jk} - \log \mu_{\cdot j \cdot},$$

the dot denoting summation with respect to the index. This is true also for the QS graphical model, in the sense that it can be specified by the decomposition of the model formula associated with the skeleton graph, and by two different QS models for the margins [12] and [23], that is,

$$\log \mu_{ij\cdot} = \alpha + \alpha^{1}_{i} + \alpha^{2}_{j} + \alpha^{12}_{ij}, \qquad \alpha^{12}_{ij} = \alpha^{12}_{ji},$$
$$\log \mu_{\cdot jk} = \gamma + \gamma^{2}_{j} + \gamma^{3}_{k} + \gamma^{23}_{jk}, \qquad \gamma^{23}_{jk} = \gamma^{23}_{kj},$$
$$\log \mu_{\cdot j \cdot} = \xi + \xi^{2}_{j}.$$

Here, $\lambda = \alpha + \gamma - \xi$ and $\lambda^{2}_{j} = \alpha^{2}_{j} + \gamma^{2}_{j} - \xi^{2}_{j}$, while the other parameters coincide (e.g. $\alpha^{1}_{i} = \lambda^{1}_{i}$). The same decomposition holds for the ML estimates, so that, using the notation $\lambda^{a}$ for the vector of the components $\lambda^{a}(\iota_a)$:

$$\hat{\lambda} = \hat{\alpha} + \hat{\gamma} - \hat{\xi}, \quad \hat{\lambda}^{1} = \hat{\alpha}^{1}, \quad \hat{\lambda}^{3} = \hat{\gamma}^{3}, \quad \hat{\lambda}^{2} = \hat{\alpha}^{2} + \hat{\gamma}^{2} - \hat{\xi}^{2}, \quad \hat{\lambda}^{12} = \hat{\alpha}^{12}, \quad \hat{\lambda}^{23} = \hat{\gamma}^{23}.$$

The possibility of obtaining the ML estimates by combining separate lower-dimensional models on the cliques and separators depends, however, on the different parts of the decomposition having different colours, as we can see in example 12.

Example 12. Consider the QS graphical model corresponding to the graph in Fig. 3B. The model formula is

$$\log \mu_{ijk} = \lambda + \lambda^{1}_{i} + \lambda^{2}_{j} + \lambda^{3}_{k} + \lambda^{12}_{ij} + \lambda^{23}_{jk},$$

with $\lambda^{12}_{ij} = \lambda^{12}_{ji}$ and, because the two edges share the same colour, with these common values also shared by the 23 term. Here the ML estimates cannot be obtained from the marginal tables, as the QS constraints couple some parameters in the two different QS models for the margins [12] and [23].
In fact, the sufficient statistics for this model are $\{n_{i \cdot \cdot}\}$, $\{n_{\cdot i \cdot}\}$, $\{n_{\cdot \cdot i}\}$ and $\{n_{ij\cdot} + n_{ji\cdot} + n_{\cdot ij} + n_{\cdot ji}\}$ (see Bishop et al., 1975, sections 8.2-8.3), where $n_{ijk}$ denotes the observed count for the cell $(i, j, k)$. The ML estimates can be obtained by equating the sufficient statistics to their expectation under the imposed constraints. Therefore, the likelihood equations for $\lambda^{12}$ are

$$\hat{\mu}_{ij\cdot} + \hat{\mu}_{ji\cdot} + \hat{\mu}_{\cdot ij} + \hat{\mu}_{\cdot ji} = n_{ij\cdot} + n_{ji\cdot} + n_{\cdot ij} + n_{\cdot ji}, \qquad i < j = 1, \ldots, I,$$

depending on both of the marginal tables [12] and [23].

[Fig. 3. (A) An undirected graph; (B) a coloured graph non-decomposable in cliques and colours; (C) a coloured graph decomposable in cliques and colours. Each panel is drawn on nodes 1, 2, 3.]
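To make the coupling in example 12 concrete, the sufficient statistics can be computed directly from a three-way table. The sketch below uses a hypothetical function name and an arbitrary made-up $2 \times 2 \times 2$ table (not data from the paper). It returns the three one-way margins and, for each $i < j$, the combined statistic $n_{ij\cdot} + n_{ji\cdot} + n_{\cdot ij} + n_{\cdot ji}$.

```python
def qs_sufficient_stats(n):
    """Sufficient statistics for the common-colour model of example 12.

    n[i][j][k] is the count in cell (i, j, k) of an I x I x I table
    (0-based indices). Returns the three one-way margins and a dict
    mapping (i, j), i < j, to n_ij. + n_ji. + n_.ij + n_.ji.
    """
    r = range(len(n))
    n12 = [[sum(n[i][j][k] for k in r) for j in r] for i in r]   # margin [12]
    n23 = [[sum(n[i][j][k] for i in r) for k in r] for j in r]   # margin [23]
    one = [sum(n12[i][j] for j in r) for i in r]                 # n_i..
    two = [sum(n12[i][j] for i in r) for j in r]                 # n_.j.
    three = [sum(n23[j][k] for j in r) for k in r]               # n_..k
    common = {(i, j): n12[i][j] + n12[j][i] + n23[i][j] + n23[j][i]
              for i in r for j in r if i < j}
    return one, two, three, common

# Arbitrary illustrative counts, not data from the paper.
table = [[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]
one, two, three, common = qs_sufficient_stats(table)
print(one, two, three, common)
```

Each combined statistic mixes entries of the [12] and [23] margins, so neither marginal table alone determines it. This is why the fit in example 12 cannot be assembled from separate marginal fits.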

4.2. Collapsibility for QS graphical models

Sometimes it is of interest to reduce a large contingency table to a smaller one, focusing on a subset of variables of primary interest if the remaining variables have no effect on the interaction structure among those primary variables. However, only in certain cases is such interaction structure not affected by collapsing over other variables. Different definitions of collapsibility have been given in the literature to describe the property of having the marginal distribution of a subset of variables the same as in a larger model. See, for example, Whittaker (1990).

Consider first the case of parametric collapsibility with respect to log-linear parameters. According to Whittemore (1978), a discrete probability distribution $p(\iota)$ is collapsible onto a marginal distribution $p_M(\iota_M)$ from $V$, with respect to the log-linear interaction parameter $\lambda_A$, with $A \subseteq M$, if $\lambda_A$ coincides with the marginal log-linear parameter $\lambda^{M}_{A}$ obtained from $p_M(\iota_M)$. Let now $a$, $b$ form a decomposition of the skeleton graph of a QS graphical model with an edge-coloured graph $G = (V, E, \mathcal{E})$. For $A = a \setminus b$, $B = b \setminus a$ and $C = a \cap b$, we have the conditional independence $X_A \perp\!\!\!\perp X_B \mid X_C$. By Whittemore (1978, corollary 2), this conditional independence is a sufficient condition for log-linear parametric collapsibility onto $a$, that is, $\lambda_{A'} = \lambda^{a}_{A'}$ for all non-empty $A' \subseteq A$. These equalities are also preserved by the QS graphical model, because the added constraints do not affect the conditional independence structure but involve only further equality restrictions among non-zero interaction parameters.

The previous discussion ensures that after marginalization, in the presence of a decomposition, there is parametric collapsibility with respect to selected parameters. Therefore, QS graphical models behave like standard log-linear graphical models as far as parametric collapsibility is concerned. Two other concepts used in graphical models are model collapsibility and estimate collapsibility.
These concepts have been discussed by Asmussen & Edwards (1983), who also showed their equivalence in the case of hierarchical (and thus also graphical) log-linear models. A hierarchical log-linear model $M(G)$ defined on $V$ is model collapsible onto a subset $a$ of $V$ whenever, for all $\mu(\iota) \in M(G)$, $\mu(\iota_a) \in M_a(G_a)$; that is, the model for the marginal distribution of $a$ does not differ from the one represented by the induced subgraph $G_a$. On the other hand, we say that there is estimate collapsibility if the ML estimates for the marginal table $I_a$ in $M(G)$ coincide with the ML estimates for the marginal table $I_a$ in the marginal model $M_a(G_a)$, that is, $\hat{\mu}(\iota_a) = \hat{\mu}_a(\iota_a)$ for all $\iota_a$. In the case of QS graphical models, model collapsibility is implied by parametric collapsibility, but, by arguments similar to those given for decompositions in the previous section, the equivalence between model collapsibility and estimate collapsibility does not occur in finite samples. To have estimate collapsibility we need the further condition that the QS constraints for the set we are collapsing onto must be disjoint from the QS constraints for the complementary set.

Proposition 3. Let $a$ be a subset of $V$ in a QS graphical model $M^{QS}(G)$, $G = (V, E, \mathcal{E})$. For $a^c = V \setminus a$, let $\mathrm{cl}(a^c)$ denote the closure of $a^c$. A necessary and sufficient condition for estimate collapsibility onto $a$ is that (i) the boundary of every connected component of $a^c$ is complete, and (ii) $\mathrm{col}(a) \cap \mathrm{col}(\mathrm{cl}(a^c)) = \emptyset$.

Proof. Sufficiency: Let $b = \mathrm{cl}(a^c)$. Then, $a$ and $b$ form a decomposition, with $a \cap b = \mathrm{bd}(a^c)$ complete and separating $a \setminus b$ from $b \setminus a$. Moreover, condition (ii) implies that this is a coloured decomposition. Thus, the result follows because, by proposition 2, we have $\hat{\mu}(\iota_a) = \hat{\mu}_a(\iota_a)$. Necessity: Suppose that $M^{QS}(G)$ is collapsible onto $a$ in its estimates. Then, it is also model collapsible and condition (i) necessarily holds (Asmussen & Edwards, 1983, theorem 2.3). Suppose now that condition (i) holds, while condition (ii) is false. As $\mathrm{col}(a) \cap \mathrm{col}(\mathrm{cl}(a^c)) \neq \emptyset$, some constraints involve both elements of $a$ and $\mathrm{cl}(a^c)$. Then, some of the likelihood equations for model $M^{QS}(G)$ involve both sufficient statistics and observed margins in $a$ and in $\mathrm{cl}(a^c)$. This results, in general, in estimates that differ from those derived from the likelihood equations for model $M^{QS}_a(G_a)$, which are based only on the sufficient statistics and observed margins in $a$, unless the set of constraints is empty. This completes the proof.

To illustrate the concept of collapsibility, we return to example 11.

Example 13 (example 11 continued). Let us consider again the graph in Fig. 3C. Marginal analysis on the collapsed tables can be conducted both for the margin [12] and for the margin [23]. For instance, consider the margin [12]. The set of nodes can be partitioned into $a = \{1, 2\}$ and $a^c = \{3\}$. The closure $\mathrm{cl}(a^c)$ consists of the clique [23], which is complete. Moreover, the colour of the edge 12 (the only edge in $G_a$) is red, while the colour of 23 is blue. As a consequence, both conditions of proposition 3 are fulfilled and the model can be collapsed onto the marginal table [12]. Similar arguments are valid for margin [23]. These conditions would not be satisfied if the graph had both edges of the same colour, as, for example, in Fig. 3B.

Notice that, as a consequence of condition (ii) in proposition 3, nodes in the boundary of $a^c$ can be connected only by black edges. In particular, the boundary of $a^c$ belongs to $a$, so that a coloured edge between two nodes in the boundary implies that $\mathrm{col}(a) \cap \mathrm{col}(\mathrm{cl}(a^c)) \neq \emptyset$.

5. Examples

In the following two sections, the two data sets introduced in section 1 are analysed with QS graphical models.

5.1. Opinions about abortion and the death penalty

Table 1 concerns attitudes of 3218 US citizens who gave valid responses to the four variables considered: attitude towards the death penalty (P), and attitude towards abortion if there is a strong chance of a serious defect in the baby (D), if the woman's health is seriously endangered (H) and if the pregnancy is a result of a rape (R).

Table 3. Model comparison for data on attitudes towards abortion and the death penalty

Model     Graph                No. of parameters   df   Deviance
Model 1   nodes D, H, R, P     10                  6    6.65
Model 2   nodes D, H, R, P     8                   8    8.80
Model 3   nodes D, H, R, P     7                   9    304.2

A preliminary analysis using ordinary graphical log-linear models leads to a graphical model (model 1 in Table 3) by which $(D, H) \perp\!\!\!\perp P \mid R$:

$$\log \mu_{ijkl} = \lambda + \lambda^{H}_{i} + \lambda^{D}_{j} + \lambda^{R}_{k} + \lambda^{P}_{l} + \lambda^{RP}_{kl} + \lambda^{HD}_{ij} + \lambda^{HR}_{ik} + \lambda^{DR}_{jk} + \lambda^{HDR}_{ijk}. \tag{10}$$

This model corresponds to the undirected graph $G_1 = (V, E)$ in which $E$ contains HR, HD, DR, RP. To summarize the fit of this and other models, we will use the deviance only as an overly stringent indication of lack of fit that would provide lower bounds on actual P-values, because the sampling design for the General Social Survey is more complex than simple random sampling and the code book (Davis et al., 2009) for the survey suggests that sampling variances tend to be about 50 per cent larger than for simple random sampling. For this model, the deviance is 6.65 with df = 6, suggesting an adequate fit.

All four variables are homologous, but the death penalty opinion is of a different nature from the abortion items, so it seems sensible to consider simplifying by imposing QS structure among the abortion items, leaving the association between P and R to be different. This gives the QS graphical model, model 2 in Table 3, corresponding to the coloured graph $G_2 = (V, E, \mathcal{E})$, with $\mathcal{E} = \{E_0, E_1\}$, $E_0 = \{RP\}$ and $E_1 = \{HR, HD, DR\}$, for which $\lambda^{HD} = \lambda^{HR} = \lambda^{DR}$. This model also fits adequately (deviance = 8.80, df = 8). For model 2, achieving identifiability by using zero-sum constraints, we obtain the ML estimate $\hat{\lambda}_{111} = 0.046$ (SE = 0.031) for the three-factor term. After removing this non-significant interaction, we obtain a model that is no longer graphical but is simple to interpret. This reduced model has a deviance of 11.0 on nine degrees of freedom. The ML estimate of the common two-factor term is then $\hat{\lambda}_{11} = 0.616$ (SE = 0.016).
Thus, for those with a given response on one of the abortion items, the estimated odds ratio between the other two items is exp[4(0.616)] = 11.7. For this model, the estimated odds ratio between R and P is only 1.77.

As an aside, we mention model 3 in Table 3, which also has exchangeability including the association between R and P. On subject-matter grounds, one would not expect this association to be similar to the other three in that model, as P deals with a different subject. That model indeed fits poorly, with a deviance of 304.2 (df = 9). The poor fit reflects the quite different estimated odds ratio for R and P in the model just discussed.

5.2. Degree of satisfaction with US government policy

In this second example, we analyse the data displayed in Table 2 on the degree of satisfaction with some aspects of US government policy: the environment (E), unemployment (U) and a decent standard of living for the old (O), all cross-classified with gender (G). A preliminary analysis with ordinary graphical log-linear models leads to model A, showing an adequate fit (deviance = 20.85, df = 24). The model assumes $(E, O) \perp\!\!\!\perp G \mid U$, with

$$\log \mu_{ijkl} = \lambda + \lambda^{E}_{i} + \lambda^{O}_{j} + \lambda^{U}_{k} + \lambda^{G}_{l} + \lambda^{UG}_{kl} + \lambda^{EO}_{ij} + \lambda^{EU}_{ik} + \lambda^{OU}_{jk} + \lambda^{EOU}_{ijk}. \tag{11}$$

This model ignores that E, O and U are homologous. We can therefore fit a QS graphical model by adding the following equality constraints to model (11):

$$\lambda^{EO}_{ij} = \lambda^{EO}_{ji} = \lambda^{EU}_{ij} = \lambda^{EU}_{ji} = \lambda^{OU}_{ij} = \lambda^{OU}_{ji} \quad \text{and} \quad \lambda^{EOU}_{ijk} = \lambda^{EOU}_{\mathrm{perm}(ijk)}, \qquad i, j, k = 1, \ldots, I.$$

This includes in the same colour class all the edges in the subgraph $G_{EOU}$. The QS graphical model so defined is model C in Table 4. This model implies a symmetric and common conditional association among the homologous variables, given all the other variables, including the non-homologous gender. Model C fits adequately, with a deviance of 43.06 (df = 37).

Table 4. Model comparison for data on the degree of satisfaction with US government policy

Model     Graph                No. of parameters   df   Deviance
Model A   nodes O, E, U, G     30                  24   20.85
Model B   nodes O, E, U, G     23                  31   30.03
Model C   nodes O, E, U, G     17                  37   43.06

In model B we relax the model C assumption of a common conditional association among E, O and U by specifying separate QS constraints as follows:

$$\lambda^{EO}_{ij} = \lambda^{EO}_{ji}, \quad \lambda^{EU}_{ij} = \lambda^{EU}_{ji}, \quad \lambda^{OU}_{ij} = \lambda^{OU}_{ji}, \qquad i, j = 1, \ldots, I,$$
$$\lambda^{EOU}_{ijk} = \lambda^{EOU}_{\mathrm{perm}(ijk)}, \qquad i, j, k = 1, \ldots, I.$$

The corresponding graph has three edges with different colours and a black edge, UG. This model has a deviance of 30.03 with 31 degrees of freedom. All three models seem compatible with the data. All the models in this section and models 1 and 2 in section 5.1 have the two cliques forming a coloured decomposition. The estimates reported before can be obtained by fitting separate models, using proposition 2.

6. Discussion

We have proposed a class of graphical log-linear models, called QS graphical models, in which some interaction parameters are constrained to be equal. These models have as a special case the QS model for two-way tables proposed by Caussinus (1966) and its generalizations to higher-way tables by Bishop et al. (1975) and Bhapkar & Darroch (1990). Our model extends to other structures, such as sets of variables for which only some are homologous or different sets are homologous. The symmetry in the conditional association structure is represented using coloured edges in a conditional independence graph, in the spirit of Højsgaard & Lauritzen (2008) for concentration graphical models. The coloured graph represents both the conditional independence structure and the QS constraints, so that the form of the association is also represented in the graph. Moreover, the graph is useful for determining the conditions for decomposability and collapsibility. Specifically, decomposable QS models are models with a triangulated structure for the skeleton graphs that also have cliques not sharing colours.
In ordinary log-linear models, decomposability ensures that the ML estimates can be calculated from marginal tables using closed-form expressions. In QS graphical models, closed-form expressions are not available in the presence of symmetry constraints, but decomposability can still be very useful to reduce model complexity, for example, in large graphs. In general, QS graphical models can be estimated as generalized linear models in which the model matrix has a certain structure. The standard Fisher scoring algorithm