COPULA PARAMETERIZATIONS FOR BINARY TABLES. By Mohamad A. Khaled. School of Economics, University of Queensland. March 21, 2014


There exists a large class of parameterizations for contingency tables, such as marginal log-linear and graphical models. Copula models have only recently started to be widely used for modeling dependence among discrete variables, but with no clear connection to the prevalent methodologies for contingency tables. This paper develops a rigorous mathematical framework establishing existence and identification criteria that link a sub-class of marginal log-linear models with copula parameterizations in binary contingency tables. Using combinatorial results such as Möbius inversion in lattices, a bijective mapping between the different parameterizations is derived. Several illustrative examples are given as well.

AMS 2000 subject classifications: Primary 60E05; secondary 62E15.
Keywords and phrases: contingency tables, copulas, identifiability, Möbius inversion, parameterizations.

1. Introduction. Copula models have become very popular in statistics and most of its applied fields over the past two decades. Although Sklar's theorem is very general and applies, as a special case, to copulas with noncontinuous margins, it was only recently that copula models with such margins started to be widely used and that their full implications in the presence of multiple discrete marginal distributions were carefully studied. Genest and Nešlehová [2007] constitutes a milestone in that respect. It is especially noteworthy for laying down a careful mathematical setting for the study of such models and for warning against the numerous pitfalls potentially arising in that setting.

Several recently published papers study inference from both the Bayesian and non-Bayesian perspectives. Pitt, Chan and Kohn [2006] introduce Bayesian methodologies for estimating Gaussian copula models (although the inferential setting is general enough to extend to the whole class of elliptical copulas in a straightforward manner). Smith and Khaled [2012] provide a Bayesian perspective for the whole class of copulas with discrete margins (in particular, the paper considers Archimedean, elliptical and D-vine copulas). To cope with the combinatorial complexity of maximum likelihood estimation, Panagiotelis, Czado and Joe [2012] introduce algorithms of quadratic complexity for a subclass of models dubbed discrete pair-copulas. In addition to the aforementioned articles, many other papers consider discrete-margined low-dimensional copulas, especially in the bivariate and trivariate cases.

In spite of that flurry of activity, many open questions remain; the following are explicitly addressed in this paper. Questions of identifiability are mostly yet to be resolved. Also, discrete-margined copula models have an unclear link to the previous literature on the subject. Considering contingency table models, the exact relationship between marginal log-linear models and copula models is unknown. This includes the classical log-linear model (see for instance the book-length treatment in Christensen [1990]), the multivariate logistic model of Glonek and McCullagh [1995] and Qaqish and Ivanova [2006] and the whole class of marginal log-linear models

in between (see Bergsma and Rudas [2002a], Bergsma and Rudas [2002b], and Bartolucci, Colombi and Forcina [2007]). For a book-length treatment of that exciting field, see Bergsma et al. [2009].

As an analog to the approach in this paper, consider the graphical models approach to contingency tables (see Lauritzen [1996] and Whittaker [1990]). Although already well established, the graphical model parameterizations of contingency tables and their link to marginal log-linear models have only been explored recently (see Drton and Richardson [2008], Forcina, Lupparelli and Marchetti [2010], Rudas, Bergsma and Németh [2010], Evans and Richardson [2011] and Wermuth [2011]). This paper aims at an analogous discussion by providing results linking copula parameterizations to marginal models for binary tables.

In addition, a substantial number of pitfalls can arise from directly applying the methodology of modeling copulas in the continuous case to the discrete case. Marginalizing a copula model results in much more intricate behavior than in the continuous case. The already well-established pitfall that rank-based measures of association such as Kendall's tau or Spearman's rho pertaining to the copula differ from those pertaining to the discrete table (as shown in Genest and Nešlehová [2007], Nešlehová [2007] or Denuit and Lambert [2005]) is only one facet among many that arise in the discrete-margined copula setting. That classical result is a special case of the characterizations in this paper.

The primary objective of the paper is to lay down a rigorous mathematical framework for linking copula models to binary contingency tables, and by

doing so, to unite two strands of literature that seem to have operated independently so far. That is accomplished through the intermediate objectives of deriving marginalizations, characterizing parameterizations and showing identification for discrete-margined copula models. Finally, implications of these features are illustrated through applications. This includes a discussion of the advantages of the copula approach. For instance, through copulas it is possible to put non-trivial patterns of sparsity over higher-dimensional contingency tables that do not involve setting higher-order interaction terms to zero.

Section 2 presents the general setup of copula modeling; the main aspects of the combinatorial approach used in this paper are sketched in section 2.2. Section 3 provides a framework for identification results. Section 4 introduces the main results on the mapping between copulas and marginal probabilities. Several theoretical applications are given in section 5, among them that of subsection 5.3, where the apparatus developed so far is applied to the log-odds representation at one of the extrema of the class of marginal log-linear models. Section 6 concludes.

2. General setup. This section presents the general setup of copula models for random binary vectors and introduces some notions and results from enumerative combinatorics that will be used throughout the paper.

2.1. Copulas for binary tables. A copula is the cumulative distribution function of a random vector with uniform margins. In particular, let $U = (U_1, \ldots, U_m)$ be an $m$-dimensional random vector where each component is uniform, $U_j \sim U[0, 1]$. Then there exists a function $C : [0, 1]^m \to [0, 1]$ such

that

$$C(u_1, \ldots, u_m) = \Pr[U_1 \le u_1, \ldots, U_m \le u_m]. \tag{2.1}$$

Using Sklar's theorem (see for instance Nelsen [2006]), it is possible to use $C$ as a tool to construct multivariate distributions with arbitrary margins. Let $F$ be the joint cumulative distribution function of some random vector $Y = (Y_1, \ldots, Y_m)$ and let $F_1, \ldots, F_m$ be its associated marginal distributions, that is, $Y_j \sim F_j$ for $j = 1, \ldots, m$. According to Sklar's theorem, there always exists a copula function $C$ such that

$$F(y_1, \ldots, y_m) = C(F_1(y_1), \ldots, F_m(y_m)). \tag{2.2}$$

Let $R_j$ be the range of $F_j$ for $j = 1, \ldots, m$. According to Sklar's theorem, $C$ is uniquely defined only on the product set $R = R_1 \times \cdots \times R_m \subseteq [0, 1]^m$. For binary tables, $R_j$ is the two-element set $R_j = \{\Pr[Y_j \le 0], 1\}$ and therefore the cardinality of $R$ is $|R| = 2^m$.

In the upcoming sections, we will be concerned with deriving conditions for determining whether $C$ is identified from both the nonparametric and parametric perspectives (section 3), how to link marginalizations of $C$ with marginalizations of $F$ for binary tables (section 4) and how to derive copula parameterizations for standard binary contingency table models (sections 4 and 5). Due to the finiteness of $R$ in the case of binary contingency tables, we will need some combinatorial tools from lattice theory that we introduce next.
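As a small illustration of the product set $R$, the following sketch enumerates its $2^m$ points for $m = 3$. The marginal probabilities in `p` are hypothetical values chosen only for the example; they do not come from the paper.

```python
from itertools import product

# Hypothetical marginal probabilities p_j = Pr[Y_j = 1]; not taken from the paper.
p = [0.3, 0.6, 0.8]

# Range of each binary margin: R_j = {F_j(0), F_j(1)} = {1 - p_j, 1}.
ranges = [(1.0 - pj, 1.0) for pj in p]

# R is the Cartesian product R_1 x ... x R_m, the only points where
# Sklar's theorem pins the copula down uniquely.
R = list(product(*ranges))

for point in R:
    print(point)
```

Each printed tuple is one point of $R$; a coordinate equal to 1 means the corresponding margin has been integrated out.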

2.2. Combinatorial aspects. The main combinatorial structure used here is that of an incidence algebra, which is an abstract algebraic structure over a (locally) finite partially ordered set. The main combinatorial tool is Möbius inversion, which dates back to at least Rota [1964]. Möbius inversion (and, more generally, lattice theory, incidence algebras and poset theory) has a long history in statistics and its combinatorial implications are quite pervasive. For instance, the textbook Constantine [1987], although focused on statistical design theory, illustrates several enumeration problems in probability and statistics where it is used. In McCullagh [1987], it is used in various computations deriving relationships between cumulants and moments. As an additional example, it is a major tool for proving the Hammersley-Clifford theorem used in graphical models (see Lauritzen [1996]), in Markov random fields in general and in other spatial lattice processes (Cressie and Wikle [2011]). It is also a major tool in random set theory (Nguyen [2006]), where it is used for the computation of densities, distributions and capacity functionals on finite random sets. This is only a short, non-exhaustive list of textbooks showing the diversity of its usage.

This subsection introduces notation and summarizes results used in the remainder of the paper. See Stanley [2012] and Bóna [2006] for an excellent introduction to the topic. See also Caspard, Leclerc and Monjardet [2012] for ample background on posets.

Let $(P, \le)$ be some finite partially ordered set (or poset), that is, $P$ is a finite set and $\le$ is some partial order on it. As an example, $(2^M, \subseteq)$ is a poset where $2^M$ is the power set of a finite set $M$ and $\subseteq$ is the partial order induced by inclusion on the set of subsets of $M$. Let $\mathrm{Int}(P)$ be the set

of intervals on $(P, \le)$, which are defined by

$$\mathrm{Int}(P) = \{[x, y] : x, y \in P,\ x \le y\}.$$

An interval is always a finite set in this paper. The incidence algebra $I(P)$ of the poset $P$ is the set of all real functions whose domain is the set of intervals of $P$,

$$I(P) = \{f : \mathrm{Int}(P) \to \mathbb{R}\}.$$

Multiplication on the algebra $I(P)$ is defined by

$$(f * g)(x, y) = \sum_{x \le z \le y} f(x, z)\, g(z, y)$$

for all $x, y \in P$ with $x \le y$. The unit element is the delta function $\delta \in I(P)$ defined by the indicator

$$\delta(x, y) = 1(x = y).$$

That is, for all $f \in I(P)$, $f * \delta = \delta * f = f$. A special function in the incidence algebra is the zeta function $\zeta \in I(P)$, defined by

$$\zeta(x, y) = 1(x \le y).$$

(This is a generalization of the zeta function in number theory, but that poset is countably infinite, unlike the one studied here.) Zeta functions have very useful combinatorial properties. One of the most useful is that the inverse of the zeta function is the Möbius function $\mu \in I(P)$, satisfying $\mu * \zeta = \zeta * \mu = \delta$ and defined by

$$\mu(x, y) = \begin{cases} 1 & \text{if } x = y, \\ -\sum_{x \le z < y} \mu(x, z) & \text{otherwise.} \end{cases}$$

We will sometimes write $\mu_P$ instead of $\mu$ when there is a need to emphasize the poset in question.
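The recursive definition of the Möbius function translates directly into code. The following is a minimal sketch, not part of the paper: the helper names `powerset` and `mobius_function` are hypothetical, and the value 0 returned for non-comparable pairs is a convenience convention. It computes $\mu$ on the subset lattice of $\{1, 2, 3\}$ and checks the identity $\mu * \zeta = \delta$ on every interval.

```python
from functools import lru_cache
from itertools import combinations

def powerset(ground):
    """All subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mobius_function(elements, leq):
    """Mobius function of a finite poset, computed from its recursive definition."""
    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            return 0  # convention used here: zero outside intervals
        # mu(x, y) = -(sum of mu(x, z) over x <= z < y)
        return -sum(mu(x, z) for z in elements
                    if leq(x, z) and leq(z, y) and z != y)
    return mu

elements = powerset({1, 2, 3})
subset_leq = lambda a, b: a <= b  # partial order by inclusion
mu = mobius_function(elements, subset_leq)

# Check (mu * zeta)(x, y) = delta(x, y) on every interval [x, y].
for x in elements:
    for y in elements:
        if subset_leq(x, y):
            total = sum(mu(x, z) for z in elements
                        if subset_leq(x, z) and subset_leq(z, y))
            assert total == (1 if x == y else 0)
```

On the subset lattice the computed values agree with the known closed form $(-1)^{|T \setminus S|}$ for $S \subseteq T$, which can be confirmed by comparing `mu(S, T)` with `(-1) ** len(T - S)`.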

The recursive definition of $\mu$ implies that, for every $[x, y] \in \mathrm{Int}(P)$ with $x < y$,

$$\sum_{z \in [x, y]} \mu(x, z) = 0$$

or, similarly,

$$\mu(x, y) = -\sum_{z \in (x, y]} \mu(z, y).$$

A proof can be found in Stanley [2012]. The following simple lemma is used in the remainder of the paper.

Lemma 1. For $(2^M, \subseteq)$ and for every $S \subseteq T \subseteq M$,

$$\mu(S, T) = (-1)^{|T \setminus S|}.$$

The well-known Möbius inversion formula is as follows. Let $f, g : P \to \mathbb{R}$. If

$$g(y) = \sum_{x \le y} f(x)$$

then

$$f(y) = \sum_{x \le y} g(x)\, \mu(x, y)$$

or, in its alternative dual form, if

$$g(y) = \sum_{x \ge y} f(x)$$

then

$$f(y) = \sum_{x \ge y} g(x)\, \mu(y, x).$$

Specialized to $(2^M, \subseteq)$, it is given by the following simple lemma.

Lemma 2. Let $f, g : 2^M \to \mathbb{R}$. If

$$g(T) = \sum_{S \subseteq T} f(S)$$

then

$$f(T) = \sum_{S \subseteq T} (-1)^{|T \setminus S|}\, g(S).$$

See Stanley [2012] for a detailed derivation of lemmas 1 and 2. One additional result about product posets is needed. Let $(P, \le_1)$ and $(Q, \le_2)$ be two posets. Define their product as $(P \times Q, \le)$ where $P \times Q$ is the Cartesian product and $\le$ is the partial order defined by $(p, q) \le (p', q')$ if $p \le_1 p'$ and $q \le_2 q'$.

Lemma 3. The Möbius function of the product of two posets $(P, \le_1)$ and $(Q, \le_2)$ is the product of their individual Möbius functions,

$$\mu_{P \times Q}\big((p, q), (p', q')\big) = \mu_P(p, p')\, \mu_Q(q, q')$$

for every $p, p' \in P$ and $q, q' \in Q$ such that $p \le_1 p'$ and $q \le_2 q'$.

For the proof, see Stanley [2012]. Before turning to immediate applications, it is worth mentioning that the computational complexity of Möbius inversion for the power set of an $m$-set can be reduced to $O\big((m - 1)\, 2^{m - 1}\big)$ instead of the naive $O\big(2^{2m}\big)$ if the so-called fast Möbius transform algorithm of Kennes and Smets [1990] is used.
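To make the complexity remark concrete, the sketch below implements the standard in-place subset-sum scheme (one pass per element of $M$) and compares it with a direct evaluation of lemma 2. It is an illustration under assumptions, not necessarily the algorithm exactly as published by Kennes and Smets: subsets of $\{0, \ldots, m-1\}$ are encoded as bitmasks, and the function names are hypothetical.

```python
import random

def zeta_transform(f, m):
    """g[T] = sum of f[S] over all S contained in T (subsets as bitmasks)."""
    g = list(f)
    for i in range(m):
        bit = 1 << i
        for mask in range(1 << m):
            if mask & bit:
                g[mask] += g[mask ^ bit]
    return g

def mobius_transform(g, m):
    """Inverse of zeta_transform: f[T] = sum over S in T of (-1)^|T minus S| g[S]."""
    f = list(g)
    for i in range(m):
        bit = 1 << i
        for mask in range(1 << m):
            if mask & bit:
                f[mask] -= f[mask ^ bit]
    return f

def mobius_naive(g, m):
    """Direct evaluation of lemma 2, quadratic in the number of subsets."""
    out = []
    for t in range(1 << m):
        total = 0
        for s in range(1 << m):
            if s & t == s:  # s is a subset of t
                sign = -1 if bin(t ^ s).count("1") % 2 else 1
                total += sign * g[s]
        out.append(total)
    return out

m = 4
rng = random.Random(0)
f = [rng.randint(-5, 5) for _ in range(1 << m)]  # arbitrary test function on 2^M
g = zeta_transform(f, m)
```

Calling `mobius_transform(g, m)` recovers `f`, and so does `mobius_naive(g, m)`; the first does a number of additions that grows like $m\, 2^m$, the second like $2^{2m}$.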

3. Identification and characterization. This section discusses some identification and characterization issues for copula parameterizations of binary tables. Although these are interesting in their own right, the main objective here is to prepare the ground for the material on marginalizations.

3.1. Nonparametric identification and copula uniqueness. The standard way of computing the copula function for a multivariate distribution $F$ with absolutely continuous margins $F_1, \ldots, F_m$ is to invert equation 2.2 through

$$C(u_1, \ldots, u_m) = F\big(F_1^{-1}(u_1), \ldots, F_m^{-1}(u_m)\big). \tag{3.1}$$

When the margins are not continuous, the above equation is not well-defined unless $u = (u_1, \ldots, u_m) \in R$. The purpose of nonparametric identification is to determine the set of functions $C : [0, 1]^m \to [0, 1]$ compatible with a given distribution $F$ on the support of the random variables under study (in this case, the finite set $\{0, 1\}^m$). It is possible to deduce from equation 3.1 that $C$ is point-identified on $R = R_1 \times \cdots \times R_m \subseteq [0, 1]^m$. Therefore, the set of identified copulas is identical to the set of $m$-increasing functions (see Nelsen [2006] for a definition) with the additional constraints that they are equal to the right-hand side of 3.1 on $R$ and that they satisfy a) $C(u) = 0$ if at least one component of $u$ is 0 and b) $C(u) = u_j$ if all components of $u$ are 1 except $u_j$. The set of identified points is uncountable (see footnote 1) and is given by $\mathcal{C} = Z_0 \cup R$, where $Z_0 = \{u \in [0, 1]^m : \text{at least one } u_j = 0\}$, but the important point is that $\mathcal{C}$ is not equal to $[0, 1]^m$.

Footnote 1. The number of points where the copula is identified is larger than the cardinality $|R|$. When each variable is finite and has $k_j$ categories, $|R| = \prod_{j=1}^{m} k_j$. In the case where at least one margin is a count variable, the set $R$ will be infinite. For the purpose of this paper, since we only study binary tables, we only concern ourselves with the finite case $|R| = 2^m$.

Those well-known facts show that copulas with discrete margins are nonparametrically non-identifiable (for further details on nonparametric identification see Matzkin [2007] or Horowitz [2009]). The paper by Carley [2002] seems to be the only one that extensively studies partial identification, but the non-sharp bounds for the nonparametrically identified set of possible functional forms are (possibly) much larger than the actual functional set of copulas. Besides the natural boundary conditions of a copula determined on the set $Z_0 \cup \{1_m\}$, where $1_m$ is an $m$-vector of ones, it is immediate that the only substantial information comes from $R$ and therefore nonparametric identification is an almost hopeless task. The next subsection shows that the situation is even more drastic and that all the information comes from a single point in $[0, 1]^m$.

3.2. The copula of a multivariate binary model. Let $Y$ be an $m$-dimensional binary vector with marginal probabilities $p_j = \Pr[Y_j = 1]$ for $j = 1, \ldots, m$ and dependence structure given by the copula $C$. It is possible to describe the dependence structure in a hierarchical fashion. First, some notation needs to be introduced. Let $M = \{1, \ldots, m\}$ be the set of indices of the margins of $Y$. Let $C_S$ be the marginal copula obtained from $C$ for a subset of the margins $S \subseteq \{1, \ldots, m\}$. For instance, $C_{123} = C_{\{1,2,3\}}$ is the marginal copula associated with $(Y_1, Y_2, Y_3)$.

Theorem 1. Define the vector

$$u^* = (1 - p_1, \ldots, 1 - p_m).$$

The copula $C(u)$ (and its margins) is identified at $u$ if and only if either $u = u^*$ or $u$ satisfies one of the natural boundary conditions $C(1_m) = 1$ and $C(u) = 0$ if any element of $u$ is zero. Furthermore, the number of unique values identified for $C$ and its margins at $u^*$ is

$$\sum_{j=2}^{m} \binom{m}{j}$$

and there is a one-to-one mapping between the nonempty and non-singleton subsets of $M$ and those points.

Proof. According to Sklar's theorem, the copula function $C$ is identified only on the range of the marginal distributions. Since each of the margins $Y_j$ for $j = 1, \ldots, m$ is a binary random variable, the range of each margin consists only of two points, $F_j(0) = 1 - p_j$ and $F_j(1) = 1$. Combining all margins, it is therefore possible to uniquely write $C$ on $2^m$ points as illustrated in subsection 3.1. However, the $F_j(1)$ are special points on the support of copula functions because they yield the boundary conditions leading to marginalization, i.e. for $S \subseteq M$ such that

$$S = \{j \in M : u_j \ne 1\}.$$

Let $|S|$, as before, be the number of elements of $S$. It is clear that for any $j \in S$, the $j$th element of $u$ is not equal to 1 because it is equal to $F_j(0)$, i.e. it corresponds to $Y_j = 0$. Succinctly, the subset of

$u$ corresponding to $M \setminus S$ is a vector of constants, $u_{M \setminus S} = 1_{|M \setminus S|}$, and $u_S = u^*_S$, where $M \setminus S$ is the complement of $S$ in $M$ and $1_k$ is a $k$-vector of ones. Then the boundary condition implies

$$C(1_{|M \setminus S|}, u^*_S) = C_S(u^*_S)$$

where $C_S$ is the margin of the copula $C$ over the elements in $S$. What remains is to count those sets. We start by showing why that number is smaller than $2^m$, the number of all subsets of $M$. First, when $S = \emptyset$, $u_{M \setminus S} = 1_m$ and the copula is equal to one, so we subtract one element. When $|S| = 1$, $C_S(u_S) = u_S$ because the margins of a copula function are uniform, so we subtract $m$ elements since there are $m$ singletons in $2^M$. The number of uniquely identified values of $C$ is what remains in $2^M$. The number of subsets of a given cardinality is given by the binomial coefficients. Therefore, the total number of identified elements is

$$\sum_{j=2}^{m} \binom{m}{j} = 2^m - (m + 1).$$

Example 1. For instance, in two dimensions, only $C_{12}$ evaluated at $(1 - p_1, 1 - p_2)$ is identified. Below is the picture for $m = 4$ (the coordinates of

$u^*$ are given by the first row):

dimension 1:  $1 - p_1$,  $1 - p_2$,  $1 - p_3$,  $1 - p_4$
dimension 2:  $C_{12}$,  $C_{13}$,  $C_{14}$,  $C_{23}$,  $C_{24}$,  $C_{34}$
dimension 3:  $C_{123}$,  $C_{124}$,  $C_{134}$,  $C_{234}$
dimension 4:  $C_{1234}$

The number of elements in the last three rows (i.e. the number of elements besides the margins) is $2^4 - 5 = 11$.

3.3. Parametric identification of binary tables. In the parametric setting, the family of copula functions is indexed by a finite-dimensional parameter $\theta \in \Theta$, where $\Theta$ is some subset of the $p$-dimensional Euclidean space $\mathbb{R}^p$, that is,

$$\mathcal{C}_\Theta = \{C_\theta : [0, 1]^m \to [0, 1];\ \theta \in \Theta\}.$$

There exists a wide variety of possible choices for $\mathcal{C}_\Theta$. These include all copula families generated by explicit inversion of known multivariate parametric distributions (such as the elliptical copulas, including Gaussian and t-copulas), and the Archimedean or D-vine copulas (see for instance Joe [1997] or Kurowicka and Cooke [2006]). According to theorem 1, the information is completely captured at $u^*$. It is therefore possible to consider $\theta$ as a mapping

$$\theta : [0, 1]^{2^m - 1} \to \mathbb{R}^p \tag{3.3}$$

and identification simplifies to asking whether the above mapping is injective.
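The count in theorem 1 can be checked by brute-force enumeration. The sketch below lists the index sets with at least two elements for $m = 4$ and compares their number with the binomial sum and with $2^m - (m + 1)$. The function name `identified_margins` and the `C_...` labels are only illustrative.

```python
from itertools import combinations
from math import comb

def identified_margins(m):
    """Index sets S with |S| >= 2, one per identified value C_S(u*_S)."""
    indices = range(1, m + 1)
    return [S for k in range(2, m + 1) for S in combinations(indices, k)]

m = 4
margins = identified_margins(m)
labels = ["C_" + "".join(str(j) for j in S) for S in margins]

print(labels)
print(len(labels), sum(comb(m, j) for j in range(2, m + 1)), 2 ** m - (m + 1))
```

For $m = 4$ the three counts agree at 11, matching the last three rows of the picture in example 1.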

A practical way of answering the injectivity question is addressed after results on marginalization are derived. However, the problem is only characterized here; explicit solutions for particular parametric families (such as elliptical or Archimedean) are not exhaustively enumerated.

4. Marginalizations. Using theorem 1, it was shown that computing the copula $C$ over the range of the margins $R$ is equivalent to marginalizing the copula over all subsets $S$ of the index set $M = \{1, \ldots, m\}$ such that $|S| \ge 2$. Counter-intuitively, this is not the same thing as marginalizing the distribution of $Y$. However, the two operations are linked in the sense that they operate on the same function, the joint probability mass function, but through functions that operate on dual posets. We begin by introducing some notation and definitions and then make the previous assertion precise through a theorem.

We will be interested in two posets constructed from the power set of $M$: the poset where all the subsets are ordered by set inclusion, $(2^M, \subseteq)$, and its dual $(2^M, \supseteq)$. These have the usual zeta and Möbius functions as in lemma 2. We will work with three functions $\pi$, $m$ and $C$, in bijective correspondence with one another, each defined from $2^M$ into $[0, 1]$, which are constructed respectively from the information encoded in the joint probability mass function, the marginal probability mass functions and the copula function. Write $k$-subsets as $S = \{j_1, \ldots, j_k\} \in 2^M$ and their complements as $M \setminus S = \{j_{k+1}, \ldots, j_m\}$. Now define $\pi$ as

$$\pi(S) = \Pr[Y_{j_1} = 1, \ldots, Y_{j_k} = 1, Y_{j_{k+1}} = 0, \ldots, Y_{j_m} = 0].$$

The above formulation clearly captures all the information in the joint probability mass function by enumerating all $S \in 2^M$. Similarly, define $m$ as

$$m(S) = \Pr[Y_{j_1} = 1, \ldots, Y_{j_k} = 1];$$

clearly $m$ is characterized by information in the $S$-margins of $Y$. The above definition of $m$ follows standard lines in the literature (see Qaqish and Ivanova [2006]). We modify the definition of $C$ when it is understood as having domain $2^M$ instead of $[0, 1]^m$ in the following way. Consider the permutation $\sigma$ that maps $(1, \ldots, k, k + 1, \ldots, m)$ into $(j_1, \ldots, j_k, j_{k+1}, \ldots, j_m)$. Now define $C$ as

$$C(S) = C_{\sigma(M)}(u^*_S, 1_{m-k})$$

where, as before, $u^*_S$ is the $S$-sub-vector of $u^*$, $1_{m-k}$ is an $(m - k)$-dimensional vector of ones and $C_{\sigma(M)}$ is the copula function in formula 2.1 with permuted indices. Now it is possible to state the result linking $m$ and $C$. The main idea of the simple proof is to link $C$ to $\pi$ through Möbius inversion on $(2^M, \subseteq)$ and to link $m$ to $\pi$ through Möbius inversion on the dual poset. Since both inversions are one-to-one, this leads to a bijective relationship over $2^M$.

Theorem 2. The relationship between probabilities for marginal models and copula functions for binary tables is given by

$$m(S) = \sum_{T \supseteq S}\ \sum_{R \subseteq T} (-1)^{|T \setminus R|}\, C(R) \tag{4.1}$$

with unique inverse

$$C(S) = \sum_{T \subseteq S}\ \sum_{R \supseteq T} (-1)^{|R \setminus T|}\, m(R) \tag{4.2}$$

for every $S \subseteq M$.

Proof. We need to introduce some new posets. Consider $([0, 1], \le)$ with the usual order inherited from the real line and consider the product poset $([0, 1]^m, \le)$ as in lemma 3. This poset induces a partial order on $R$. If one writes $(R, \le)$, it is clear that the resulting finite poset is isomorphic to $(2^M, \subseteq)$ and that the isomorphism can be uniquely pinned down through the arguments in the proof of theorem 1. By the usual definition of a copula,

$$C(u_1, \ldots, u_j, \ldots, u_m) = \sum_{R_j \ni b \le u_j} \Pr[Y_1 = u_1, \ldots, Y_j = b, \ldots, Y_m = u_m]$$

and therefore, for $b \le u$ with both $b, u \in R$,

$$C(u) = \sum_{b \le u} \Pr[Y = b].$$

Using the isomorphism between $R$ and $2^M$ (and lemma 3), it is possible to write the above as

$$C(S) = \sum_{T \subseteq S} \pi(T). \tag{4.3}$$

Using lemmas 1 and 2, it is possible to recover $\pi$ as a function of $C$ by Möbius inversion as follows:

$$\pi(S) = \sum_{T \subseteq S} (-1)^{|S \setminus T|}\, C(T). \tag{4.4}$$

The Möbius function over $(2^M, \subseteq)$ is the product of the Möbius functions over the $(R_j, \le)$.
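Equations 4.3 and 4.4 can be checked numerically. The sketch below is an illustration under assumptions: it takes equation 4.3 as the working definition of $C$ on $2^M$, and it uses a hypothetical, randomly generated joint probability mass function for $m = 3$. With $\pi$ as defined above, the sum in 4.3 is the probability that every $Y_j$ with $j$ outside $S$ equals zero.

```python
from itertools import combinations
import random

def subsets(s):
    """All subsets of the set `s` as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

M = frozenset({1, 2, 3})
ALL = subsets(M)

# Hypothetical joint pmf: pi[S] = Pr[Y_j = 1 for j in S, Y_j = 0 otherwise].
rng = random.Random(0)
weights = {S: rng.random() for S in ALL}
total = sum(weights.values())
pi = {S: w / total for S, w in weights.items()}

# Equation 4.3: cumulate pi over the subsets of S.
C = {S: sum(pi[T] for T in subsets(S)) for S in ALL}

# Equation 4.4: recover pi from C by Mobius inversion on (2^M, inclusion).
pi_back = {S: sum((-1) ** len(S - T) * C[T] for T in subsets(S)) for S in ALL}

max_error = max(abs(pi[S] - pi_back[S]) for S in ALL)
print(max_error)
```

The printed discrepancy is at the level of floating-point rounding, which is the numerical counterpart of the statement that the mapping between $C$ and $\pi$ is one-to-one.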

Now, turning to the dual poset $(2^M, \supseteq)$, it is immediate by a simple summation argument that

$$m(S) = \sum_{R \subseteq M \setminus S} \pi(R \cup S) = \sum_{T \supseteq S} \pi(T)$$

which also immediately yields, by the dual Möbius inversion formula,

$$\pi(S) = \sum_{T \supseteq S} (-1)^{|T \setminus S|}\, m(T). \tag{4.5}$$

Equations 4.4 and 4.5 give mappings $C \leftrightarrow \pi \leftrightarrow m$ which are one-to-one. Combining them yields the required equations 4.1 and 4.2.

Equation 4.4 is essential for computing maximum likelihood estimators for copula models and for proving identification for particular parametric families. Equation 4.5 can be used to derive the marginal distribution over all $S$-margins by a simple permutation argument. Define a function $\pi_S : 2^S \to [0, 1]$ as the one giving the probability mass function over all subsets of $S$. Let $b_S : 2^M \to \{0, 1\}^m$ be some one-to-one mapping that is fixed in advance in an obvious way. Then $\pi_S(S) = m(S)$, and the remaining values $\pi_S(S')$ follow from equation 4.5 applied within the $S$-margin, with signs $(-1)^{|T \setminus S'|}$ and with sets identified through $b_S^{-1}$. When no confusion would ensue, the above function is denoted by $\pi_m$.
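The two double sums of theorem 2 can be evaluated directly and compared with the route through $\pi$. The following sketch does so for $m = 3$ under the same assumptions as before: equation 4.3 is taken as the definition of $C$ on $2^M$, the joint probabilities are hypothetical random values, and the helper names are illustrative.

```python
from itertools import combinations
import random

def subsets(s):
    """All subsets of the set `s` as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

M = frozenset({1, 2, 3})
ALL = subsets(M)

def supersets(s):
    """All subsets of M that contain `s`."""
    return [t for t in ALL if s <= t]

# Hypothetical joint pmf on 2^M.
rng = random.Random(1)
weights = {S: rng.random() for S in ALL}
total = sum(weights.values())
pi = {S: w / total for S, w in weights.items()}

C = {S: sum(pi[T] for T in subsets(S)) for S in ALL}    # equation 4.3
m = {S: sum(pi[T] for T in supersets(S)) for S in ALL}  # m(S) = Pr[Y_j = 1, j in S]

def m_from_C(S):
    """Equation 4.1: marginal probability from copula values."""
    return sum((-1) ** len(T - R) * C[R]
               for T in supersets(S) for R in subsets(T))

def C_from_m(S):
    """Equation 4.2: copula value from marginal probabilities."""
    return sum((-1) ** len(R - T) * m[R]
               for T in subsets(S) for R in supersets(T))

err_41 = max(abs(m_from_C(S) - m[S]) for S in ALL)
err_42 = max(abs(C_from_m(S) - C[S]) for S in ALL)
print(err_41, err_42)
```

Both discrepancies are at rounding level. Note that `m_from_C(S)` reads copula values `C[R]` for sets `R` that are not contained in `S`, which is the dependence on all subsets of $M$ discussed in the text.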

The major implications here, however, lie in the relationships implied by equations 4.1 and 4.2. They allow us to completely characterize the link between marginal log-linear models (such as multivariate logistic or standard log-linear models) and other standard contingency table models on the one hand and copula models with discrete margins on the other. Furthermore, they completely characterize copula parameterizations for binary tables.

5. Applications. Several illustrations are given to show the implications of the results in the previous sections.

5.1. Sparse parameterizations and another look at identification. Let $\pi$ be the $2^m$-vector obtained by stacking all values of $\pi(S)$. Then determining whether the mapping 3.3 is injective is equivalent to proving that the following set of nonlinear equations

$$C_\theta(u^*) = g(\pi) \tag{5.1}$$

has a unique solution in $\Theta$ (note that a solution may not exist in the first place, in which case the copula parameterization is impossible for that particular contingency table). Here $g$ is deduced from equation 4.3. The particular details depend on $C_\theta$ and on how large $m$ is.

Example 2. The copula representation of a $2 \times 2$ contingency table is given by the following table where, besides the marginal probabilities given by $p_j = 1 - F_j(0)$ for $j = 1, 2$, the only (nonparametrically) identified point is that given by the joint probability at $(0, 0)$, yielding $C(1 - p_1, 1 - p_2)$.

Table 1 (where $C$ abbreviates $C(1 - p_1, 1 - p_2)$)

$Y_1 \backslash Y_2$ | 0 | 1 | total
0 | $C$ | $1 - p_1 - C$ | $1 - p_1$
1 | $1 - p_2 - C$ | $-1 + p_1 + p_2 + C$ | $p_1$
total | $1 - p_2$ | $p_2$ | 1

The above table was derived from the identity

$$p(y_1, y_2) = \Pr[Y_1 = y_1, Y_2 = y_2] = C(F_1(y_1), F_2(y_2)) + C(F_1(y_1^-), F_2(y_2^-)) - C(F_1(y_1), F_2(y_2^-)) - C(F_1(y_1^-), F_2(y_2))$$

where $y_j^-$ indicates $y_j - \varepsilon$ for some $\varepsilon \in (0, 1)$. As a simple illustration, consider the following model:

$$Y \in \{0, 1\}^m, \qquad W \sim N_m(\mu, \Sigma), \qquad Y_j = 1(W_j > 0),$$

where $j = 1, \ldots, m$ and $\mu$ and $\Sigma$ could have any particular structure, such as one depending on covariates. For that special case, the values of the $C$ function in table 1 are simply determined from the cumulative distribution function of the multivariate normal distribution $N_m(\mu, \Sigma)$.

Example 3. Consider the latent variable model $W = \mu + X$, $\mu = (1, -0.5)'$, $\rho = 0.7$,

$$X \sim N_2\left(0_2, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$

and $Y_j = 1(W_j > 0)$ for $j = 1, 2$. This model is a stochastic representation of a multivariate discrete

distribution with two Bernoulli margins and a Gaussian copula. It leads to a contingency table for $(Y_1, Y_2)$ [table entries missing]. That is, the margins are $Y_1 \sim B(0.8413)$ and $Y_2 \sim B(0.3053)$. The Gaussian copula takes the value $C_\rho(0.1587, \ldots) = \ldots$ at one of the points where it is identified (see equation 5.1). Fixing the margins at the two Bernoulli distributions above, the question becomes: is it possible to find other copulas giving rise to the exact same multivariate distribution for $(Y_1, Y_2)$ while keeping the margins fixed? The parameter $\rho$ is identified in this case by the properties of $C_\rho$. However, the answer is that the Gaussian copula of the stochastic representation is not uniquely identified (if the class of parametric distributions extends beyond that of the Gaussian copulas) and is only one of a potential infinity of possible copulas. If we start by considering the class of all one-parameter copulas, then their parameter could potentially be identified by solving the following equation (see equation 5.1):

$$C_\theta(0.1587, \ldots) = \ldots$$

The Clayton copula with $\theta = \ldots$
The Gumbel copula with $\theta = \ldots$

The Frank copula with $\theta = \ldots$, etc.

One important aspect of the above example is that all the one-parameter copulas considered (that is, if $\mathcal{C}_\Theta$ is taken to be the union of the Archimedean and the Gaussian copulas) are indistinguishable. That is, different people estimating these different copula models on the same two binary variables are really estimating the same object. Also, when the problem is analyzed from a nonparametric contingency table perspective (fixing $\pi$ and computing $\theta$ as in the analysis above), the main issue quickly becomes the existence of copula parameterizations for a given table. That is, instead of having $p$ fixed, it needs to be $O(2^m)$. From that perspective, fixing $p$ (as for Archimedean copulas), or making it grow at a polynomial rate (as for elliptical copulas), yields constraints on joint mass functions that become more and more restrictive as $m$ grows.

Example 4. It is clear that Archimedean copulas are fully flexible in two dimensions. However, when $m = 3$, fixing the parameter $\theta$ and the distribution of one of the sets of cardinality 2 (say $\{1, 2\}$) automatically fixes the probability of all sets with cardinality greater than or equal to 2. That is, fixing $\theta$ and $\pi(12)$ automatically yields (by equation 4.4)

$$\pi(13) = \pi(12) + C_{13}(u^*_{13}; \theta) - C_{12}(u^*_{12}; \theta) + \pi(2) - \pi(3)$$

and so forth for $\{2, 3\}$ and $\{1, 2, 3\}$. In a similar fashion to the example above, the Gaussian copula is fully

flexible in two or three dimensions, but for $m \ge 4$, constraints implied by equation 4.4 need to be imposed. When equation 5.1 is solved for $\pi$ given $\theta$, in a sparsity modeling framework for contingency tables, this yields a very attractive feature of copula modeling for binary tables: a way of imposing sparsity on higher-dimensional tables without having to resort to putting zeros everywhere, making the pattern of dependence more meaningful in many settings.

5.2. Marginalization pitfalls. Consider the case when all the margins are absolutely continuous and denote by $c$ the copula density obtained from $C$. The $S$-margin of $Y$ is obtained from

$$p(y) = c(F_1(y_1), \ldots, F_m(y_m)) \prod_{j=1}^{m} f_j(y_j)$$

by simply writing

$$p(y_S) = c_S(F_{j_1}(y_{j_1}), \ldots, F_{j_k}(y_{j_k})) \prod_{j \in S} f_j(y_j)$$

where $f_j$ is the density of margin $j$ and $c_S$ is the density of the marginal copula. This intuitive formula is in stark contrast to the formulas of theorem 2. As a matter of fact, from equation 4.1, it is clear that the marginal distribution for a subset $S$ depends not only on the copula margin over $S$, but also on copula margins over all subsets of $M$. This aspect has resulted in several pitfalls in the past when trying to apply copula models to non-continuous margins.

Note that the famous result about measures of association in, say, Denuit and Lambert [2005] or Nešlehová [2007] is a special case of the above. Indeed, since those measures depend on $\pi_m$ computed over margins associated with sets of cardinality 2, it is immediate that they also depend on

margins associated with sets of cardinality 1 through formula 4.1. This is a very short proof of the main result of Denuit and Lambert [2005].

5.3. Multivariate logistic models. The multivariate logistic model of McCullagh and Nelder [1989] and Glonek and McCullagh [1995] can be written as

$$\eta = C \log(L\pi) = C \log(\gamma)$$

where $\pi$ is some vector of probabilities and $\eta = X\beta$ is a vector of linear indices. The construction of the matrices $C$ and $L$ is shown in many places (see for instance Forcina, Lupparelli and Marchetti [2010]).

An alternative formulation is considered here, namely the one formulated in Qaqish and Ivanova [2006]. In that paper, an algorithm is derived for computing the mapping between joint probabilities and the complete list of odds-ratios satisfied by the multivariate logistic model. That algorithm is guaranteed to find the mapping if it exists and to fail if it does not. It exploits the fact that, given marginal probabilities, odds ratios can be recursively computed through the solution of polynomial equations of increasing complexity. Since it relies on marginal probabilities, theorem 2 automatically extends their algorithm to the case of copulas.

The construction of the algorithm is based on extending the poset $(2^M, \subseteq)$ into a completely ordered set and exploiting some inequalities when recursively deriving the polynomial equations. Let us now denote the completely ordered elements of $2^M$ by $a_j$ for $j = 1, \dots, 2^m$, i.e.

$$2^M = \{a_1, \dots, a_{2^m}\}$$

where obviously $a_1 = \emptyset$ and $a_{2^m} = M$. Now, recursively define subsets of $2^M$ by

$$S_j = \{a_1, \dots, a_j\}$$

Now define the operation

$$S_{j+1} = S_j \otimes \{j+1\}$$

where $A \otimes B \equiv A \cup \{a \cup B : a \in A\}$. Starting from $S_1 = \{\emptyset, 1\}$, this automatically creates one possible linear order. Construct the sets

$$D_{j+1} = N_j \cup \{a \cup \{j+1\} : a \in D_j\}$$
$$N_{j+1} = D_j \cup \{a \cup \{j+1\} : a \in N_j\}$$

starting from $D_1 = \{\emptyset\}$ and $N_1 = \{1\}$. It is immediate that $S_j = D_j \cup N_j$ and that $|D_j| = |N_j| = |S_j| / 2 = 2^{j-1}$.

Let $T \in 2^M$ such that $|S| = j$. Consider $\pi_m$ to be the joint probability over $2^S$ obtained through $m(2^S)$. It is now possible to completely enumerate the log odds-ratios in the following way:

$$\psi(T) = \sum_{R \subseteq S} (-1)^{\mathbb{1}(R \notin N)} \log \pi_m(R)$$

where $N = N(S) = N_j$ and $D = D(S) = D_j$ were constructed from $S_j$ by setting it equal to $S$ independently of $T$. The complexity of those models arises from the impossibility of finding an analytic solution for the inverse relationship expressing $\pi_m$ as a function of $\psi$. Now define, for $R \subseteq S$,

$$r(R) = \left\{-2 \cdot \mathbb{1}(R \in N) + 1\right\} m(S) + \pi_m(R)$$

to obtain the polynomial equation (of order $2^j = 2^{|S|}$)

$$e^{\psi(S)} \prod_{R \in D} \{r(R) - m(S)\} = \prod_{R \in N} \{r(R) + m(S)\}$$

Solving these recursively for all $S \in 2^M$, starting from $j = 1$ until reaching $j = m$, yields for each $e^{\psi(S)}$ one possible root $m(S)$ (among the $j$ roots), namely the one falling in the interval (if such a root exists)

$$\left[\max_{R \in N} \left(-r(R)\right),\ \min_{R \in D} r(R)\right]$$

Applying the copula results obtained here to the algorithm in Qaqish and Ivanova [2006], it is now possible to extend it to obtain $C$ as a function of $\psi$. In particular, apply recursively the transformations $\psi \to \pi_m \to m \to C$. Moreover, in the case of a multivariate logistic regression model, it is possible to apply the transformation $\beta \to \psi$. Notice that, using theorem 2, the inverse transformation $C \to \psi$ is available in closed form.

Example 5. As an illustrative example, consider $m = 3$. Then $D_3 = \{\emptyset, 12, 13, 23\}$, $N_3 = \{1, 2, 3, 123\}$, and the recursion is applied as follows. First, for $j = 2$:

$$\begin{pmatrix} r(\emptyset) \\ r(1) \\ r(2) \end{pmatrix} = \begin{pmatrix} 1 - m(1) - m(2) \\ m(1) \\ m(2) \end{pmatrix}$$

which satisfies the inequalities

$$\max\left(0,\ m(1) + m(2) - 1\right) \leq m(12) \leq \min\left(m(1),\ m(2)\right)$$

yielding a quadratic polynomial to solve in $m(12)$ with one solution in the above interval.
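The recursive step described above can be sketched numerically. The following Python sketch is an illustration under stated assumptions, not the implementation of Qaqish and Ivanova [2006]: the helper names (`powerset`, `r_and_sign`, `solve_m`) and the use of bisection are choices made here. It assumes that $m$ is stored as a dictionary keyed by frozensets with $m(\emptyset) = 1$, that $r(R)$ is the Möbius-inversion sum for $\pi_m(R)$ with the unknown $m(S)$ term left out, and that $N$ collects the subsets $R$ with $|S| - |R|$ even (which reproduces $N_3 = \{1, 2, 3, 123\}$). Because the log odds-ratio is strictly increasing in $m(S)$ over the admissible interval, bisection locates the root in that interval whenever the interval is non-empty.

```python
from itertools import combinations
from math import log


def powerset(items):
    """All subsets of `items` as frozensets, smallest first."""
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def r_and_sign(S, m):
    """Map each R <= S to (r(R), sign), where pi_m(R) = r(R) + sign * m(S)."""
    S = frozenset(S)
    out = {}
    for R in powerset(S):
        # Moebius inversion over R <= U <= S, leaving out the unknown U = S term.
        r = sum((-1) ** len(T) * m[R | T]
                for T in powerset(S - R) if R | T != S)
        # sign +1 means R is in N, sign -1 means R is in D (assumed convention).
        out[R] = (r, (-1) ** (len(S) - len(R)))
    return out


def solve_m(S, m, psi, tol=1e-13):
    """Find m(S) in the admissible interval whose log odds-ratio equals psi."""
    rs = list(r_and_sign(S, m).values())
    lo = max(-r for r, s in rs if s > 0)   # lower bound: max over N of -r(R)
    hi = min(r for r, s in rs if s < 0)    # upper bound: min over D of r(R)
    if not lo < hi:
        raise ValueError("empty admissible interval: no valid table exists")

    def gap(x):
        # Log odds-ratio implied by m(S) = x, minus the target psi.
        return sum(s * log(r + s * x) for r, s in rs) - psi

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical marginals for two binary variables.
m = {frozenset(): 1.0, frozenset({1}): 0.5, frozenset({2}): 0.4}
m12 = solve_m({1, 2}, m, psi=0.0)  # zero log odds-ratio, i.e. independence
```

With these hypothetical marginals and $\psi = 0$, the solution is approximately $m(1)\,m(2) = 0.2$, as independence requires. Bisection is used instead of an explicit polynomial solver because it needs no root selection step: any root it returns already lies in the admissible interval.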

And for $j = 3$:

$$\begin{pmatrix} r(\emptyset) \\ r(1) \\ r(2) \\ r(12) \\ r(3) \\ r(13) \\ r(23) \end{pmatrix} = \begin{pmatrix} 1 - m(1) - m(2) - m(3) + m(12) + m(13) + m(23) \\ m(1) - m(12) - m(13) \\ m(2) - m(12) - m(23) \\ m(12) \\ m(3) - m(13) - m(23) \\ m(13) \\ m(23) \end{pmatrix}$$

yielding the following inequalities

$$\max\left\{0,\ m(12) + m(13) - m(1),\ m(12) + m(23) - m(2),\ m(13) + m(23) - m(3)\right\} \leq m(123)$$
$$m(123) \leq \min\left\{m(12),\ m(13),\ m(23),\ 1 - m(1) - m(2) - m(3) + m(12) + m(13) + m(23)\right\}$$

It is tedious to write the cubic polynomial in $m(123)$ but quite straightforward to use either a symbolic computing program (written in, say, the Mathematica language) or a program using simple numerical methods (say in C or MATLAB) to solve the above equations and compute the different mappings.

5.4. Further extensions. The analysis of subsection 5.3 is applicable to other marginal log-linear models since they all rely on specifying the functions $m$ and $\pi_m$. Using theorem 2, linking that to $C$ yields a copula parameterization for these models. This allows for direct comparison between the different models and provides a way to weigh the advantages of each against the others.

6. Discussion. The paper studied identification, characterization and parameterization results for a large class of binary contingency table models. The apparatus elaborated so far is concrete, and its algorithmic aspects are immediately implementable for empirical problems. Further illustrations need to be elaborated for other cases such as the classical log-linear model and other important special cases of graphical models.

A lot of questions remain open. These include specific identification results linking a particular parametric copula family to a particular marginal log-linear model, specific necessary and sufficient conditions for existence and uniqueness of maximum likelihood estimators, and consequences of graph and order-theoretic structures such as decomposability. These avenues of exploration are yet to be taken and provide exciting prospects for research in that area.

APPENDIX A: SOME ADDITIONAL ELEMENTS

In Matzkin [2007], the concept of non-parametric identification is presented through the following statement:

"Rather than asking whether some parameters were identified, the question of interest became whether a function or distribution was identified within a general set of functions or distributions. Establishing such a nonparametric identification was recognized as an important first step in the econometric analysis of even parametric models."

The objective of this appendix is to present clear definitions of the identification notions used throughout the paper.

Let $\mathcal{F}$ be the set of all cumulative distribution functions of random $m$-dimensional binary vectors $Y$. It is possible to identify a single element $F$ in $\mathcal{F}$, $F$ being a distribution of observables. $F$ will automatically determine its marginals $F_1, \dots, F_m$ (by integration) and is a finite object determined by $2^m - 1$ quantities ($\mathcal{F}$ being isomorphic to the simplex $\Delta^{2^m - 1}$).

Let $\mathcal{C}$ be some set of copula functions, that is, $\mathcal{C}$ is a set of multivariate cumulative distribution functions $C : [0, 1]^m \to [0, 1]$ such that each of the margins of $C$ is uniform on $[0, 1]$. Each $C$ is therefore an uncountable object ($[0, 1]^m$ being uncountable, and each of its margins being absolutely continuous over $[0, 1]$).

If each $C \in \mathcal{C}$ is obtained by a one-to-one mapping from a subset of a finite-dimensional set $\Theta \subseteq \mathbb{R}^p$, then the model specified by $\mathcal{C}$ is parametric. Otherwise, if the object of inference is inherently infinite-dimensional, then the model specified by $\mathcal{C}$ is non-parametric. $\mathcal{C}$ will be referred to as $\mathcal{C}_\theta$ in the parametric case.

A copula model imposes restrictions on $F$ that are determined by the equation

$$F(y_1, \dots, y_m) = C\left(F_1(y_1), \dots, F_m(y_m)\right)$$

for some $C \in \mathcal{C}$ and some $F_1, \dots, F_m$ completely determined by the $m$ quantities $\Pr[Y_1 = 1], \dots, \Pr[Y_m = 1]$, i.e. completely determined by any vector $u \in [0, 1]^m$. The set of models is given by $\mathcal{C} \times [0, 1]^m$ and, therefore, a particular model is given by the couple of elements $(C, u)$. It imposes constraints on the set

of possible distributions of the random vector $Y$ through the mapping

$$\Psi : \mathcal{C} \times [0, 1]^m \to \Delta^{2^m - 1}$$

where $\mathcal{C} = \Theta$ in the parametric case. That is, $\Psi$ is a mapping between the set of models and the simplex.

Definition 1. Non-parametric identification. Fixing $F \in \mathcal{F}$ and therefore automatically fixing $u(F)$, the objective of non-parametric identification is to determine $\mathcal{C}^* \subseteq \mathcal{C}$ such that

$$\mathcal{C}^* = \{C \in \mathcal{C} : \Psi(C, u(F)) = F\}$$

Definition 2. Parametric identification. Fixing $F \in \mathcal{F}$ and therefore automatically fixing $u(F)$, the objective of parametric identification is to determine $\Theta^* \subseteq \Theta$ such that

$$\Theta^* = \{\theta \in \Theta : \Psi(C_\theta, u(F)) = F\}$$

If either $\mathcal{C}^*$ or $\Theta^*$ is a singleton, then $C$ or $\theta$ is said to be point-identified or simply identified given $F$. Similarly, one can say that the model $(C, u)$ is identified given $F$. If, for all $F \in \mathcal{F}$, $C$ (respectively $\theta$) is identified, then the model is said to be non-parametrically (respectively parametrically) identified.

If $\mathcal{C}^*$ or $\Theta^*$ is larger than a singleton (it could be infinite), then $C$ or $\theta$ is partially identified given $F$. Similarly, one can say that the model is partially identified given $F$. If, for some $F \in \mathcal{F}$, $C$ (respectively $\theta$) is partially identified, then one can say that the model is non-parametrically (respectively parametrically) non-identifiable or non-identified.
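Definition 2 can be illustrated for two binary variables and a one-parameter family. The following Python sketch is a hedged illustration, not part of the paper's apparatus: it uses the bivariate Frank copula, takes $u_j = \Pr[Y_j = 0]$ so that $C_\theta(u_1, u_2) = \Pr[Y_1 = 0, Y_2 = 0]$ (an assumed convention), and truncates the parameter space to $[-30, 30]$ for numerical stability (also an assumption). The function names `frank_cdf` and `identified_theta` are hypothetical. Since the Frank family is increasing in $\theta$ at any fixed interior point, the identified set is either empty or a singleton, and bisection finds its single element.

```python
from math import expm1, log1p


def frank_cdf(u, v, theta):
    """Bivariate Frank copula C(u, v; theta); theta -> 0 gives independence."""
    if abs(theta) < 1e-8:
        return u * v
    return -log1p(expm1(-theta * u) * expm1(-theta * v) / expm1(-theta)) / theta


def identified_theta(p00, u1, u2, bound=30.0, tol=1e-10):
    """Return theta with C(u1, u2; theta) = p00, or None if the set is empty.

    The search is restricted to [-bound, bound], an assumed truncation of
    the Frank parameter space.
    """
    lo, hi = -bound, bound
    if not frank_cdf(u1, u2, lo) <= p00 <= frank_cdf(u1, u2, hi):
        return None  # no theta in the truncated range reproduces the table
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_cdf(u1, u2, mid) < p00:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical table: Pr[Y1 = 0] = 0.5, Pr[Y2 = 0] = 0.6, Pr[Y1 = 0, Y2 = 0] = 0.35.
theta_hat = identified_theta(0.35, 0.5, 0.6)
```

A return value of `None` corresponds to an empty identified set within the truncated range, which happens for instance when the requested joint probability violates the Fréchet bounds implied by the margins.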

If for $C_1, C_2 \in \mathcal{C}$, $\Psi(C_1, u(F)) = \Psi(C_2, u(F)) = F$, then the (parameters) $C_1$ and $C_2$ are said to be indistinguishable given $F$. If $\mathcal{C}_1 \subseteq \mathcal{C}$ and $\mathcal{C}_2 \subseteq \mathcal{C}$ and

$$\forall F \in \mathcal{F},\ \exists C_1 \in \mathcal{C}_1,\ \exists C_2 \in \mathcal{C}_2,\quad \Psi(C_1, u(F)) = \Psi(C_2, u(F)) = F$$

then the models given by $\mathcal{C}_1$ and $\mathcal{C}_2$ and any marginal probabilities $u \in [0, 1]^m$ are indistinguishable.

REFERENCES

Bartolucci, F., Colombi, R. and Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica.
Bergsma, W. P. and Rudas, T. (2002a). Marginal models for categorical data. The Annals of Statistics.
Bergsma, W. P. and Rudas, T. (2002b). Variation independent parameterizations of multivariate categorical distributions. Distributions with given marginals and related topics.
Bergsma, W., Croon, M. A. and Hagenaars, J. A. (2009). Marginal Models. Springer.
Bóna, M. (2006). A walk through combinatorics: an introduction to enumeration and graph theory. World Scientific.
Carley, H. (2002). Maximum and minimum extensions of finite subcopulas. Communications in Statistics - Theory and Methods.
Caspard, N., Leclerc, B. and Monjardet, B. (2012). Finite ordered sets. Cambridge University Press.
Christensen, R. (1990). Log-linear models. Springer Verlag.
Constantine, G. M. (1987). Combinatorial theory and statistical design. Wiley.
Cressie, N. and Wikle, C. K. (2011). Statistics for spatio-temporal data. Wiley.
Denuit, M. and Lambert, P. (2005). Constraints on concordance measures in bivariate discrete data. Journal of Multivariate Analysis.

Drton, M. and Richardson, T. S. (2008). Binary models for marginal independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Evans, R. J. and Richardson, T. S. (2011). Marginal log-linear parameters for graphical Markov models. Arxiv preprint arxiv:
Forcina, A., Lupparelli, M. and Marchetti, G. (2010). Marginal parameterizations of discrete models defined by a set of conditional independencies. Journal of Multivariate Analysis.
Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. Astin Bulletin.
Glonek, G. F. V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society. Series B (Methodological).
Horowitz, J. L. (2009). Semiparametric and nonparametric methods in econometrics. Springer Verlag.
Joe, H. (1997). Multivariate models and dependence concepts. Chapman & Hall/CRC.
Kennes, R. and Smets, P. (1990). Computational aspects of the Möbius transform. In Uncertainty in artificial intelligence.
Kurowicka, D. and Cooke, R. (2006). Uncertainty Analysis with High Dimensional Dependence Modelling. John Wiley & Sons Ltd.
Lauritzen, S. L. (1996). Graphical models. Oxford University Press.
Matzkin, R. L. (2007). Nonparametric identification. Handbook of Econometrics.
McCullagh, P. (1987). Tensor methods in statistics. Chapman and Hall.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, Second ed. Chapman and Hall.
Nelsen, R. B. (2006). An introduction to copulas, Second ed. Springer Verlag.
Nešlehová, J. (2007). On rank correlation measures for non-continuous random variables. Journal of Multivariate Analysis.
Nguyen, H. T. (2006). An introduction to random sets. CRC Press.
Panagiotelis, A., Czado, C. and Joe, H. (2012). Pair Copula Constructions for Multivariate Discrete Data. Journal of the American Statistical Association (forthcoming).
Pitt, M., Chan, D. and Kohn, R. (2006). Efficient Bayesian inference for Gaussian copula regression models. Biometrika.
Qaqish, B. F. and Ivanova, A. (2006). Multivariate logistic models. Biometrika.
Rota, G. C. (1964). On the foundations of combinatorial theory I. Theory of Möbius functions. Probability theory and related fields.
Rudas, T., Bergsma, W. P. and Németh, R. (2010). Marginal log-linear parameterization of conditional independence models. Biometrika.
Smith, M. S. and Khaled, M. A. (2012). Estimation of Copula Models With Discrete Margins via Bayesian Data Augmentation. Journal of the American Statistical Association.
Stanley, R. P. (2012). Enumerative combinatorics 1, Second ed. Cambridge University Press.
Wermuth, N. (2011). Probability distributions with summary graph structure. Bernoulli.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley.

School of Economics
The University of Queensland
Brisbane, QLD 4072
Australia
