Matrix representations and independencies in directed acyclic graphs

Giovanni M. Marchetti, Dipartimento di Statistica, University of Florence, Italy
Nanny Wermuth, Mathematical Statistics, Chalmers/Göteborgs Universitet, Sweden

Abstract

For a directed acyclic graph, there are two known criteria to decide whether any specific conditional independence statement is implied for all distributions factorizing according to the given graph. Both criteria are based on special types of path in graphs. They are called separation criteria because independence holds whenever the conditioning set is a separating set in a graph-theoretical sense. We introduce and discuss an alternative approach using binary matrix representations of graphs in which zeros indicate independence statements. A matrix condition is shown to give a new path criterion for separation and to be equivalent to each of the previous two path criteria.

Key words: Conditional independence, edge matrix, parent graph, partial closure, partial inversion, separation criteria, stepwise data generating process.

1 Introduction

We consider stepwise processes for generating joint distributions of random variables Y_i for i = 1, ..., d, starting with the marginal density of Y_d, proceeding with the conditional density of Y_{d-1} given Y_d, up to Y_1 given Y_2, ..., Y_d. The conditional densities are of arbitrary form but have the independence structure defined by an associated directed acyclic graph in d nodes, in which node i represents variable Y_i. Furthermore, an arrow starting at node j > i and pointing to i, the offspring of j, indicates a non-vanishing conditional dependence of Y_i on Y_j. Node j is then called a parent node of i, the set of parent nodes is denoted by par_i ⊆ {i+1, ..., d}, and the graph together with the complete ordering of the nodes as V = (1, ..., d) is the parent graph G_par^V. The joint density f_V(y) of the d × 1 random vector Y factorizes as

    f_V(y) = ∏_{i=1}^d f_{i|par_i}(y_i | y_{par_i}),    (1)

where f_{i|par_i}(y_i | y_{par_i}) = f_i(y_i) whenever par_i is empty.
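For readers who wish to experiment, the following is a minimal sketch of such a stepwise generating process in the Gaussian case. The parent graph, the regression coefficients and all helper names are our own illustrative choices, not taken from the paper; variables are generated in the order Y_d, ..., Y_1, each as a linear regression on its parents plus independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical parent graph on V = (1,2,3,4): 1 <- 2, 1 <- 3, 2 <- 4, 3 <- 4
parents = {4: [], 3: [4], 2: [4], 1: [2, 3]}
beta = {(1, 2): 0.8, (1, 3): -0.5, (2, 4): 0.7, (3, 4): 0.6}  # illustrative

n, d = 100_000, 4
Y = np.zeros((n, d + 1))               # column i holds samples of Y_i
for i in range(d, 0, -1):              # generate Y_4 first, then Y_3, ...
    lin = sum(beta[(i, j)] * Y[:, j] for j in parents[i])
    Y[:, i] = lin + rng.standard_normal(n)

# the defining independence 2 _||_ 3 | 4 shows as a near-zero partial
# correlation of Y_2 and Y_3 given Y_4 (computed via regression residuals)
r2 = Y[:, 2] - np.polyfit(Y[:, 4], Y[:, 2], 1)[0] * Y[:, 4]
r3 = Y[:, 3] - np.polyfit(Y[:, 4], Y[:, 3], 1)[0] * Y[:, 4]
print(round(np.corrcoef(r2, r3)[0, 1], 3))   # approximately 0
```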

For each node j > i that is not a parent of i, the factorization (1) implies that the ij-arrow is missing in the graph and that Y_i is conditionally independent of Y_j given Y_{par_i}. The defining list of independencies for G_par^V, written compactly in terms of nodes, is

    i ⊥ j | par_i    for j > i not a parent of i and i = 1, ..., d-1.    (2)

One speaks of distributions generated over G_par^V. This parent graph is a special type of independence graph, in which each missing edge corresponds to an independence statement.

Typically, a distribution generated over G_par^V satisfies more independence statements than those given directly by the defining list (2). We take as an example all types of joint distribution generated over the following Markov chain graph in four nodes,

    1 ← 2 ← 3 ← 4.

The defining independence for node 1 is 1 ⊥ {3,4} | 2, but, for instance, the additional independence statement 1 ⊥ 4 | 3 also holds. In general, for any disjoint subsets α, β and C of V we denote conditional independence of Y_α and Y_β given Y_C by α ⊥ β | C.

Methods have been developed to decide for any nonempty sets α and β whether a given parent graph implies that α ⊥ β | C holds for all distributions generated over it. Such methods have been called separation criteria because they check whether the conditioning set C is a separating set in the sense of graph theory. Two quite different but equivalent criteria have been derived. Both are based on special types of path in independence graphs. The first, by Geiger et al. (1990), has been called d-separation because it applies to a directed acyclic graph, while the second, by Lauritzen et al. (1990), uses the basic graph-theoretic notion of separation in an undirected graph. Such a graph, derived from G_par^V, has been named a moral graph.

In this paper we use a different approach by first associating a joint Gaussian distribution of Y with G_par^V. For this distribution, the list of independencies (2) is equivalent to a set of zero population least-squares regression coefficients, i.e. to a set of linear independencies

    i ⊥ j | par_i    ⟺    β_{i|j.par_i} = 0,    (3)

where β_{i|j.par_i} is an adaptation of the Yule-Cochran notation for the regression coefficient of Y_j in the linear least-squares regression of Y_i on Y_{par_i} and Y_j, for j > i not a parent of i. There are then two key results: (a) in distributions of arbitrary form generated over G_par^V, probabilistic independence statements combine in the same way as linear independence statements, and (b) special types of path in G_par^V lead to dependence of Y_α on Y_β given Y_C in the relevant family of Gaussian distributions generated over a given parent graph, i.e. in Gaussian distributions with parameters constrained only by the defining list of independencies (2) and having a non-vanishing dependence of Y_i on Y_j given Y_{par_i \ j} for every ij-arrow present in the parent graph.
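The defining list (2) is easy to enumerate mechanically. The following sketch does so from the matrix representation introduced in Section 2.1 below (the transposed adjacency matrix with ones added along the diagonal); the function name and the output format are our own choices.

```python
import numpy as np

# edge matrix of the chain 1 <- 2 <- 3 <- 4 (defined formally in Section 2.1):
# transposed adjacency matrix plus the identity, with 1-based node labels
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])

def defining_list(A):
    """Yield the statements (2): i _||_ {non-parents j > i} | par_i."""
    d = A.shape[0]
    for i in range(d):
        par_i = [j + 1 for j in range(i + 1, d) if A[i, j]]
        non_par = [j + 1 for j in range(i + 1, d) if not A[i, j]]
        if non_par:
            yield i + 1, non_par, par_i

for i, js, par in defining_list(A):
    print(f"{i} _||_ {js} | {par}")
# prints: 1 _||_ [3, 4] | [2]   and   2 _||_ [4] | [3]
```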

In any Gaussian distribution of Y, α ⊥ β | C holds if and only if the population coefficient matrix of Y_β in linear least-squares regression of Y_α on both Y_β and Y_C is zero. This matrix is related to linear equations associated with G_par^V using a generalization of the sweep operator for symmetric matrices (Dempster, 1969), called partial inversion. With another operator for binary matrices, named partial closure, so-called structural zeros in this matrix are expressed in terms of a special binary matrix representation of the parent graph, named its edge matrix; see Wermuth and Cox (2004), Wermuth, Wiedenbeck and Cox (2006). The induced binary matrix is shown to be zero if and only if α ⊥ β | C holds in all distributions generated over G_par^V.

The matrix criterion leads to a further path-based criterion for separation in directed acyclic graphs. Finally, equivalence of the new criterion to each of the two known separation criteria is established, after first giving a matrix formulation to each of these latter two path-based criteria.

2 Edge matrices and induced independence statements

Every independence graph has a matrix representation called its edge matrix; see Wermuth and Cox (2004). In this paper we are concerned with edge matrices derived from the one of the parent graph. They characterize what we call induced independence graphs.

2.1 Edge matrix of the parent graph and linear recursive regressions

The edge matrix of a parent graph is a d × d upper triangular binary matrix 𝒜 = (𝒜_ij) such that

    𝒜_ij = 1 if i ← j in G_par^V or i = j, and 𝒜_ij = 0 otherwise.    (4)

This matrix is the transpose of the usual adjacency matrix representation of G_par^V, with additional ones along the diagonal. Because of the triangular form of the edge matrix 𝒜, densities (1) generated over G_par^V are called triangular systems.

If the mean-centred random vector Y generated over the parent graph has a joint Gaussian distribution, then each factor f_{i|par_i}(y_i | y_{par_i}) of equation (1) is a linear least-squares regression

    Y_i = ∑_{j ∈ par_i} β_{i|j.par_i\j} Y_j + ε_i,

where the residuals ε_i are mutually independent with zero mean and variance σ_{ii|par_i}. Then the joint model can be written in the form

    AY = ε,    (5)

where A is a real-valued d × d upper triangular matrix with ones along the diagonal. The d × 1 vector ε has zero mean and cov(ε) = Δ, a diagonal matrix with elements Δ_ii = σ_{ii|par_i} > 0. The covariance and concentration matrix of Y are then, respectively, Σ = A^{-1} Δ A^{-T} and Σ^{-1} = A^T Δ^{-1} A.
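As a quick numerical sanity check of these relations, one may build a small triangular system and verify the two matrix identities; the chain and all numerical values below are arbitrary choices of ours.

```python
import numpy as np

b12, b23, b34 = 0.8, -0.5, 0.7                 # illustrative coefficients
A = np.array([[1., -b12,  0.,   0.],           # A_ij = -beta_{i|j.par_i\j}
              [0.,  1.,  -b23,  0.],
              [0.,  0.,   1.,  -b34],
              [0.,  0.,   0.,   1.]])
Delta = np.diag([1., 2., 1., 3.])              # residual variances

Ainv = np.linalg.inv(A)
Sigma = Ainv @ Delta @ Ainv.T                  # covariance matrix of Y
assert np.allclose(np.linalg.inv(Sigma), A.T @ np.linalg.inv(Delta) @ A)

# 1 _||_ 3 | 2: the coefficient of Y_3 in the regression of Y_1 on (Y_2, Y_3)
# vanishes, as in (3)
Pi = Sigma[0, 1:3] @ np.linalg.inv(Sigma[1:3, 1:3])
print(np.round(Pi, 12))                        # second entry is ~0
```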

An upper off-diagonal element of the generating matrix A is A_ij = -β_{i|j.par_i\j} ≠ 0 if the ij-arrow is present in the parent graph, and A_ij = 0 if the ij-arrow is missing in G_par^V.

In this Gaussian model, there is a one-to-one correspondence between missing ij-edges in the parent graph and zero parameters A_ij = 0. As a consequence, any such zero coincides in this case with a structural zero, i.e. a zero that holds for the relevant family of Gaussian distributions generated over G_par^V. Therefore, the edge matrix 𝒜 of the parent graph can be interpreted as the indicator matrix of zeros in A, i.e. 𝒜 = In[A], where the In[·] operator transforms every nonzero entry of a matrix into a one.

The linear generating matrix A and the edge matrix 𝒜 of G_par^V are the starting point for finding induced conditional independencies satisfied by all distributions generated over the parent graph. As we shall see, for any given Gaussian distribution generated over G_par^V, independence statements are reflected by zeros in a matrix derived from A. Some of these zeros may be due to specific parametric constellations, others are consequences of the defining list of independencies (2). These latter zeros, i.e. the structural zeros in this matrix, show as zeros in an edge matrix derived from 𝒜, called an induced edge matrix, which in turn provides the matrix representation of an induced independence graph.

2.2 Independence and structural zeros

For any partitioning of the vertex set V into node sets M, α, β, C, where only M and C may be empty, a joint density that factorizes according to (1) is now considered in the form

    f_V = f_{M|αβC} f_{α|βC} f_{β|C} f_C.

Marginalizing over M as well as conditioning on C and removing the corresponding variables gives f_{αβ|C} = f_{α|βC} f_{β|C}, so that α ⊥ β | C holds if and only if f_{α|βC} = f_{α|C}. For a Gaussian distribution, the independence α ⊥ β | C is equivalently captured by

    Π_{α|β.C} = 0    ⟺    Σ_{αβ|C} = 0,

where Π_{α|β.C} denotes the coefficient matrix of Y_β in linear least-squares regression of Y_α on Y_β and Y_C, and Σ_{αβ|C} the joint conditional covariance matrix of Y_α and Y_β given Y_C.

The essential point is then that independence of Y_α and Y_β given Y_C is implied for all distributions generated over G_par^V if and only if, for Gaussian models (5), the induced coefficient matrix Π_{α|β.C} is implied to vanish for all permissible values of the parameters. Now, Π_{α|β.C} can be viewed as a submatrix of Π_{a|b}, since with α = a\M and β = b\C

    Π_{a|b} = ( Π_{α|β.C}  Π_{α|C.β} ; Π_{M|β.C}  Π_{M|C.β} ).    (6)

Therefore, the specific form of Π_{α|β.C} implied by A, and its structural zeros implied by 𝒜, may be derived from linear least-squares regression of Y_a on Y_b. Before turning to this task, we summarize how linear independence statements relate to conditional independencies specified with a parent graph.

2.3 Combining independencies in triangular systems of densities

It was noted by Smith (1989, Example 2.8) that probabilistic and linear independencies combine in the same way. We prove a similar property, using assertions that have been discussed recently by Studený (2005), where we take V to be partitioned into a, b, c, d.

Lemma 1. In densities of arbitrary form generated over G_par^V, conditional independence statements combine as in a non-degenerate Gaussian distribution. This means that they satisfy the following statements, where we write for instance bc for the union of b and c:

(i) symmetry: a ⊥ b | c implies b ⊥ a | c;

(ii) decomposition: a ⊥ bc | d implies a ⊥ b | d;

(iii) weak union: a ⊥ bc | d implies a ⊥ b | cd;

(iv) contraction: a ⊥ b | c and a ⊥ d | bc imply a ⊥ b | cd;

(v) intersection: a ⊥ b | cd and a ⊥ c | bd imply a ⊥ bc | d;

(vi) composition: a ⊥ c | d and b ⊥ c | d imply ab ⊥ c | d.

Proof. The first four statements are basic properties of probabilities; see e.g. Dawid (1979). Densities (1) generated over a parent graph satisfy also the intersection and the composition property, due to the full ordering of the node set and an ij-dependence if and only if there is an ij-arrow in G_par^V.

The generating process implies for two nodes i < j that, of a ⊥ i | jd and a ⊥ j | id, the statement a ⊥ j | id is not in the defining list (2). If a ⊥ j | id is to be satisfied, then a ⊥ j | d has to be in the defining list as well. In this case, f_{ij|ad} = f_{i|jd} f_{j|d} = f_{ij|d}, so that a ⊥ ij | d is implied.

Similarly, for i < j, both of i ⊥ c | d and j ⊥ c | d can only be in the defining list of independencies if the statement i ⊥ c | jd is also satisfied. In this case, f_{ij|cd} = f_{i|jd} f_{j|d} = f_{ij|d}, so that ij ⊥ c | d is implied.

Equivalence of each of the assertions to statements involving least-squares coefficient matrices, proved in Lemma 6, Appendix A.2, completes the proof.

2.4 Partially inverted concentration matrices

The induced parameter matrix Π_{α|β.C} is to be expressed in terms of the original parametrization (A, Δ). This is achieved with the matrix operator partial inversion; see Appendix A.1 for a detailed summary of some of its properties.

Partial inversion with respect to any rows and columns a of the concentration matrix, partitioned as (a, b), transforms Σ^{-1} into inv_a Σ^{-1}:

    Σ^{-1} = ( Σ^{aa}  Σ^{ab} ; .  Σ^{bb} ),    inv_a Σ^{-1} = ( Σ_{aa|b}  Π_{a|b} ; ∓  Σ^{bb.a} ),    (7)

where the . notation indicates entries in a symmetric matrix and the ∓ notation denotes entries in a matrix which is symmetric except for the sign. The submatrix Π_{a|b} is as defined before, submatrix Σ_{aa|b} = (Σ^{aa})^{-1} is the covariance matrix of Y_a - Π_{a|b} Y_b, and submatrix Σ^{bb.a} = Σ_bb^{-1} is the marginal concentration matrix of Y_b.

The following result relates these submatrices of inv_a Σ^{-1} to those of B = inv_a A and Δ.

Lemma 2 (Wermuth and Cox, 2004). For a linear triangular system defined by (5) with Σ = cov(Y), a any subset of V, and b = V \ a,

    inv_a Σ^{-1} = ( B_aa H_aa B_aa^T   B_ab + B_aa H_ab B_bb ; ∓   B_bb^T H_bb B_bb ),    (8)

where H = inv_b (L Δ L^T) and

    L = ( I_aa  0 ; -B_ba  I_bb ).    (9)

Proof. With matrix L as defined in equation (9) and A partitioned according to (a, b), the matrix products LA and U = A^{-1} L^{-1} are of upper block-triangular form and Σ = A^{-1} Δ A^{-T} = U (L Δ L^T) U^T. The result then follows by partial inversion of this type of matrix product with respect to b, given in Lemma 5 of Appendix A.1.

Lemma 2 leads, after expansion of the matrix H, to an explicit expression of the matrix of least-squares regression coefficients Π_{a|b} as a function of Δ and B, used later for Proposition 1:

    Π_{a|b} = B_ab - B_aa Δ_aa B_ba^T (Δ_bb + B_ba Δ_aa B_ba^T)^{-1} B_bb.    (10)

Similarly, the explicit expression of Σ_{aa|b} will be used for Proposition 2 and that of Σ^{bb.a} for Proposition 3.
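Both identities can be checked numerically. In the sketch below, inv_block is our own block implementation of partial inversion, following (7) and the block form (17) of Appendix A.1; the graph and all numbers are arbitrary, chosen so that B_ba ≠ 0 and the second term of (10) matters.

```python
import numpy as np

def inv_block(M, a):
    """Partial inversion of M on index set a, in the block form of (17)."""
    d = M.shape[0]
    b = [j for j in range(d) if j not in a]
    Mi = np.linalg.inv(M[np.ix_(a, a)])
    N = np.empty_like(M, dtype=float)
    N[np.ix_(a, a)] = Mi
    N[np.ix_(a, b)] = -Mi @ M[np.ix_(a, b)]
    N[np.ix_(b, a)] = M[np.ix_(b, a)] @ Mi
    N[np.ix_(b, b)] = M[np.ix_(b, b)] - M[np.ix_(b, a)] @ Mi @ M[np.ix_(a, b)]
    return N

# triangular system (5) on four nodes with arrows 1<-2, 2<-3, 2<-4, 3<-4
A = np.array([[1., -0.8,  0.,   0. ],
              [0.,  1.,  -0.5, -0.4],
              [0.,  0.,   1.,  -0.7],
              [0.,  0.,   0.,   1. ]])
Delta = np.diag([1., 2., 1., 3.])
Sigma = np.linalg.inv(A) @ Delta @ np.linalg.inv(A).T

a, b = [0, 3], [1, 2]                       # a = {1,4}, b = {2,3}: not order-compatible
N = inv_block(np.linalg.inv(Sigma), a)      # inv_a of the concentration matrix

# (7): the (a,b) block of inv_a Sigma^{-1} equals Pi_{a|b}
Pi_direct = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
assert np.allclose(N[np.ix_(a, b)], Pi_direct)

# (10): the same matrix from B = inv_a A and Delta
B = inv_block(A, a)
Bab, Bba = B[np.ix_(a, b)], B[np.ix_(b, a)]
Baa, Bbb = B[np.ix_(a, a)], B[np.ix_(b, b)]
Daa, Dbb = Delta[np.ix_(a, a)], Delta[np.ix_(b, b)]
Pi_10 = Bab - Baa @ Daa @ Bba.T @ np.linalg.inv(Dbb + Bba @ Daa @ Bba.T) @ Bbb
assert np.allclose(Pi_10, Pi_direct)
print("checks passed")
```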

2.5 Induced edge matrices

To obtain the induced edge matrix 𝒫_{a|b} of Π_{a|b}, we first display the matrix A partially inverted on a, i.e. B = inv_a A, and the corresponding induced edge matrix derived from 𝒜, i.e. ℬ = zer_a 𝒜; see Appendix A.1 for some properties of this operator:

    B = ( A_aa^{-1}   -A_aa^{-1} A_ab ; A_ba A_aa^{-1}   A_bb - A_ba A_aa^{-1} A_ab ),
    ℬ = In[ ( 𝒜_aa^-   𝒜_aa^- 𝒜_ab ; 𝒜_ba 𝒜_aa^-   𝒜_bb + 𝒜_ba 𝒜_aa^- 𝒜_ab ) ],    (11)

with 𝒜_aa^- = In[(k I_aa - 𝒜_aa)^{-1}], where I_aa denotes an identity matrix of dimension d_a and k = d_a + 1. The matrix 𝒜_aa^- provides the structural zeros in A_aa^{-1}. The graph with edge matrix 𝒜^- is often called the transitive closure of the graph with edge matrix 𝒜.

The transition from a matrix of parameters, here B, in a linear system to its matrix of structural zeros and, equivalently, to the induced edge matrix, here ℬ, can be generalized.

Lemma 3 (Wermuth and Cox, 2007). Let an induced parameter matrix M_ab, say, be defined by sums of products of submatrices of (A, Δ) of a linear triangular system (5), such that the defining matrix products hide no matrix multiplied by its own inverse. Let further the structural zeros of A be given by 𝒜. Then the edge matrix ℳ_ab of M_ab induced by 𝒜 is obtained by replacing in the given sum of products

(i) every inverse matrix, say A_aa^{-1}, by the binary matrix of its structural zeros 𝒜_aa^-,

(ii) every diagonal matrix by an identity matrix of the same dimension,

(iii) every other submatrix, say A_ab or -A_ab, by the corresponding binary submatrix of structural zeros 𝒜_ab,

and then applying the indicator function.

Proof. Each submatrix of a linear parameter matrix is substituted by a non-negative matrix having the appropriate structural zeros. By multiplying, summing and applying the indicator function, all structural zeros are preserved and no additional zeros are generated, while some structural zeros present in 𝒜_ab may be destroyed, i.e. changed in ℳ_ab into ones.

After applying Lemma 3 to equation (10), the edge matrix induced by 𝒜 for Π_{a|b} results as

    𝒫_{a|b} = In[ ℬ_ab + ℬ_aa ℬ_ba^T (I_bb + ℬ_ba ℬ_ba^T)^- ℬ_bb ].    (12)

This edge matrix is rectangular, and an ij-one represents an induced conditional dependence of Y_i on Y_j given Y_{b\j} in the relevant family of Gaussian distributions. In Figure 1 below such an ij-one is drawn as an arrow pointing from j in b to i in a. Equation (12) gives the following matrix criterion for separation, with V partitioned as before into M, α, β, C.

Lemma 4. The parent graph G_par^V implies that α ⊥ β | C holds in every density generated over it if and only if (𝒫_{a|b})_{α,β} = 𝒫_{α|β.C} = 0, where 𝒫_{α|β.C} is the edge matrix induced by 𝒜 for Π_{α|β.C}.

Proof. If there is an ij-edge in the graph with edge matrix 𝒫_{α|β.C}, then β_{i|j.Cβ\j} ≠ 0 holds in the relevant family of Gaussian densities generated over G_par^V; see e.g. Wermuth and Cox (2007). If instead the ij-edge is missing, then i ⊥ j | Cβ is implied for every member of the family of Gaussian distributions. For 𝒫_{α|β.C} = 0, the statement α ⊥ β | C then results with Lemma 1 for every distribution generated over G_par^V.
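The operators behind (11) and (12) reduce to boolean matrix algebra. Below is a self-contained sketch in our own notation: closure() computes a structural-zero pattern such as 𝒜_aa^- via the (k I - 𝒜_aa)^{-1} device, P_a_given_b() assembles (12), and the last lines read off Lemma 4 for the four-node Markov chain of Section 1.

```python
import numpy as np

def indicator(M):
    """The In[.] operator: ones at the (structurally) nonzero entries."""
    return (np.abs(M) > 1e-9).astype(int)

def closure(N):
    """Pattern such as cal-A_aa^-: In[(k I - N)^{-1}] with k = dim(N) + 1."""
    k = N.shape[0] + 1
    return indicator(np.linalg.inv(k * np.eye(N.shape[0]) - indicator(N)))

def P_a_given_b(Acal, a):
    """Induced edge matrix (12) from the edge matrix Acal of a parent graph."""
    d = Acal.shape[0]
    b = [j for j in range(d) if j not in a]
    Aaa_min = closure(Acal[np.ix_(a, a)])                    # = cal-B_aa
    Bab = indicator(Aaa_min @ Acal[np.ix_(a, b)])            # = cal-B_ab
    Bba = indicator(Acal[np.ix_(b, a)] @ Aaa_min)            # = cal-B_ba
    Bbb = indicator(Acal[np.ix_(b, b)]
                    + Acal[np.ix_(b, a)] @ Aaa_min @ Acal[np.ix_(a, b)])
    inner = closure(np.eye(len(b)) + Bba @ Bba.T)
    return indicator(Bab + Aaa_min @ Bba.T @ inner @ Bbb)

# chain 1 <- 2 <- 3 <- 4, 0-based indices; a = {1,2}, b = {3,4}
Acal = np.eye(4, dtype=int) + np.eye(4, 4, 1, dtype=int)
P = P_a_given_b(Acal, [0, 1])
print(P[0])   # [1 0]: node 1 depends on 3 given {4}, but 1 _||_ 4 | 3 is implied
```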

3 A path-based interpretation of the matrix criterion

To give a path interpretation of the matrix criterion in Lemma 4, we first summarize some definitions related to paths and graphs. Two nodes i and j in an independence graph have at most one edge. If the ij-edge is present in the graph, then the node pair i, j is coupled; if the ij-edge is missing, the node pair is uncoupled. An ij-path is a sequence of distinct nodes in which each successive node pair is coupled and i and j are the endpoint nodes. All nodes along a path other than the endpoint nodes are called the inner nodes of the path. An edge is regarded as a path without inner nodes. A subgraph induced by node set a is obtained from a graph with node set V ⊇ a by removing all nodes and edges outside a.

Both a graph and a path are called directed if all their edges are arrows. Directed graphs can have the following V-configurations, i.e. subgraphs induced by three nodes and having two edges:

    i ← t ← j,    i ← s → j,    i → c ← j,

where the inner node is called a transition (t), a source (s) and a collision (c) node, respectively.

A directed path is direction-preserving if all its inner nodes are transition nodes. If in a direction-preserving path an arrow starts at node j and points towards i, then node j is an ancestor of i, node i a descendant of j, and the ij-path is called a descendant-ancestor path. Node j is an a-line ancestor of node i if all inner nodes in the descendant-ancestor ij-path are in set a. A directed path is an alternating path if the direction of the arrows changes at each inner node. This implies that no inner node is a transition node and that the inner nodes alternate between source and collision nodes. A parent graph is said to be transitive if it contains no transition-oriented V-configuration or, equivalently, if for each node i the set par_i of its parents coincides with its set of ancestors.

The edge matrix ℬ = zer_a 𝒜 of equation (11) defines an induced graph called the partial ancestor graph with respect to nodes a, denoted by G_anc^{V.a}. Its edge matrix is obtained by partial closure applied to rows and columns a of the generating edge matrix 𝒜. Equivalently, it results from the parent graph by turning every a-line ancestor into a parent, that is, by adding an arrow whenever j is an a-line ancestor of i in G_par^V. This is why the matrix operator zer_a, which closes all such paths in 𝒜 at once, is called partial closure. For large graphs, edge matrix computations may be carried out within the statistical environment R; see Marchetti (2006).

Figure 1(a) shows a parent graph in six nodes, Figure 1(b) its partial ancestor graph with respect to a = {1, 2, 3}, and Figure 1(c) the induced graph for the conditional dependence of Y_a given Y_b. In this example, (a, b) is an order-compatible partition of the node set V, i.e. a mere split of V = (1, ..., d) into two components without any reordering. For such a split, 𝒜_ba = ℬ_ba = 0, and this implies that 𝒫_{a|b} = ℬ_ab.

Figure 1: (a) A parent graph G_par^V in six nodes. (b) Its partial ancestor graph G_anc^{V.a} with respect to a = {1,2,3}. (c) The induced graph with edge matrix 𝒫_{a|b} for the conditional dependence of Y_a on Y_b, where a = {1,2,3} and b = {4,5,6}.

With a transitive parent graph, there would be the additional simplification 𝒫_{a|b} = 𝒜_ab for all order-compatible partitionings of V.

If instead a is any subset of V such that some arrow in the parent graph points from a node in a to a node in b, then the graph with edge matrix 𝒫_{a|b} can contain additional edges compared to the graph with edge matrix ℬ_ab. These additional edges are due to the following types of path, which have some inner nodes and are defined by the second term in equation (12) for the edge matrix 𝒫_{a|b}.

In the partial ancestor graph G_anc^{V.a}, there is an active alternating ij-path if there is an edge coupling i and j, or if there is an ij-alternating path with some inner nodes such that all its source nodes are in a and all its collision nodes are in b. Inner nodes of a path in a are marginalized over and drawn as crossed-out nodes; those in b are conditioned on and drawn as boxed-in nodes; see Figure 2, which shows an active ij-alternating path with four inner nodes.

Figure 2: An alternating ij-path in G_anc^{V.a} with inner nodes, active with respect to a since all its source nodes are in a and all its collision nodes are in b = V \ a. Inner nodes in a (crossed out) are marginalized over; inner nodes in b (boxed in) are conditioned on.

Proposition 1. For a node set V partitioned into a and b, and with node i in a and node j in b, the graph of conditional dependence of Y_a given Y_b induced by the parent graph has an ij-arrow if and only if there is an active alternating ij-path in the partial ancestor graph with respect to a.

The essence of the proof is the interpretation of the sum of products of edge matrices in equation (12), which defines the edges present in the induced graph with edge matrix 𝒫_{a|b} in terms of submatrices of ℬ = zer_a 𝒜.

Proof. From equation (12), there is an ij-one in the edge matrix 𝒫_{a|b} if and only if ℬ_ij = 1 or [ℬ_aa ℬ_ba^T (I_bb + ℬ_ba ℬ_ba^T)^- ℬ_bb]_ij ≥ 1. The first condition, ℬ_ij = 1, denotes an ij-arrow pointing from b to a in G_anc^{V.a}, and (ℬ_ab^T)_ij = 1 an arrow pointing from i in a to j in b. Whenever the arrow from i in a to j in b is absent, the second condition defines an active alternating path having some inner nodes. Its interpretation is illustrated with the following scheme, in which the inner nodes are shown by their location in either a or b but no longer by an index:

    Edge matrix:    ℬ_ia    ℬ_ba^T    (I_bb + ℬ_ba ℬ_ba^T)^-       ℬ_bj
    Path:           i ← a   a → b     b ← a → b ... b ← a → b      b ← j.

Each arrow shown in the scheme corresponds to a nonzero off-diagonal entry in a matrix of this sum of edge matrix products. Such an ij-path induces a dependence of Y_i on Y_j given Y_{b\j} in the relevant family of Gaussian distributions generated over G_par^V.

This proposition leads to the following path criterion, illustrated with Figure 3. As before, α, β and C are three disjoint sets of nodes with C possibly empty, and G_anc^{V.a} is the partial ancestor graph with a = V \ b, where b is the union of β and C.

Criterion 1. If there is no active alternating path between α and β in the partial ancestor graph G_anc^{V.a}, then α ⊥ β | C in every joint density generated over the given parent graph.

Figure 3: Deciding on implied independencies using Criterion 1, for α = {5} and β = {7}. (a) The generating parent graph. (b) For C = {3, 4}, the independence is not implied, since the alternating path (5,3,6,4,7) in G_anc^{V.a} with a = {1,2,5,6} is active, having its source node, 6, in a and its collision nodes, 3 and 4, in b. (c) For C = {3,4,6}, the independence is implied, since there is no edge for the pair 5,7 and the alternating path (5,3,6,4,7) is inactive: its source node, 6, is conditioned on. (d) For C = {2,3}, the independence is not implied, because the alternating path (5,3,6,2,7) in G_anc^{V.a} with a = {1,4,5,6} is active.

Figure 3(a) shows a parent graph; the criterion establishes that G_par^V implies 5 ⊥ 7 | {3, 4, 6} (see Figure 3(c)), but neither 5 ⊥ 7 | {3, 4} (see Figure 3(b)) nor 5 ⊥ 7 | {2, 3} (see Figure 3(d)).

4 Equivalence to known separation criteria

For a discussion of the criteria available in the literature to verify whether α ⊥ β | C is implied by a given parent graph G_par^V with edge matrix 𝒜, we take throughout the node set V to be partitioned into M, α, β, C, where only M and C may be empty.

One known criterion uses the definitions of a d-connecting path and of d-separation, where the letter d is to remind us that the definitions pertain to a directed acyclic graph. In G_par^V, a path is said to be d-connecting relative to C if along it every collision node is in C or has a descendant in C, and every other node is outside C. Two disjoint sets of nodes α and β are said to be d-separated by C if and only if, relative to C, there is no d-connecting path between α and β, i.e. between a node in α and a node in β.

Criterion 2 (Geiger, Verma and Pearl, 1990). If α and β are d-separated by C in the parent graph, then α ⊥ β | C in every joint density generated over the given parent graph.

The following Proposition 2 gives a matrix criterion that is equivalent to d-separation. It shows that a d-connecting ij-path relative to C implies that Y_i is dependent on Y_j given Y_C in the relevant family of Gaussian distributions generated over G_par^V, while i ⊥ j | C holds in all densities generated over the given parent graph if G_par^V contains no d-connecting ij-path relative to C. For this, we denote by g = V \ C the union of α, β and M, and by ℱ = zer_g 𝒜 the edge matrix of G_anc^{V.g}, the ancestor graph with respect to g induced by 𝒜. From Lemma 2 and Lemma 3, by equating a to g and b to C, we know for nodes i and j in g that

    𝒮_{gg|C} = In[ ℱ_gg (I_gg + ℱ_Cg^T ℱ_Cg)^- ℱ_gg^T ]

is the edge matrix induced by 𝒜, whose zeros give the implied independencies of Y_i and Y_j given Y_C. Edges in this type of undirected graph, named the induced covariance graph of Y_g given Y_C (Wermuth and Cox, 2004), are drawn in Figure 5 as dashed lines.

Proposition 2. In the parent graph G_par^V, sets α and β are d-separated by C if and only if (𝒮_{gg|C})_{α,β} = 𝒮_{αβ|C} = 0, where 𝒮_{αβ|C} is the edge matrix induced by 𝒜 for Σ_{αβ|C}.

The interpretation of the edge matrix 𝒮_{αβ|C} is illustrated with the following scheme:

    Edge matrix:    ℱ_αg    (I_gg + ℱ_Cg^T ℱ_Cg)^-        ℱ_βg^T
    Path:           α ← g   g → C ← g ... g → C ← g       g → β.

Thus, the condition 𝒮_{αβ|C} = 0 is seen to mean the absence of any active alternating path in G_anc^{V.g} from α to β.

Proof. We show that the presence of a d-connecting path in G_par^V relative to C is equivalent to the presence of an active alternating path in G_anc^{V.g}. There is an ij-arrow in this partial ancestor graph if and only if, in the parent graph, nodes i < j are either coupled or node j is a g-line ancestor of i.

If there is a d-connecting path in G_par^V between α and β relative to C with some inner nodes, then zer_g 𝒜 generates an active alternating path in G_anc^{V.g} with at least one inner node, as follows. All source nodes and all collision nodes in C are preserved as inner nodes; transition nodes and arrows starting from them are removed. Furthermore, every collision node h outside C along the d-connecting path is replaced by a new collision node h_C, which is in G_par^V the first g-line descendant of h in C. Conversely, if there is an active alternating ij-path in G_anc^{V.g} then, by this construction, there had been a d-connecting path relative to C in G_par^V.

Figure 4 shows a d-connecting path relative to a conditioning set C and the corresponding active alternating path in G_anc^{V.g} with g = V \ C.

Figure 4: (a) Example of a d-connecting ij-path in G_par^V relative to C, with inner nodes 5,4,9,8,7,3,6, and (b) the corresponding active alternating ij-path in G_anc^{V.g} with inner nodes 5,4,9,2. Inner nodes in C are conditioned on; inner nodes in g are marginalized over.

The other known criterion uses an undirected graph called the moral graph of α, β, and C. This moral graph is constructed in three steps. First, obtain the subgraph induced by the union of the nodes α, β, and C and their ancestors. Second, join by a line every uncoupled pair of parents having a common offspring. Third, replace every arrow in the resulting graph by a line. Then the separation criterion for undirected graphs is used to decide on separation in the parent graph: in an undirected graph, set C separates set α from set β if every path from a node in α to one in β has a node in C.

Criterion 3 (Lauritzen, Dawid, Larsen and Leimer, 1990). If α and β are separated by C in the moral graph of α, β, C, then α ⊥ β | C in every joint density generated over the given parent graph.
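The three construction steps translate directly into code. The sketch below, in our own notation, builds the moral graph from an edge matrix and tests separation by breadth-first search; it is checked on the four-node chain, for which every node is ancestral, so the induced-subgraph step is trivial here.

```python
import numpy as np
from collections import deque

def moral_graph(Acal, keep):
    """Steps 1-3: induced subgraph on 'keep' (= alpha, beta, C and their
    ancestors), uncoupled parents of a common offspring joined, arrows dropped."""
    sub = list(keep)
    G = Acal[np.ix_(sub, sub)].copy()
    np.fill_diagonal(G, 0)
    d = len(sub)
    for h in range(d):                        # h is a common offspring
        pa = [j for j in range(d) if G[h, j]]
        for x in pa:                          # join all parent pairs of h
            for y in pa:
                G[x, y] = G[y, x] = 1
    U = ((G + G.T) > 0).astype(int)           # symmetrize: arrows -> lines
    np.fill_diagonal(U, 0)
    return U, sub

def separated(U, nodes, alpha, beta, C):
    """True if every path in U from alpha to beta passes through C."""
    pos = {v: k for k, v in enumerate(nodes)}
    blocked = {pos[v] for v in C}
    seen = {pos[v] for v in alpha}
    queue = deque(seen)
    while queue:
        x = queue.popleft()
        for y in np.flatnonzero(U[x]):
            if y not in blocked and y not in seen:
                seen.add(y); queue.append(y)
    return not (seen & {pos[v] for v in beta})

# chain 1 <- 2 <- 3 <- 4 (0-based): is 1 _||_ 4 | 3 implied? And 1 _||_ 4?
Acal = np.eye(4, dtype=int) + np.eye(4, 4, 1, dtype=int)
U, nodes = moral_graph(Acal, keep=[0, 1, 2, 3])   # all nodes are ancestral here
print(separated(U, nodes, {0}, {3}, {2}))   # True:  1 _||_ 4 | 3 is implied
print(separated(U, nodes, {0}, {3}, set())) # False: 1 and 4 stay connected
```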

The moral graph of α, β, C coincides with the graph induced by 𝒜 for the concentration matrix of Y_Q, where Q is the union of α, β and C and their ancestors. To see this, let q = V \ M denote the union of α, β, and C, and r = Q \ q the set of the ancestors of q within M. Then O = V \ Q contains all the remaining nodes, from which no path leads into Q, so that for V = (O, Q) the linear recursive system generated over the parent graph has

    A = ( A_OO  A_OQ ; 0  A_QQ ),    Σ^{QQ.O} = A_QQ^T Δ_QQ^{-1} A_QQ.    (13)

The edge matrix induced for Σ^{QQ.O} is, with Lemma 3, given by 𝒮_{QQ.O} = In[𝒜_QQ^T 𝒜_QQ]. Thus, the induced graph contains an ij-edge if and only if in the parent graph either there is an ij-edge, or 𝒜_hi 𝒜_hj = 1 for some node h < i < j in Q, that is, nodes i and j have a common offspring in Q.

Now suppose that there is no edge between α and β in the above moral graph. Then there is no separation with respect to C if and only if there exists an r-line path between α and β. All such paths are closed at once by applying partial closure to this edge matrix, i.e. with zer_r 𝒮_{QQ.O}. As we shall see in the proof of the following Proposition 3, this defines the edge matrix induced by 𝒜 for Σ^{qq.M} and, more generally, for the implied independence of Y_i and Y_j given Y_{q\{i,j}}.

We recall that q = V \ M is the union of α, β, and C, and that Q is the union of q and r, the ancestors of q in M. We denote by 𝒵 = zer_r 𝒜_QQ the edge matrix of G_anc^{Q.r}, the ancestor graph with respect to r induced by 𝒜_QQ. From (13) we get 𝒮_{qq.M} = 𝒮_{qq.r}, so that from Lemma 2 and Lemma 3, by equating a to r and b to q, the edge matrix induced for Σ^{qq.M} is

    𝒮_{qq.M} = In[ 𝒵_qq^T (I_qq + 𝒵_qr 𝒵_qr^T)^- 𝒵_qq ].

Edges in this type of undirected graph, named the induced concentration graph of Y_q, are drawn in Figure 5 as full lines.

Proposition 3. In the moral graph of α, β, C, set α is separated from β by C if and only if (𝒮_{qq.M})_{α,β} = 𝒮_{αβ.M} = 0, where 𝒮_{αβ.M} is the edge matrix induced by 𝒜 for Σ^{αβ.M}.

The interpretation of the edge matrix 𝒮_{αβ.M} is illustrated with the following scheme:

    Edge matrix:    𝒵_qα^T    (I_qq + 𝒵_qr 𝒵_qr^T)^-        𝒵_qβ
    Path:           α → q     q ← r → q ... q ← r → q       q ← β.

Thus, the condition 𝒮_{αβ.M} = 0 is seen to mean the absence of any active alternating ij-path in G_anc^{Q.r} from α to β.

Proof. We show that the edge matrix 𝒮_{qq.M} can equivalently be obtained via alternating paths in G_anc^{Q.r} and by closing all r-line paths in the edge matrix of the moral graph of α, β, C. In the subgraph induced by q in the parent graph, the above induced edge matrix 𝒮_{qq.M} closes by an ij-edge each of the following V-configurations:

    i ← r ← j,    i ← r → j,    i → q ← j,    i → r ← j.

All of i ← r ← j are closed with 𝒵 = zer_r 𝒜_QQ, all of i ← r → j with 𝒵_qr 𝒵_qr^T, and all of i → q ← j with (I_qq + 𝒵_qr 𝒵_qr^T). All of the fourth type are also closed, since after partial closure with respect to r every collision node in r has been replaced by its first descendant in q; see Figure 4 for an illustration.

On the other hand, for the induced edge matrix 𝒮_{QQ.O} = In[𝒜_QQ^T 𝒜_QQ], all V-configurations of the last two types on the right are closed first, since Q is the union of q and r. Then, with

    zer_r 𝒮_{QQ.O} = In[ 𝒮_{qq.O} + 𝒮_{qr.O} (𝒮_{rr.O})^- 𝒮_{rq.O} ],

all r-line paths with endpoints within q in the concentration graph of Y_Q are closed, so that an additional ij-edge arises if and only if there is in the parent graph a V-configuration of the first or of the second type. Thus, both constructions close an identical list of V-configurations by an edge. This implies that partial closure with respect to r, applied to the edge matrix of the moral graph of α, β, C, gives zer_r 𝒮_{QQ.O} = 𝒮_{qq.M}, i.e. the edge matrix of the induced concentration graph of Y_q.

Our final result establishes the equivalence of the three path-based separation criteria, for V partitioned as before into M, α, β, C, and explains why proofs of equivalence become complex when they are based only on paths that induce edges in different types of graph.

Proposition 4. The following assertions are equivalent. Between α and β there is

(i) an active alternating path in the partial ancestor graph with respect to the union of M and α;

(ii) a d-connecting path relative to C in the parent graph;

(iii) an M-line path in the moral graph of α, β, and C.

Proof. By using Propositions 1 to 3, the result follows after noting that

    𝒫_{α|β.C} = 0    ⟺    𝒮_{αβ|C} = 0    ⟺    𝒮_{αβ.M} = 0.

Figure 5 illustrates the result of Proposition 4 for the parent graph of Figure 1(a), with α = {2}, β = {3, 4}, C = {6} and M = {1, 5}. The independence statement 2 ⊥ {3, 4} | 6 is implied by the parent graph, since there is no edge between α and β in the induced graph with edge matrix 𝒫_{a|b} in Figure 5(a), with edge matrix 𝒮_{gg|C} for g = V \ C in Figure 5(b), and with edge matrix 𝒮_{qq.M} for q = V \ M in Figure 5(c).

Acknowledgements

We thank D.R. Cox and K. Sadeghi for helpful comments and SAMSI, Raleigh, for having arranged a topic-related workshop in 2006. The work of the first author was partially supported by MIUR, Rome, under the project PRIN 2005132307; the work of the second author by the Swedish Research Society via the Chalmers Stochastic Center and by the Swedish Strategic Fund via the Gothenburg Mathematical Modelling Center.

Figure 5: Graphs induced by the parent graph in Figure 1(a), each of which shows that α ⊥ β | C holds for α = {2}, β = {3,4} and C = {6}, by all edges between α and β being missing. The graph induced by 𝒜 (a) with edge matrix 𝒫_{a|b} for a = V \ b and b the union of β and C; (b) with edge matrix 𝒮_{gg|C} for g = V \ C; (c) with edge matrix 𝒮_{qq.M} for q = V \ M.

A Appendix: Operators and linear independencies

A.1 Partial inversion and partial closure

Two matrix operators, introduced and studied by Wermuth, Wiedenbeck and Cox (2006), permit stepwise transformations of parameters in linear systems and of edge matrices of independence graphs, respectively. Note that M now denotes a matrix, no longer a set of nodes.

Let M be a square matrix of dimension d for which all principal submatrices are invertible. For a given integer 1 ≤ k ≤ d, partial inversion of M with respect to k transforms M into a matrix N = inv_k M of the same dimensions, where for all i, j ≠ k

    N_kk = 1/M_kk,
    N_ik = M_ik / M_kk,
    N_kj = -M_kj / M_kk,
    N_ij = M_ij - M_ik M_kj / M_kk.    (14)

Then the matrix 𝒩 of structural zeros in N = inv_k M that remain after partial inversion of M on k is defined by 𝒩 = zer_k ℳ, where

    𝒩_kk = 1,
    𝒩_ik = ℳ_ik,
    𝒩_kj = ℳ_kj,
    𝒩_ij = 1 if ℳ_ij = 1 or ℳ_ik ℳ_kj = 1, and 𝒩_ij = 0 otherwise.    (15)

Partial inversion of M with respect to a sequence of indices a applies the operator (14) in sequence to all elements of a and is denoted by inv_a M. Similarly, partial closure (15) of ℳ with respect to a is denoted by zer_a ℳ and closes all a-line paths in the graph with edge matrix ℳ.
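A direct transcription of (14) and (15) into code may help; the function names are ours, and the final lines check the properties inv_V M = M^{-1} and the transitive closure stated in the following paragraph.

```python
import numpy as np

def inv_k(M, k):
    """Partial inversion (14) of a square matrix M on index k."""
    N, m = M.astype(float), M[k, k]
    for i in range(M.shape[0]):
        for j in range(M.shape[0]):
            if i != k and j != k:
                N[i, j] = M[i, j] - M[i, k] * M[k, j] / m
    N[:, k] = M[:, k] / m
    N[k, :] = -M[k, :] / m
    N[k, k] = 1.0 / m
    return N

def zer_k(Mcal, k):
    """Partial closure (15) of a binary edge matrix on index k."""
    N = Mcal.copy()
    N[k, k] = 1
    for i in range(Mcal.shape[0]):
        for j in range(Mcal.shape[0]):
            if i != k and j != k and Mcal[i, k] * Mcal[k, j] == 1:
                N[i, j] = 1
    return N

# inv applied to all indices gives the ordinary inverse
M = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
N = M
for k in range(3):
    N = inv_k(N, k)
print(np.allclose(N, np.linalg.inv(M)))   # True

# zer applied to all indices gives the transitive closure
Mcal = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
Z = Mcal
for k in range(3):
    Z = zer_k(Z, k)
print(Z[0, 2] == 1)                       # True: the closure adds the 0-2 edge
```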

Partial inversion of M with respect to all indices V = {1, ..., d} gives the inverse of M, and partial closure of ℳ with respect to V gives the edge matrix ℳ^- of the transitive closure of the graph with edge matrix ℳ:

    inv_V M = M^{-1},    zer_V ℳ = ℳ^-;

see also equation (11). Both operators are commutative, i.e. for three disjoint subsets a, b and c of V

    inv_a inv_b M = inv_b inv_a M,    zer_a zer_b ℳ = zer_b zer_a ℳ,

but partial inversion can be undone while partial closure cannot:

    inv_ab inv_bc M = inv_ac M,    zer_ab zer_bc ℳ = zer_abc ℳ.

For V = {a, b}, M partially inverted on a coincides with M^{-1} partially inverted on b:

    inv_a M = inv_b M^{-1}.    (16)

The two partial inverses inv_a M and inv_b M are, respectively,

    inv_a M = ( M_aa^{-1}   -M_aa^{-1} M_ab ; M_ba M_aa^{-1}   M_bb - M_ba M_aa^{-1} M_ab ),
    inv_b M = ( M_aa - M_ab M_bb^{-1} M_ba   M_ab M_bb^{-1} ; -M_bb^{-1} M_ba   M_bb^{-1} ).    (17)

Finally, we give a result for partial inversion of a special matrix product, used to prove Lemma 2.

Lemma 5. Let M be, as before, a square matrix whose principal submatrices are all invertible, M = U K U^T with U upper block-triangular and K symmetric and positive definite, and let H = inv_b K. Then

    inv_b (U K U^T) = ( U_aa  0 ; 0  U_bb^{-T} ) H ( U_aa^T  0 ; 0  U_bb^{-1} ) + ( 0  U_ab U_bb^{-1} ; ∓  0 )
                    = ( U_aa H_aa U_aa^T   (U_ab + U_aa H_ab) U_bb^{-1} ; ∓   U_bb^{-T} H_bb U_bb^{-1} ),

where ∓ again denotes entries in a matrix that is symmetric except for the sign, and H_aa = [inv_a K^{-1}]_aa.

Proof. The first two equalities are verified by applying equation (17) and condensing; the last results from property (16) of partial inversion.

A.2 Zero partial regression coefficients and independence

A Gaussian random vector Y has a non-degenerate distribution if its covariance matrix Σ is positive definite. From Σ partitioned as (a, b, c, d), the conditional covariance matrix Σ_{ab|c} of Y_a and Y_b given Y_c is obtained by partially inverting Σ with respect to c:

    Σ_{ab|c} = Σ_ab - Σ_ac Σ_cc^{-1} Σ_cb.

The least-squares linear predictor of Y_a given Y_b is Π_{a|b} Y_b with Π_{a|b} = Σ_ab Σ_bb^{-1}. For prediction of Y_a with Y_b and Y_c, the matrix of least-squares regression coefficients Π_{a|bc} is partitioned as

    Π_{a|bc} = ( Π_{a|b.c}   Π_{a|c.b} ) = ( Σ_{ab|c} Σ_{bb|c}^{-1}   Σ_{ac|b} Σ_{cc|b}^{-1} ),    (18)

where for example Π_{a|b.c} Y_b predicts Y_a with Y_b when both Y_a and Y_b are adjusted for linear dependence on Y_c. Equation (18) is generalized by

    Π_{a|bc.d} = ( Π_{a|b.cd}   Π_{a|c.bd} ) = ( Σ_{ab|cd} Σ_{bb|cd}^{-1}   Σ_{ac|bd} Σ_{cc|bd}^{-1} ).    (19)

By property (16) of partial inversion,

    ( Π_{a|b.cd}   Π_{a|c.bd} ) = -(Σ^{aa})^{-1} ( Σ^{ab}   Σ^{ac} ),    (20)

where Σ^{aa}, Σ^{ab} and Σ^{ac} are submatrices of the concentration matrix Σ^{-1}. From Σ^{-1}, the concentration matrix Σ^{bc.a} of Y_b and Y_c after marginalizing over Y_a is obtained by partially inverting Σ^{-1} with respect to a:

    Σ^{bc.a} = Σ^{bc} - Σ^{ba} (Σ^{aa})^{-1} Σ^{ac}.

A recursive relation for matrices of least-squares regression coefficients generalizes a result due to Cochran (1938),

    Π_{a|b.cd} = Π_{a|b.c} - Π_{a|d.bc} Π_{d|b.c},    (21)

and is obtained by partial inversion of Σ first with respect to b, c and then with respect to d.

Lemma 6. For non-degenerate Gaussian distributions, linear and probabilistic independencies combine equivalently as follows:

(i) symmetry: Π_{a|b.c} = 0 implies Π_{b|a.c} = 0; a ⊥ b | c implies b ⊥ a | c;

(ii) decomposition: Π_{a|bc.d} = 0 implies Π_{a|b.d} = 0; a ⊥ bc | d implies a ⊥ b | d;

(iii) weak union: Π_{a|bc.d} = 0 implies Π_{a|b.cd} = 0; a ⊥ bc | d implies a ⊥ b | cd;

(iv) contraction: Π_{a|b.c} = 0 and Π_{a|d.bc} = 0 imply Π_{a|b.cd} = 0; a ⊥ b | c and a ⊥ d | bc imply a ⊥ b | cd;

(v) intersection: Π_{a|b.cd} = 0 and Π_{a|c.bd} = 0 imply Π_{a|bc.d} = 0; a ⊥ b | cd and a ⊥ c | bd imply a ⊥ bc | d;

(vi) composition: Π_{a|c.d} = 0 and Π_{b|c.d} = 0 imply Π_{ab|c.d} = 0; a ⊥ c | d and b ⊥ c | d imply ab ⊥ c | d.

Proof. Definition (18) implies that Π_{a|b.c} vanishes if and only if Σ_{ab|c} = 0, so that (i) results from the symmetry of the conditional covariance matrix. Property (ii) follows by noting that Π_{a|bc.d} = 0 is equivalent to the vanishing of both Σ_{ab|d} and Σ_{ac|d}, and thus also of Π_{a|b.d} = Σ_{ab|d} Σ_{bb|d}^{-1}. Properties (iii) and (v) are direct consequences of (19), while (iv) follows with equation (21). Finally, (vi) is another consequence of the definition (18) and of partitionings, as in (6), of matrices of partial regression coefficients. The proof is completed by the equivalence of linear and probabilistic independence statements in Gaussian distributions.

References

Cochran, W. G. (1938). The omission or addition of an independent variate in multiple linear regression. J. Roy. Statist. Soc. Suppl., 5, 171-176.

Dawid, A. P. (1979). Conditional independence in statistical theory (with discussion). J. Roy. Statist. Soc. B, 41, 1-31.

Dempster, A. P. (1969). Elements of Continuous Multivariate Analysis. Addison-Wesley, Reading, Mass.

Geiger, D., Verma, T. S. & Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20, 507-534.

Lauritzen, S. L., Dawid, A. P., Larsen, B. N. & Leimer, H.-G. (1990). Independence properties of directed Markov fields. Networks, 20, 491-505.

Marchetti, G. M. (2006). Independencies induced from a graphical Markov model after marginalization and conditioning: the R package ggm. J. Statist. Software, 15, 6.

Smith, J. Q. (1989). Influence diagrams for statistical modelling. Ann. Statist., 17, 654-672.

Studený, M. (2005). Probabilistic Conditional Independence Structures. Springer Series in Information Science and Statistics, London.

Wermuth, N. & Cox, D. R. (2004). Joint response graphs and separation induced by triangular systems. J. Roy. Statist. Soc. B, 66, 687-717.

Wermuth, N. & Cox, D. R. (2007). Distortion of effects. Submitted.

Wermuth, N., Wiedenbeck, M. & Cox, D. R. (2006). Partial inversion for linear systems and partial closure of independence graphs. BIT Numerical Mathematics, 46, 883-901.