Equivalence in Non-Recursive Structural Equation Models


Thomas Richardson¹
Philosophy Department, Carnegie-Mellon University
Pittsburgh, PA 15213, USA
thomas.richardson@andrew.cmu.edu

Introduction

In the last decade, there has been considerable progress in understanding a certain class of statistical models, known as directed acyclic graph (DAG) models, which encode independence and conditional independence constraints (see Pearl, 1988). This research has had fruitful results in many areas: there is now a relatively clear causal interpretation of these models, there are efficient procedures for determining the statistical indistinguishability of DAGs, there are reliable algorithms for generating a class of DAG models from sample data and background knowledge, and so on.

Two elements were important in these investigations. The first was a purely graphical condition for calculating the conditional independence relations entailed by a DAG. The second was a local characterization of equivalence between two graphs, in the sense that all of the same conditional independencies are entailed by each graph. Such a local characterization was essential in allowing the construction of efficient algorithms which could search the whole class of DAG models to find those which fitted the given data.

1 Non-Recursive Structural Equations and Cyclic Graphs

1.1 The limitations of DAG models

The DAG formalism is very general: a gamut of more familiar constructs such as recursive linear structural equation models with independent errors, regression models, factor analytic models, path models, and discrete latent variable models can be represented as DAG models. However, as the name "directed acyclic graph" suggests, DAG models do exclude a kind of model familiar in engineering, economics, and the social sciences: those in which variable A influences variable B and, at the same time, B influences A. Economic and physical processes are often modelled by linear systems of this sort, the so-called non-recursive structural equation models. Another use for such systems is modelling time series in which feedback is present, as well as structures in which causal influences propagate in different directions in certain subpopulations.

¹ I thank P. Spirtes, C. Glymour, R. Scheines and C. Meek for helpful conversations. Research for this paper was supported by the Office of Naval Research through contract number N

This paper will give a survey of some recent research aimed at filling this gap: developing a theory of cyclic graphical models that would allow the generalization of acyclic techniques and methods to the cyclic case.

1.2 Linear Structural Equation Models (linear SEMs)

In an SEM, the variables are divided into two disjoint sets, the error terms and the non-error terms. Associated to each non-error variable $V$ there is a unique error term $\varepsilon_V$. A linear SEM contains a set of equations in which each non-error random variable $V$ is written as a linear function of other non-error random variables and $\varepsilon_V$. A linear SEM also specifies a joint distribution over the error terms. In our discussion we will consider only linear SEMs with error terms that are jointly independent, but as we shall see, in an important sense, at least within the context of our discussion, nothing is lost by this restriction. The following is an example of such a model:

$$
\begin{aligned}
X &= \varepsilon_X \\
Y &= \varepsilon_Y \\
A &= \alpha_1 X + \alpha_2 B + \varepsilon_A \\
B &= \beta_1 Y + \beta_2 A + \varepsilon_B
\end{aligned}
$$

The $\varepsilon_V$ are jointly independent standard normal error terms. A structural equation model in which, for some ordering of the variables, the matrix of coefficients is in lower triangular form is said to be recursive.

1.3 Graphs

There is a directed graph naturally associated with a given linear SEM, by the rule that there is an edge from X to Y (X → Y) if and only if the coefficient of X in the equation for Y is non-zero. By convention we do not include error terms in the graph. Hence the graph relating to the model above is (here the error terms are omitted, being assumed jointly independent):

X → A ⇄ B ← Y

A linear SEM with a jointly independent distribution over the error terms constitutes a parameterization of its associated graph. It is easy to see that the linear SEM associated with an acyclic graph will be a recursive structural equation model.
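The rule in 1.3 and the definition of "recursive" in 1.2 can be sketched in a few lines of Python. This is only an illustration: the function names `graph_from_sem` and `is_recursive`, the dictionary layout, and the numeric coefficient values are assumptions, and the example model is assumed to have the form A = α₁X + α₂B + ε_A, B = β₁Y + β₂A + ε_B. The acyclicity test uses Kahn's topological-sort algorithm, which succeeds exactly when some ordering of the variables makes the coefficient matrix lower triangular.

```python
from collections import deque

def graph_from_sem(coeffs):
    """coeffs[v][u] is the coefficient of u in the equation for v.
    Returns the directed edges (u, v), one per non-zero coefficient."""
    return {(u, v) for v, row in coeffs.items() for u, c in row.items() if c != 0}

def is_recursive(variables, edges):
    """True iff the graph is acyclic, i.e. the variables can be ordered so
    that the coefficient matrix is lower triangular (Kahn's algorithm)."""
    indegree = {v: 0 for v in variables}
    children = {v: [] for v in variables}
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    queue = deque(v for v in variables if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Vertices on a directed cycle never reach indegree 0.
    return removed == len(variables)

variables = ["X", "Y", "A", "B"]
# Hypothetical coefficient values; only their zero / non-zero pattern matters.
cyclic_sem = {"X": {}, "Y": {}, "A": {"X": 0.5, "B": 0.3}, "B": {"Y": 0.7, "A": 0.2}}
acyclic_sem = {"X": {}, "Y": {}, "A": {"X": 0.5}, "B": {"Y": 0.7, "A": 0.2}}

cyclic_edges = graph_from_sem(cyclic_sem)
acyclic_edges = graph_from_sem(acyclic_sem)
print(is_recursive(variables, cyclic_edges))   # False
print(is_recursive(variables, acyclic_edges))  # True
```

Dropping the coefficient of B in the equation for A removes the edge B → A, which breaks the cycle and makes the model recursive.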
1.4 Linear Entailment

A directed graph $G$ containing disjoint sets of variables $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$² linearly entails that $\mathbf{X}$ is independent of $\mathbf{Y}$ given $\mathbf{Z}$ if and only if $\mathbf{X}$ is independent of $\mathbf{Y}$ given $\mathbf{Z}$ for all values of the non-zero linear coefficients and all distributions of the exogenous variables in which they have positive variances and are jointly independent.

It is important to note that in any particular SEM with directed graph $G$, there may be conditional independencies which hold even though they are not linearly entailed by $G$. However, if a zero correlation holds for some but not all parameterizations of $G$, then the set of parameterizations in which this extra conditional independence holds is of zero Lebesgue measure over the set of all parameter value assignments to the non-zero linear coefficients.

² We use bold face letters ($\mathbf{X}$) to denote sets of variables.

1.5 Conditional Independencies and Equivalence in a graph

In an acyclic graph $G$, there is a graphical path condition which holds between disjoint vertex sets $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ in the graph if and only if $G$ linearly entails that $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$.³ Similarly, the same graphical path condition holds between $X$, $Y$ and a set $\mathbf{Z}$, not containing $X$ or $Y$, if and only if in $G$ the partial correlation between $X$ and $Y$ controlling for $\mathbf{Z}$ vanishes: $\rho_{XY.\mathbf{Z}} = 0$.

We can calculate the partial correlations that are zero in all linear parameterizations of $G$ in which $X$ and $Y$ have correlated errors in the following way. First, form a directed graph in which $X$ and $Y$ are the effects of a latent common cause $T$. The same graphical path condition holds in this new graph if and only if, in every parameter assignment to $G$ in which $X$ and $Y$ have correlated errors, $\rho_{XY.\mathbf{Z}} = 0$. This observation is central to the usefulness of the graphical method.

The task of generalizing this result to the cyclic case has already been accomplished. Building on the work of Haavelmo (1943), Spirtes (1993) showed that the same graphical condition, the Geiger-Pearl-Verma d-separation criterion (defined in the Appendix), which determines whether a particular conditional independence relation or zero partial correlation is linearly entailed by a recursive structural equation model, can also be used with linear non-recursive models. Equivalently, the same technique used for reading conditional independencies from an acyclic graph can be applied in the cyclic case.

We say that two graphs are equivalent if they both linearly entail the same set of conditional independencies.

It is important to be clear what we are establishing when we work out the conditional independencies entailed by a given model: we are calculating the conditional independence consequences of having a certain form of linear equations, i.e. linear equations in which certain coefficients are zero. We are not trying to estimate parameters, we are not making any distinction between latent and measured variables, and we are not constructing a model from data, though the development of efficient procedures for determining the equivalence of cyclic models will facilitate the construction of computer aids for model specification and updating.

³ $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$ means that $\mathbf{X}$ is independent of $\mathbf{Y}$ given $\mathbf{Z}$.
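As a numerical illustration of vanishing partial correlations in a non-recursive model, the sketch below computes the covariance matrix implied by one parameterization of a cyclic SEM and checks which partial correlations between X and Y are exactly zero. Everything specific here is an assumption made for illustration: the model is taken to be A = α₁X + α₂B + ε_A, B = β₁Y + β₂A + ε_B with X and Y exogenous, the coefficient values are arbitrary, and the helper names are hypothetical. Writing the model as V = CV + ε with independent unit-variance errors gives Σ = (I − C)⁻¹(I − C)⁻ᵀ, and ρ_XY.Z = 0 exactly when the (X, Y) entry of the inverse of Σ restricted to {X, Y} ∪ Z is zero. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction as F

def inverse(m):
    """Gauss-Jordan inverse of a non-singular square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def covariance(names, coeffs):
    """Covariance implied by V = C V + eps with independent unit-variance errors."""
    n = len(names)
    idx = {v: i for i, v in enumerate(names)}
    i_minus_c = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for v, row in coeffs.items():
        for u, c in row.items():
            i_minus_c[idx[v]][idx[u]] -= c
    m = inverse(i_minus_c)
    return [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

def partial_corr_is_zero(names, sigma, x, y, z):
    """rho_{xy.z} = 0 iff the (x, y) entry of the precision matrix of (x, y, z) is 0."""
    keep = [names.index(v) for v in [x, y] + list(z)]
    sub = [[sigma[i][j] for j in keep] for i in keep]
    return inverse(sub)[0][1] == 0

names = ["X", "Y", "A", "B"]
# Hypothetical coefficients: A = X/2 + B/3 + eps_A, B = 2Y/3 + A/5 + eps_B.
coeffs = {"A": {"X": F(1, 2), "B": F(1, 3)}, "B": {"Y": F(2, 3), "A": F(1, 5)}}
sigma = covariance(names, coeffs)

for z in ([], ["A"], ["A", "B"]):
    print(z, partial_corr_is_zero(names, sigma, "X", "Y", z))
# [] True
# ['A'] False
# ['A', 'B'] True
```

A single parameterization cannot establish linear entailment, which quantifies over all parameter values. The non-zero result for Z = {A} does show that ρ_XY.A = 0 is not entailed, and the two zeros are consistent with what d-separation gives for the graph X → A ⇄ B ← Y.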

2 Partial Results about Equivalence for Cyclic Graphs

Although there is an $O(n^3)$ algorithm which can determine when two acyclic graphs are equivalent in our sense, a feasible algorithm which will establish this equivalence for cyclic graphs has not yet been obtained. Several preliminary results indicate that there is considerable heterogeneity in the equivalence class, much more than in the acyclic case.

2.1 Contrast with the cyclic case

In the acyclic case, if A and B are dependent conditional on any subset of the other variables, then either A → B or B → A, i.e. either A is a direct cause of B or B is a direct cause of A. However, this implication fails in the cyclic case.⁴ For example:

[Figure: directed cyclic graph whose vertices include A and C]

There are no independencies, conditional or otherwise, entailed by this graph, and yet there is no edge between A and C. This marks a significant difference between the cyclic and acyclic equivalence classes.

2.2 Non-locality property

In the acyclic case, if two graphs are not equivalent then there will be some conditional independence between variables separated by at most two edges, entailed by one graph and not by the other. This means that we need only look at the structure of triples of adjacent vertices in order to establish that two graphs are equivalent. This is not true for the cyclic case, as the following example shows:

[Figure: two cyclic graphs, $G_1$ and $G_2$, each containing A, B and several vertices labelled X]

$G_1$ and $G_2$ are not equivalent. Although every conditional independence implied by $G_2$ is also implied by $G_1$, in $G_1$, $A \perp B$, while in $G_2$, $A \not\perp B$. But A and B are separated by more than two edges (in both graphs). It is simple to see that we could extend these graphs so that they continued to entail the same conditional independencies, with the exception of $A \perp B$, while A and B were separated by arbitrarily many edges. This result, however, leaves open whether a polynomial time algorithm exists for deciding equivalence.

⁴ A similar point is made in Whittaker (1989).

2.3 Criteria for detecting feedback

We give a set of conditions which are sufficient, though not necessary, for cyclicity. The following four conditional independence conditions can only be linearly entailed (simultaneously) by a cyclic graph. For some triple of vertices <A, B, C>, the following hold:

(i) For any set $\mathbf{S}$ not containing A or B, $A \not\perp B \mid \mathbf{S}$.
(ii) For any set $\mathbf{S}$ not containing B or C, $B \not\perp C \mid \mathbf{S}$.
(iii) There is a set $\mathbf{T}$ with $B \in \mathbf{T}$ and $A \perp C \mid \mathbf{T}$.
(iv) There is a set $\mathbf{T}^*$ with $B \notin \mathbf{T}^*$ and $A \perp C \mid \mathbf{T}^*$.

As an illustration of this criterion, the following model, presented by Whittaker (1989), is not equivalent to any acyclic graph:

[Figure: Whittaker's cyclic model over the vertices A, B, X and Y]

Here <A, X, B> and <A, Y, B> satisfy the conditions given above.

2.4 The Orientation of cycles

Our last and possibly most interesting result says that, given a graph with a cycle, there is an equivalent graph in which that cycle is replaced by another cycle having the opposite orientation. Thus if the original cycle is clockwise, its replacement is anti-clockwise, and vice versa.

[Figure: equivalent graphs $G_1$ and $G_2$, each containing a cycle through C, D and E]

In the above example, the four-vertex cycle through C, D and E has anti-clockwise orientation in $G_1$, while the corresponding cycle in the equivalent graph $G_2$ has clockwise orientation. One important consequence of this result is that it is not possible to orient a cycle merely using conditional independence information.

Appendix

Definition of d-separation:

If there is an arrow from A to B or from B to A, we say there is an edge between A and B. Given three vertices A, B and C such that there is an edge between A and B, and between B and C, if the edges collide at B (i.e. A → B ← C) then we say B is a collider between A and C, relative to these edges. Otherwise we say that B is a noncollider between A and C, relative to these edges. For example, B is a noncollider in each of the following cases: A → B → C, A ← B ← C, A ← B → C.

If there is an arrow from A to B (A → B), then we say that A is a parent of B, and B is a child of A. We define the descendant relation as the transitive reflexive closure of child and, similarly, ancestor as the transitive reflexive closure of parent.

A sequence of distinct edges $\langle E_1, \ldots, E_n \rangle$ in $G$ is an undirected path if and only if there exists a sequence of vertices $\langle V_1, \ldots, V_{n+1} \rangle$ such that for $1 \le i \le n$ either $\langle V_{i+1}, V_i \rangle = E_i$ or $\langle V_i, V_{i+1} \rangle = E_i$.

We are now in a position to define d-connection. For disjoint sets $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$, $\mathbf{X}$ is d-connected to $\mathbf{Y}$ given $\mathbf{Z}$ if for some $X \in \mathbf{X}$ and $Y \in \mathbf{Y}$ there is a path from $X$ to $Y$ satisfying the following conditions:

(i) If A, B and C are adjacent vertices on the path, and $B \in \mathbf{Z}$, then B is a collider between A and C.
(ii) If B is a collider between A and C, then there is a descendant D of B with $D \in \mathbf{Z}$.

If $\mathbf{X}$ and $\mathbf{Y}$ are not d-connected given $\mathbf{Z}$ then $\mathbf{X}$ and $\mathbf{Y}$ are said to be d-separated by $\mathbf{Z}$.

The following important theorems give the relationship between d-separation and the linear entailment of conditional independencies and partial correlations. They were proved for the cyclic case by Spirtes (1993).

Theorem (Spirtes): In a (cyclic or acyclic) graph $G$, for disjoint sets $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, $\mathbf{X}$ and $\mathbf{Y}$ are d-separated given $\mathbf{Z}$ if and only if $G$ linearly entails $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$.

Theorem (Spirtes): In a (cyclic or acyclic) graph $G$, for any set $\mathbf{Z}$ not containing $X$ or $Y$, $X$ and $Y$ are d-separated given $\mathbf{Z}$ if and only if $G$ linearly entails $\rho_{XY.\mathbf{Z}} = 0$.
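The definition of d-connection above can be transcribed fairly directly into code. The following is a brute-force sketch, not an efficient algorithm: it enumerates sequences of distinct edges by depth-first search, so it is exponential in the worst case and intended only for small graphs. The function names and the representation of a graph as a set of (parent, child) pairs are assumptions made for illustration, and the test graph X → A ⇄ B ← Y is assumed to be the one associated with the example SEM of section 1.2.

```python
def descendants(edges, v):
    """Reflexive transitive closure of 'child': v plus everything reachable from v."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def d_connected(edges, xs, ys, zs):
    """True iff some x in xs is d-connected to some y in ys given zs.
    edges is a set of (parent, child) pairs; cycles are allowed."""
    edges, ys, zs = set(edges), set(ys), set(zs)
    vertices = {v for e in edges for v in e}
    # Condition (ii): a collider must have a descendant (possibly itself) in Z.
    usable_collider = {v for v in vertices if descendants(edges, v) & zs}

    def search(v, last, used):
        if v in ys:
            return True
        for e in edges - used:
            if v not in e:
                continue
            if last is not None:  # v is an interior vertex of the path
                collider = last[1] == v and e[1] == v
                if collider and v not in usable_collider:
                    continue
                if not collider and v in zs:  # condition (i)
                    continue
            w = e[1] if e[0] == v else e[0]
            if search(w, e, used | {e}):
                return True
        return False

    return any(search(x, None, frozenset()) for x in xs)

def d_separated(edges, xs, ys, zs):
    return not d_connected(edges, xs, ys, zs)

# Assumed example graph: X -> A, A -> B, B -> A, Y -> B.
g = {("X", "A"), ("A", "B"), ("B", "A"), ("Y", "B")}
print(d_separated(g, {"X"}, {"Y"}, set()))        # True
print(d_separated(g, {"X"}, {"Y"}, {"A"}))        # False
print(d_separated(g, {"X"}, {"Y"}, {"A", "B"}))   # True
```

Conditioning on A alone opens the path X → A ← B ← Y, because A is then a collider in Z while B is a noncollider outside Z. Adding B to the conditioning set blocks it again.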

REFERENCES:

COX, D.R. and WERMUTH, N. (1993). Linear dependencies represented by chain graphs. Statistical Science, 8, No. 3.
GEIGER, D. (1990). Graphoids: a qualitative framework for probabilistic inference. PhD dissertation, Univ. California, Los Angeles.
HAAVELMO, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica, 11.
HEISE, D. (1975). Causal Analysis. Wiley, New York.
KIIVERI, H. and SPEED, T.P. (1982). Structural Analysis of multivariate data: A review. In Sociological Methodology, 1982 (S. Leinhardt, ed.). Jossey Bass, San Francisco.
PEARL, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.
SPIRTES, P. (1993). Directed Cyclic Graphs, Conditional Independence and Non-Recursive Linear Structural Equation Models. Philosophy, Methodology and Logic Technical Report 35, Carnegie Mellon University.
SPIRTES, P., GLYMOUR, C. and SCHEINES, R. (1993). Causation, Prediction and Search. Lecture Notes in Statistics, Springer-Verlag.
WHITTAKER, J. (1989). Graphical Models in Applied Multivariate Statistics. Wiley.
