The Inflation Technique for Causal Inference with Latent Variables

Similar documents
Uniformly Additive Entropic Formulas

Undirected Graphical Models: Markov Random Fields

Exhibition of Monogamy Relations between Entropic Non-contextuality Inequalities

Lecture 19 October 28, 2015

How to Use Undiscovered Information Inequalities: Direct Applications of the Copy Lemma

Symmetries in the Entropy Space

Name: Block: Unit 2 Inequalities

NP-problems continued

Inferring latent structures via information inequalities

Entropic No-Disturbance as a Physical Principle

Formal definition of P

The number 6. Gabriel Coutinho 2013

COS597D: Information Theory in Computer Science September 21, Lecture 2

Causal Models with Hidden Variables

Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes

Induced Cycles of Fixed Length

Independence in Function Graphs

Relations Graphical View

Combinatorial Types of Tropical Eigenvector

Reverse mathematics and marriage problems with unique solutions

Physics 239/139 Spring 2018 Assignment 3 Solutions

Phylogenetic Networks with Recombination

Exploiting Symmetry in Computing Polyhedral Bounds on Network Coding Rate Regions

Notes. Relations. Introduction. Notes. Relations. Notes. Definition. Example. Slides by Christopher M. Bourke Instructor: Berthe Y.

The Cauchy-Schwarz inequality

Introduction to Causal Calculus

Tree sets. Reinhard Diestel

NP-problems continued

Enumeration of domino tilings of a double Aztec rectangle

KODAIRA DIMENSION OF LEFSCHETZ FIBRATIONS OVER TORI

On Weak Chromatic Polynomials of Mixed Graphs

Finite affine planes in projective spaces

Entropy Vectors and Network Information Theory

Markov properties for directed graphs

Planar Ramsey Numbers for Small Graphs

Probabilistic Graphical Models

COS597D: Information Theory in Computer Science October 5, Lecture 6

Quantum correlations and group C -algebras

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

. Relationships Among Bounds for the Region of Entropic Vectors in Four Variables

Extremal Entropy: Information Geometry, Numerical Entropy Mapping, and Machine Learning Application of Associated Conditional Independences.

A Phylogenetic Network Construction due to Constrained Recombination

Notes on Freiman s Theorem

Asymptotically optimal induced universal graphs

FIRST ORDER SENTENCES ON G(n, p), ZERO-ONE LAWS, ALMOST SURE AND COMPLETE THEORIES ON SPARSE RANDOM GRAPHS

Conditional Information Inequalities for Entropic and Almost Entropic Points

4.1 Notation and probability review

CHAPTER 3: THE INTEGERS Z

Network Combination Operations Preserving the Sufficiency of Linear Network Codes

B490 Mining the Big Data

Group Actions Definition. Let G be a group, and let X be a set. A left action of G on X is a function θ : G X X satisfying:

CUMBERLAND COUNTY SCHOOL DISTRICT BENCHMARK ASSESSMENT CURRICULUM PACING GUIDE

Symmetry properties of generalized graph truncations

Regular actions of groups and inverse semigroups on combinatorial structures

Received: 1 September 2018; Accepted: 10 October 2018; Published: 12 October 2018

Generalized Measurement Models

Perfect matchings in highly cyclically connected regular graphs

Latin squares: Equivalents and equivalence

THE TRIANGULAR THEOREM OF THE PRIMES : BINARY QUADRATIC FORMS AND PRIMITIVE PYTHAGOREAN TRIPLES

Some Basic Concepts of Probability and Information Theory: Pt. 2

Math 105A HW 1 Solutions

Efficient Approximation for Restricted Biclique Cover Problems

The chromatic covering number of a graph

Non-isomorphic Distribution Supports for Calculating Entropic Vectors

Asymptotically optimal induced universal graphs

Beyond Log-Supermodularity: Lower Bounds and the Bethe Partition Function

Data Mining 2018 Bayesian Networks (1)

Some hard families of parameterised counting problems

1 Take-home exam and final exam study guide

Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)

Bounding the Entropic Region via Information Geometry

MATH 433 Applied Algebra Lecture 22: Review for Exam 2.

Lecture 8: Equivalence Relations

Substitutions, Rauzy fractals and Tilings

Design Theory Notes 1:

Graph Transformations T1 and T2

Symmetry Groups 11/19/2017. Problem 1. Let S n be the set of all possible shuffles of n cards. 2. How many ways can we shuffle n cards?

1 The Cantor Set and the Devil s Staircase

Distinguishing Cartesian powers of graphs

Network Coding on Directed Acyclic Graphs

Lecture 4 October 18th

THE COMPLEXITY OF DISSOCIATION SET PROBLEMS IN GRAPHS. 1. Introduction

RECOGNIZING WEIGHTED DIRECTED CARTESIAN GRAPH BUNDLES

Cheng Soon Ong & Christian Walder. Canberra February June 2018

The Matching Polytope: General graphs

Linear Index Codes are Optimal Up To Five Nodes

Combinatorial and physical content of Kirchhoff polynomials

The Capacity of a Network

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course

Foundations of Mathematics

Finite and Algorithmic Model Theory II: Automata-Based Methods

Matrix representations and independencies in directed acyclic graphs

Leaf Management. Jeffry L. Hirst. December 23, 2018

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

Entanglement Inequalities

A new parametrization for binary hidden Markov modes

2MA105 Algebraic Structures I

Undirected Graphical Models

Entropy inequalities and linear rank inequalities

Transcription:

The Inflation Technique for Causal Inference with Latent Variables arxiv:1609.00672 (Elie Wolfe, Robert W. Spekkens, Tobias Fritz) September 2016

Introduction Given some correlations between the vocabulary of three languages, under what conditions does it follow that all three must have a common ancestor, as opposed to at most every two of them can have a common ancestor? Diagrammatically: A X Y C Z B

An example More generally: given a joint distribution P ABC, can it arise from the Triangle graph? A X Y C Z B Example Binary variables with perfect correlation: P ABC = [000] + [111] 2

To solve the example problem, let s consider a slightly different graph: A 2 C 1 B 1 Y 2 X 1 Z 1 Y 1 Assuming that the causal dependences are as in the original A X Y C Z B some marginals coincide: P A2 C 1 = P AC and P C1 B 1 = P CB. But: A 2 and C 1 are independent, P A2 C 1 = P A P C.

Hence A 2 and C 1 are perfectly correlated, as are C 1 and B 1, while A 2 and B 1 are independent. There is no joint distribution with these properties! P ABC is incompatible with the Triangle graph. We can make this quantitative by deriving a causal compatibility inequality. The existence of a joint distribution implies a constraint on marginal distributions, A 2 C 1 + C 1 B 1 1 + A 2 B 1. Write the marginals in terms of the distribution on the Triangle to show: Theorem Every P ABC with {±1}-valued variables compatible with the Triangle graph satisfies AC + BC 1 + A B.

Another example Is P ABC = [001] + [010] + [100] 2 compatible with the Triangle graph? Let s consider the Spiral inflation graph, where having the same causal dependences implies: A 2 P A1 B 1 C 1 = P ABC P A1 B 2 C 2 = P AB P C X 2 X 1 A 1 Y 1 Y 2 P A2 B 1 C 2 = P BC P A P A2 B 2 C 1 = P AC P B P A2 B 2 C 2 = P A P B P C. C 1 B 1 C 2 Z 1 B 2 Z 2

These marginals are such that A 2 = B 2 = C 2 = 1 has positive probability. Whenever this event happens, also one of the following must happen: A 1 = B 2 = C 2 = 1, A 2 = B 1 = C 2 = 1, A 2 = B 2 = C 1 = 1, A 1 = B 1 = C 1 = 0. However, all of these have probability zero! There is no joint distribution for all six variables that reproduces these marginals. The original distribution P ABC is not compatible with the Triangle graph.

Again one can make this inference quantitative by deriving an inequality. At the level of the Spiral inflation, the union bound implies that P A2 B 2 C 2 (111) P A1 B 2 C 2 (111) + P A2 B 1 C 2 (111) + P A2 B 2 C 1 (111) + P A1 B 1 C 1 (000) in every joint distribution. The above equations for the marginals translate this into P A (1)P B (1)P C (1) P AB (11)P C (1) + P BC (11)P A (1) + P AC (11)P B (1) + P ABC (000), which is another causal compatibility inequality for the Triangle graph.

The inflation technique So what is the general method? Definition Given a graph G, an inflation graph is a graph G together with a graph map π : G G such that its restriction to the ancestry of any node is a bijection. An equivalent condition is that G G must be a fibration, i.e. that every edge in G with a lift of its target to G lifts uniquely to G. In our figures, we specify the map by labelling each node in G by the label of its image in G and a copy index.

Every causal model on G inflates to a causal model on G. Definition A set of nodes U G is injectable if π U is bijective. The distribution on an injectable set in an inflation model is specified by the corresponding marginal distribution on G. Lemma If a distribution on observable nodes of G is compatible with G, then the associated family of distributions on injectable sets is compatible with G. This is the central observation that makes the inflation technique work.

Thus we can apply any method for causal inference on G, and the inflation technique translates it to causal inference on G. This can amplify the power of causal inference methods significantly. In the earlier examples, we have only used two things at the level of G, The existence of a joint distribution, Sets of nodes with disjoint ancestry are independent. For G, we obtain inequalities that do not just follow from the same requirements at the level of G! The sets that are relevant for applying the inflation technique in conjuction with the above two constraints on G are: Definition A set of nodes U G is pre-injectable if it is the union of injectable sets with disjoint ancestry.

The marginal problem When applying the inflation technique like this, we need to figure out when the resulting family of marginal distributions on pre-injectable sets can arise from a joint distribution. This is the marginal problem. The families of distributions that arise in this way form the marginal polytope. Its facet inequalities are what we turn into causal compatibility inequalities for G. Thus the computational problems are those of linear programming and facet enumeration for the marginal polytope.

Results Facet enumeration for the marginal polytope of the Spiral inflation with binary variables results in 37 symmetry classes of nontrivial inequalities: (#1*): 0 1 + B C + AB + AC (#2): 0 2 + A B C - C AB - 2 AC (#3*): 0 3 + A - B + C + B C + A B C + AB - C AB + 3 AC - B AC (#4*): 0 3 + A - B + C + B C - A B C + AB + C AB + 3 AC - B AC (#5): 0 3 + B - A B + A C + A B C + AB + C AB - B AC - 2 BC (#6): 0 3 + B + A C + A B C - C AB - 2 AC - B AC - 2 BC (#7): 0 3 + B + A B - A C - A B C + AB + C AB + B AC - 2 BC (#8*): 0 3 + A + B + A B + C + A C + B C - A B C + 2 AB + C AB + 2 AC + B AC + 2 BC + A BC - ABC (#9*): 0 3 + A + B - A B + C + A C + B C - A B C + C AB + 2 AC + B AC - 2 BC - A BC + ABC (#10 * ): 0 4 + 2 A B + 2 C + 2 B C - 2 AB + C AB - 2 AC + B AC + A BC - ABC (#11 * ): 0 4-2 B + B C + A B C - 2 AB + C AB - B AC - 3 BC + ABC (#12 * ): 0 4-2 B + 2 A C + B C - A B C - 2 AB + C AB - 2 AC + B AC - 3 BC + ABC (#13 * ): 0 4-2 A B - 2 A C - B C - A B C + 2 AB - C AB - 2 AC + B AC + BC + ABC (#14 * ): 0 4-2 A B + 2 A C - B C + A B C + 2 AB + C AB - 2 AC + B AC + BC + ABC (#15 * ): 0 4 + 2 A C + B C + A B C - C AB - 2 AC - B AC + 3 BC + ABC (#16 * ): 0 4-2 B + 2 A C - 2 AB + C AB - 2 AC + B AC - 2 BC - A BC + ABC (#17 * ): 0 4 + 2 A B + 2 A C + 2 B C - 2 AB + C AB - 2 AC + B AC - 2 BC + A BC + ABC (#18): 0 5 + A + B - 2 A B + C + B C + A B C + 3 AB + C AB + AC - B AC - 4 BC (#19 * ): 0 5 + A + B + 2 A B + C - 2 A C + B C - A B C + 3 AB + C AB - AC + B AC - 4 BC (#20 * ): 0 5 + A - B - 2 A B + C - A C + B C + AB + C AB + 2 AC - 2 B AC - 2 BC - 2 A BC - 2 ABC (#21 * ): 0 5 + A + B + C - A C - B C - 2 A B C + AB + 2 C AB + 2 AC + B AC - 2 BC + A BC - ABC (#22 * ): 0 5 - A + B - 2 A B + C - 2 A C + 2 B C + AB - 2 C AB + AC - 2 B AC - BC - 2 A BC + ABC (#23 * ): 0 5 + A + B - A B + C + 2 B C + A B C + 2 AB - C AB + AC - 2 B AC - BC - 2 A BC + ABC (#24 * ): 0 5 + A + B - 2 A B + C - A C - B C - 2 A B C - AB + 2 C AB + 2 AC + B AC + 2 BC - A BC + ABC (#25): 0 6 + 2 A B + A C + 2 B C + A B C - 4 AB - 2 C AB - 3 AC - B AC - 2 A BC (#26): 0 6-2 A + A B + 2 C + A C + 2 A B C - 5 AB - C AB - 3 AC + B AC - 2 A BC (#27): 0 6 + 2 A B + 2 C + A C + A B C - 4 AB - 2 C AB + 3 AC + B AC - 2 A BC (#28): 0 6 + A B + A C - 4 B C - 2 A B C + AB + C AB - 3 AC - B AC + 2 BC - 2 A BC (#29): 0 6 + 2 B + A B - 2 A C + B C - 2 A B C + 3 AB + C AB + 2 B AC - 5 BC + A BC (#30): 0 6 + 2 B - 2 A B + 4 A C - B C + A B C + 2 AB + 2 C AB - 2 AC + 2 B AC + BC + A BC (#31): 0 6 + A C + 4 B C + A B C - 2 AB - 2 C AB - 3 AC - B AC - 2 BC - 2 ABC (#32): 0 7 + A + B + A B + C - 2 A C + 2 B C - A B C + 2 AB + 3 C AB + AC + 2 B AC - 3 BC - 2 A BC + 3 ABC (#33): 0 8 + 2 A B + 4 A C - 2 B C + 2 A B C - 2 AB - C AB - 4 AC + B AC - 2 BC - 3 A BC - 3 ABC (#34): 0 8 + 2 A - 2 C - A C + 2 B C + 3 A B C - 6 AB + C AB + AC + 2 B AC - 3 A BC + ABC (#35): 0 8 + 2 A + A C + 2 B C + 3 A B C + 6 AB - C AB + AC - 2 B AC - 2 BC - 3 A BC + ABC (#36): 0 8-2 B + 2 A B - 2 C - B C - 3 A B C + 3 C AB - 6 AC + B AC + BC - 2 A BC + ABC (#37): 0 8 + 2 B + A B - 2 A C - 3 A B C + AB + 2 C AB + 2 AC + 3 B AC - 6 BC - A BC + ABC

Prospects Derive tighter constraints by using more than just ancestral independences on the inflation graph? Possibilities: conditional independences, coinciding distributions on isomorphic subgraphs. Necessary and sufficient constraints? Considering larger and larger inflation graphs should allow for reconstruction of the dependences on the hidden variables. Investigate the link to entropic inequalities!

Entropic inequalities The laws of information theory (Yeung): Submodularity, H(AB) + H(BC) H(B) + H(ABC). Non-Shannon-type inequalities, such as the Zhang Yeung inequality, 3H(AC) + 3H(AD) + H(BC) + H(BD) + 3H(CD) 4H(ACD) + H(BCD) + H(AB) + H(A) + 2H(C) + 2H(D), which are not consequences of submodularity. These are the inequalities that bound the entropy cone. Finding a complete list of non-shannon-type inequalities is an open problem.

The derivation of the known non-shannon-type inequalities relies on the copy lemma, which secretly is an application of the inflation technique: A C A 1 C 1 A 2 X X 1 X 2 B B 1 Problem Is it possible to derive new non-shannon-type inequalities by considering other inflation graphs?