Tutorial: Gaussian conditional independence and graphical models
Thomas Kahle, Otto-von-Guericke-Universität Magdeburg


The central dogma of algebraic statistics: statistical models are varieties.

Today: demonstrate algebraic approaches to conditional independence for Gaussian vectors $X = (X_1, \dots, X_m)$ with values in $\mathbb{R}^m$.

Source: Seth Sullivant's book manuscript Algebraic Statistics.

The density

A random vector $X = (X_1, \dots, X_m)$ has a Gaussian (or normal) distribution if its density with respect to the Lebesgue measure is
$$ f(x) = \frac{1}{(2\pi)^{m/2} \det(\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\} $$
for some $\mu \in \mathbb{R}^m$ and $\Sigma \in \mathrm{PD}_m$ positive definite.

Density with respect to the Lebesgue measure means
$$ \mathrm{Prob}(X \in A) = \int_A f(x) \, dx, \qquad A \subseteq \mathbb{R}^m. $$

$\mu$ is the mean, $\Sigma^{-1}$ the concentration matrix, and $\Sigma$ the covariance matrix.
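The density formula translates directly into code. A minimal numpy/scipy sketch (my own, not from the tutorial; the example values of $\mu$ and $\Sigma$ are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary example mean and positive definite covariance matrix.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def gaussian_density(x, mu, Sigma):
    """Evaluate f(x) exactly as on the slide."""
    m = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (m / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm_const

x = np.array([0.3, 0.7])
# Agrees with scipy's reference implementation.
assert np.isclose(gaussian_density(x, mu, Sigma),
                  multivariate_normal(mu, Sigma).pdf(x))
```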

Marginals

Let $A \subseteq [m]$ and $X = (X_1, \dots, X_m)$ a Gaussian random vector. The marginal density $f_A(x_A)$ of $X_A = (X_i)_{i \in A}$ is defined by
$$ f_A(x_A) = \int_{\mathbb{R}^{[m] \setminus A}} f(x_A, x_{[m] \setminus A}) \, dx_{[m] \setminus A}. $$
The marginal $X_A$ of a Gaussian $X$ is itself Gaussian with mean $\mu_A = (\mu_i)_{i \in A}$ and covariance $\Sigma_{A,A} = (\Sigma_{ij})_{i,j \in A}$.

Independence

Let $A, B \subseteq [m]$ be disjoint. $X_A$ is independent of $X_B$ (written $A \perp B$) if
$$ f_{A \cup B}(x_A, x_B) = f_A(x_A) f_B(x_B). $$
This happens if and only if $\Sigma_{A,B} = 0$.
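A short numpy sketch (my own illustration with made-up numbers) of both facts: marginalization just selects a submatrix of $\Sigma$, and the block $\Sigma_{A,B}$ detects independence:

```python
import numpy as np

# Covariance of X = (X1, X2, X3); the block Sigma_{A,B} for A = {1}, B = {2}
# is zero, so X1 and X2 are independent.
Sigma = np.array([[1.0, 0.0, 0.4],
                  [0.0, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])

A, B = [0], [1]              # 0-based indices for variables 1 and 2
print(Sigma[np.ix_(A, A)])   # covariance of the marginal X_A: a submatrix
print(Sigma[np.ix_(A, B)])   # Sigma_{A,B} == 0, hence A _||_ B
```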

Example: independence

$X_1$ = delay of your flight to Atlanta, $X_2$ = delay of my flight to Atlanta. With no further information, a reasonable first assumption: $X_1 \perp X_2$.

Or maybe not? Assume our day of arrival sees a lot of rain (a variable $X_3$ takes a high value). Then:
- $X_1$ and $X_2$ show correlation (e.g. both are more likely to be delayed).
- This correlation is explained by $X_3$.
- Conditionally on $X_3$ being large, $X_1$ and $X_2$ are still independent.

Capture this by dividing by the marginal density of $X_3$.

Conditionals

Let $A, B \subseteq [m]$ be disjoint. For each fixed $x_B \in \mathbb{R}^B$, the conditional density $f_{A|B}(x_A \mid x_B)$ of $X_A$ given $X_B = x_B$ is defined by
$$ f_{A|B}(x_A \mid x_B) = \frac{f_{A \cup B}(x_A, x_B)}{f_B(x_B)}. $$
The conditional density of a Gaussian is Gaussian with mean and covariance
$$ \mu_A + \Sigma_{A,B} \Sigma_{B,B}^{-1} (x_B - \mu_B), \qquad \Sigma_{A,A} - \Sigma_{A,B} \Sigma_{B,B}^{-1} \Sigma_{B,A}. $$
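Both formulas are one-liners in numpy. A sketch with made-up values (my own, not from the tutorial):

```python
import numpy as np

# Joint mean and covariance of (X1, X2, X3); condition X_A on X_B with
# A = {1}, B = {2, 3}.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])
A, B = [0], [1, 2]
x_B = np.array([0.5, -0.5])          # observed value of X_B

S_AB = Sigma[np.ix_(A, B)]
S_BB_inv = np.linalg.inv(Sigma[np.ix_(B, B)])

cond_mean = mu[A] + S_AB @ S_BB_inv @ (x_B - mu[B])
cond_cov = Sigma[np.ix_(A, A)] - S_AB @ S_BB_inv @ Sigma[np.ix_(B, A)]
print(cond_mean, cond_cov)
```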

Definition

Let $A, B, C \subseteq [m]$ be pairwise disjoint and $f$ be a Gaussian density. $A$ is conditionally independent of $B$ given $C$, written $A \perp B \mid C$, if for all $x_A \in \mathbb{R}^A$, $x_B \in \mathbb{R}^B$, $x_C \in \mathbb{R}^C$:
$$ f_{AB|C}(x_A, x_B \mid x_C) = f_{A|C}(x_A \mid x_C) \, f_{B|C}(x_B \mid x_C). $$
Convention: omit $\cup$, i.e. $AC = A \cup C$ and so on.

Proposition

$A \perp B \mid C$ if and only if $\operatorname{rk} \Sigma_{AC,BC} = |C|$.

$A \perp B \mid C$ if and only if $\operatorname{rk} \Sigma_{AC,BC} = |C|$.

Proof

The conditional distribution of $X_{AB}$ given $X_C = x_C$ has covariance
$$ \Sigma_{AB,AB} - \Sigma_{AB,C} \Sigma_{C,C}^{-1} \Sigma_{C,AB}. $$
Conditional independence holds if and only if its $A \times B$ submatrix vanishes:
$$ S = \Sigma_{A,B} - \Sigma_{A,C} \Sigma_{C,C}^{-1} \Sigma_{C,B} = 0. $$
This matrix is the Schur complement in
$$ \Sigma_{AC,BC} = \begin{pmatrix} \Sigma_{A,B} & \Sigma_{A,C} \\ \Sigma_{C,B} & \Sigma_{C,C} \end{pmatrix} \rightsquigarrow \begin{pmatrix} S & \Sigma_{A,C} \\ 0 & \Sigma_{C,C} \end{pmatrix} $$
(subtract the right block column times $\Sigma_{C,C}^{-1} \Sigma_{C,B}$ from the left block column). Since $\Sigma_{C,C}$ is invertible, $\operatorname{rk} \Sigma_{AC,BC} = |C| + \operatorname{rk} S$, which equals $|C|$ exactly when $S = 0$.
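A numerical sketch of the rank criterion (my own construction): build a Gaussian in which $1 \perp 2 \mid 3$ holds by design, then verify that $\Sigma_{AC,BC}$ has rank $|C| = 1$ and that the Schur complement from the proof vanishes:

```python
import numpy as np

# Concentration matrix K = Sigma^{-1} with K[0,1] = 0; for three variables
# this zero encodes 1 _||_ 2 | 3 (compare the undirected graph slide below).
K = np.array([[1.0, 0.0, 0.3],
              [0.0, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
Sigma = np.linalg.inv(K)

A, B, C = [0], [1], [2]
M = Sigma[np.ix_(A + C, B + C)]          # Sigma_{AC,BC}
print(np.linalg.matrix_rank(M))          # 1 == |C|, as the proposition predicts

S = (Sigma[np.ix_(A, B)]
     - Sigma[np.ix_(A, C)] @ np.linalg.inv(Sigma[np.ix_(C, C)]) @ Sigma[np.ix_(C, B)])
print(S)                                 # the Schur complement, numerically zero
```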

If you don't like densities, this can be your starting point.

Definition

Let $A, B, C \subseteq [m]$ be pairwise disjoint. The corresponding conditional independence (CI) ideal is
$$ I_{A \perp B \mid C} = \big( (|C| + 1)\text{-minors of } \Sigma_{AC,BC} \big). $$
The conditional independence model is
$$ \mathcal{M}_{A \perp B \mid C} = V(I_{A \perp B \mid C}) \cap \mathrm{PD}_m $$
(note: this is a semi-algebraic set).

Our goal: study Gaussian conditional independence using conditional independence ideals.
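A sympy sketch (my own helper; the tutorial itself uses Macaulay2) that writes down the generating minors of a CI ideal in a symbolic covariance matrix:

```python
import itertools
import sympy as sp

m = 3
# Symbolic symmetric covariance matrix with entries s_ij = s_ji.
sym = {(i, j): sp.Symbol(f"s{min(i, j)}{max(i, j)}")
       for i in range(1, m + 1) for j in range(1, m + 1)}
Sigma = sp.Matrix(m, m, lambda i, j: sym[(i + 1, j + 1)])

def ci_ideal_generators(A, B, C):
    """All (|C|+1)-minors of Sigma_{AC,BC}, for 1-based index sets A, B, C."""
    rows = [i - 1 for i in A + C]
    cols = [j - 1 for j in B + C]
    k = len(C) + 1
    M = Sigma[rows, cols]                # the submatrix Sigma_{AC,BC}
    return [M[list(ri), list(ci)].det()
            for ri in itertools.combinations(range(len(rows)), k)
            for ci in itertools.combinations(range(len(cols)), k)]

# I_{1 _||_ 2 | 3} is generated by the single 2x2 minor s12*s33 - s13*s23:
print(ci_ideal_generators([1], [2], [3]))
```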

Proposition ('CI axioms')

1. $A \perp B \mid C \Rightarrow B \perp A \mid C$ (symmetry)
2. $A \perp BD \mid C \Rightarrow A \perp B \mid C$ (decomposition)
3. $A \perp BD \mid C \Rightarrow A \perp B \mid CD$ (weak union)
4. $A \perp B \mid CD$ and $A \perp D \mid C \Rightarrow A \perp BD \mid C$ (contraction)

Proof

For Gaussians this is an exercise in linear algebra. The axioms can also be proven for general (non-Gaussian) densities.

Special properties of Gaussian conditional independence

The intersection axiom
$$ A \perp B \mid CD \ \text{ and } \ A \perp C \mid BD \;\Rightarrow\; A \perp BC \mid D $$
holds for all strictly positive densities.

The gaussoid axiom
$$ A \perp B \mid \{c\} \cup D \ \text{ and } \ A \perp B \mid D \;\Rightarrow\; A \perp B \cup \{c\} \mid D \ \text{ or } \ A \cup \{c\} \perp B \mid D $$
holds for Gaussians.

Why are these lemmas called axioms? Q: Is there a finite axiomatization of Gaussian CI?

Conjunctions of CI statements

We want to answer questions like: given that a density satisfies a set
$$ \mathcal{C} = \{ A_1 \perp B_1 \mid C_1, \; \dots, \; A_n \perp B_n \mid C_n \} $$
of CI statements, what other properties does it have?

Algebraic approach

The covariances that satisfy $\mathcal{C}$:
$$ \mathcal{M}_{\mathcal{C}} = \mathrm{PD}_m \cap V(I_{A_1 \perp B_1 \mid C_1}) \cap \dots \cap V(I_{A_n \perp B_n \mid C_n}). $$
Approach: compute the primary decomposition (or the minimal primes) of
$$ I_{\mathcal{C}} = I_{A_1 \perp B_1 \mid C_1} + \dots + I_{A_n \perp B_n \mid C_n}. $$

Example

Let's study the contraction property algebraically:
$$ A \perp B \mid CD \ \text{ and } \ A \perp D \mid C \;\Rightarrow\; A \perp BD \mid C. $$
With $m = 3$, $A = \{1\}$, $B = \{2\}$, $C = \emptyset$, $D = \{3\}$ we get $\mathcal{C} = \{1 \perp 2 \mid 3, \; 1 \perp 3\}$. ⇝ Macaulay2.

The primary decomposition has two components:
$$ V(I_{\mathcal{C}}) = V(\Sigma_{12}, \Sigma_{13}) \cup V(\Sigma_{13}, \Sigma_{33}). $$
- The second component does not intersect $\mathrm{PD}_3$.
- The first component is the desired conclusion $1 \perp \{2, 3\}$.
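sympy has no primary decomposition, but a small sketch (my own) exhibits the factorization behind the two components; the actual decomposition was computed with Macaulay2 in the tutorial:

```python
import sympy as sp

s12, s13, s23, s33 = sp.symbols("s12 s13 s23 s33")

# Generators of I_C = I_{1 _||_ 2 | 3} + I_{1 _||_ 3}:
g1 = s12 * s33 - s13 * s23   # the 2x2 minor of Sigma_{13,23}
g2 = s13                     # the 1x1 minor Sigma_13

# Modulo g2 (set s13 = 0), the first generator factors:
print(sp.factor(g1.subs(s13, 0)))   # s12*s33
# Hence V(I_C) = V(s12, s13) u V(s33, s13): the first component is the
# conclusion 1 _||_ {2,3}; the second forces Sigma_33 = 0 and misses PD_3.
```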

Success story (Sullivant, 2009)

For $n \geq 4$, consider the cyclic set of CI statements
$$ \mathcal{C} = \{ 1 \perp 2 \mid 3, \; 2 \perp 3 \mid 4, \; \dots, \; n{-}1 \perp n \mid 1, \; n \perp 1 \mid 2 \}. $$
(Binomial) primary decomposition yields that $I_{\mathcal{C}}$ has two minimal primes:
- $\langle \Sigma_{12}, \Sigma_{23}, \dots, \Sigma_{n1} \rangle$, corresponding to $1 \perp 2, \; 2 \perp 3, \; \dots, \; n \perp 1$;
- the toric ideal $I_{\mathcal{C}} : \big( \prod_{ij} \Sigma_{ij} \big)$, whose variety does not contain positive definite matrices.

No proper subset of $\mathcal{C}$ implies the marginal independencies.

⟹ Gaussian conditional independence cannot be finitely axiomatized.

A good source for CI ideals and problems: graphical models.

Graphical models

Let $G$ be a graph, either directed, undirected, or mixed, whose vertices are random variables and whose edges represent dependency. A graphical model assigns a set of covariance matrices to $G$:
- use separation in the graph to define conditional independence;
- use connection in the graph to parametrize.

Simplest example

As the simplest example, consider an undirected graph $G = (V, E)$.
- The pairwise Markov property of $G$ postulates $v \perp w \mid V \setminus \{v, w\}$ for every non-edge $(v, w) \notin E$.
- The global Markov property of $G$ postulates $A \perp B \mid C$ whenever $C$ separates $A$ and $B$ in $G$.

Theorem

Both Markov properties yield the same set of covariance matrices, and this set is characterized by $(\Sigma^{-1})_{ij} = 0$ whenever $(i, j) \notin E$ (which yields determinantal constraints on $\Sigma$ by Cramer's rule).
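A numpy sketch (my own example graph) of the theorem's characterization: prescribe zeros in the concentration matrix and watch the corresponding determinantal constraint on $\Sigma$ appear via Cramer's rule:

```python
import numpy as np

# Path graph 1 - 2 - 3: the only non-edge is (1, 3), so the model is cut out
# by (Sigma^{-1})_{13} = 0.
K = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
Sigma = np.linalg.inv(K)

# Cramer's rule: (Sigma^{-1})_{13} is, up to sign and a factor det(Sigma),
# the minor of Sigma obtained by deleting row 3 and column 1. It vanishes:
minor = np.linalg.det(np.delete(np.delete(Sigma, 2, axis=0), 0, axis=1))
print(minor)   # numerically zero; this is the CI statement 1 _||_ 3 | 2
```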

For DAGs there is a natural parametrization

Let $D$ be a DAG (directed acyclic graph) on $[m]$, topologically ordered. Postulate structural equations
$$ X_j = \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} X_i + \epsilon_j, \qquad j \in [m], $$
where $\epsilon_j$ is Gaussian with variance $\phi_j$ and $\lambda_{ij} \in \mathbb{R}$. Then $X = (X_1, \dots, X_m)$ is Gaussian with covariance
$$ \Sigma = (I - \Lambda)^{-T} \Phi (I - \Lambda)^{-1}, $$
where $\Phi = \mathrm{diag}(\phi_1, \dots, \phi_m)$ consists of the variances of $\epsilon_1, \dots, \epsilon_m$, and $\Lambda$ is strictly upper triangular with entries $\lambda_{ij}$, so that $I - \Lambda$ is upper triangular with ones on the diagonal. The DAGical model consists of all such covariance matrices.

Question: what is the vanishing ideal in $\mathbb{R}[\Sigma]$?
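A numpy sketch of the parametrization on a small made-up DAG (my own example; the same matrices are reused in the d-separation check below):

```python
import numpy as np

m = 3
# DAG 1 -> 3 <- 2 (a collider at 3), vertices in topological order.
Lambda = np.zeros((m, m))        # strictly upper triangular edge weights
Lambda[0, 2] = 0.8               # lambda_{13}
Lambda[1, 2] = -0.5              # lambda_{23}
Phi = np.diag([1.0, 1.0, 0.5])   # error variances phi_j

IL = np.eye(m) - Lambda
Sigma = np.linalg.inv(IL).T @ Phi @ np.linalg.inv(IL)

# Sanity check against the structural equations X = X @ Lambda + eps:
rng = np.random.default_rng(0)
eps = rng.normal(size=(200_000, m)) * np.sqrt(np.diag(Phi))
X = eps @ np.linalg.inv(IL)
print(np.abs(np.cov(X.T) - Sigma).max())   # small sampling error
```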

Separation gives valid conditional independence constraints

$A$, $B$ are d-separated by $C$ if every path from $A$ to $B$ either
- contains a collider $\to v \leftarrow$ where neither $v$ nor any descendant of $v$ is contained in $C$, or
- contains a blocked non-collider $v \in C$ (i.e. $\to v \to$, $\leftarrow v \to$, or $\leftarrow v \leftarrow$ along the path).

Theorem

A CI statement $A \perp B \mid C$ is valid for all covariances in the model if and only if $C$ d-separates $A$ and $B$ in $G$.

Surprise

There are more vanishing minors on the model, and all of these can be found using the trek separation of Sullivant, Talaska, and Draisma.
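For the collider DAG 1 → 3 ← 2 from the previous sketch, d-separation predicts $1 \perp 2$ (the empty set blocks the collider path) but not $1 \perp 2 \mid 3$ (conditioning on the collider opens it). Checking numerically with the rank criterion:

```python
import numpy as np

m = 3
Lambda = np.zeros((m, m))
Lambda[0, 2], Lambda[1, 2] = 0.8, -0.5    # edges 1 -> 3 and 2 -> 3
Phi = np.diag([1.0, 1.0, 0.5])
IL = np.eye(m) - Lambda
Sigma = np.linalg.inv(IL).T @ Phi @ np.linalg.inv(IL)

print(Sigma[0, 1])   # == 0: the marginal independence 1 _||_ 2 holds
# 1 _||_ 2 | 3 fails: rk Sigma_{13,23} = 2 > |C| = 1.
print(np.linalg.matrix_rank(Sigma[np.ix_([0, 2], [1, 2])]))
```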

Trek separation is still not all: in general there are non-determinantal constraints too (⇝ exercise).

General problem

Characterize the graphs for which the vanishing ideal equals the global Markov ideal. This holds for trees and for all graphs on 4 vertices.

Functionality of the GraphicalModels package

- Creation of appropriate rings for conditional independence and graphical models in the Gaussian and discrete case: gaussianRing, markovRing.
- Deals with undirected, directed, and mixed graphs.
- Enumeration of separation statements: pairwise, local, global, d-, and trek separation.
- Creation of conditional independence ideals from a list of statements: conditionalIndependenceIdeal.
- Writing out parametrizations of graphical models as rational maps and computing vanishing ideals.
- Solving Gaussian identifiability problems with identifyParameters.