Total positivity in Markov structures


Total positivity in Markov structures
Steffen Lauritzen, Department of Mathematical Sciences, Faculty of Science
CRM, Montreal, July 2016
Based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arXiv: )
Slide 1/27

Outline

1. Positive association and multivariate total positivity
2. Multivariate Gaussian MTP$_2$ distributions
3. Conditional independence and Markov properties
4. Totally positive Markov distributions
5. Special instances of total positivity

Slide 2/27

Positive dependence and Simpson's paradox

Two real-valued random variables $X$ and $Y$ are positively associated if $\mathrm{Cov}\{f(X), g(Y)\} \ge 0$ for all non-decreasing functions $f$ and $g$.

The Yule-Simpson paradox says that $X$ and $Y$ may be positively associated marginally, yet negatively associated conditionally on a third variable $Z$.

Multivariate total positivity (MTP$_2$) ensures that this cannot happen: associations can never change sign due to changes of context. Hence they might be of a causal nature...

Slide 3/27
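The sign reversal described on this slide can be illustrated numerically. The sketch below uses hypothetical counts (not from the talk) for binary $X$, $Y$, $Z$; for binary variables, positive association is equivalent to an odds ratio of at least 1, so the odds ratio serves as the sign of the association.

```python
import numpy as np

# Hypothetical counts n[z][x][y], chosen only to illustrate the paradox.
counts = np.array([
    [[80, 10], [10, 1]],   # stratum Z = 0
    [[1, 10], [10, 80]],   # stratum Z = 1
], dtype=float)

def odds_ratio(table):
    """Odds ratio n00 * n11 / (n01 * n10) of a 2 x 2 table."""
    return table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0])

marginal = counts.sum(axis=0)                        # collapse over Z
marginal_or = odds_ratio(marginal)                   # association of X and Y
conditional_ors = [odds_ratio(counts[z]) for z in (0, 1)]  # given Z = z

print(marginal_or, conditional_ors)
```

Here the marginal odds ratio exceeds 1 while both conditional odds ratios are below 1, so this table cannot come from an MTP$_2$ distribution.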

Multivariate total positivity for functions

Let $f : \mathcal{X} = \times_{v \in V} \mathcal{X}_v \to \mathbb{R}$, where each $\mathcal{X}_v$ is either discrete or an open subset of $\mathbb{R}$.

Definition. $f$ is multivariate totally positive of order two (MTP$_2$) if
$$f(x) f(y) \le f(x \wedge y) f(x \vee y) \quad \text{for all } x, y \in \mathcal{X}.$$
Here $\wedge$ and $\vee$ are applied coordinatewise.

In the bivariate case, this property is known simply as total positivity or TP$_2$ (Karlin and Rinott, 1980).

A function $g$ is supermodular if
$$g(x \wedge y) + g(x \vee y) \ge g(x) + g(y) \quad \text{for all } x, y \in \mathcal{X}.$$
Thus $g$ is supermodular iff $\exp(g)$ is MTP$_2$.

Slide 4/27
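For a function on a finite grid the definition can be checked directly. The following is a minimal brute-force sketch, assuming `f` is given as a non-negative array indexed by the ordered states of each coordinate; the function name `is_mtp2` and the tolerance are illustrative choices, not part of the talk.

```python
import itertools
import numpy as np

def is_mtp2(f, tol=1e-12):
    """Check f(x) f(y) <= f(x ∧ y) f(x ∨ y) for all pairs of grid points."""
    f = np.asarray(f, dtype=float)
    points = list(itertools.product(*(range(n) for n in f.shape)))
    for x, y in itertools.combinations(points, 2):
        meet = tuple(min(a, b) for a, b in zip(x, y))   # coordinatewise minimum
        join = tuple(max(a, b) for a, b in zip(x, y))   # coordinatewise maximum
        if f[x] * f[y] > f[meet] * f[join] + tol:
            return False
    return True

# g(x1, x2) = x1 * x2 is supermodular on {0, 1, 2}^2, so exp(g) should pass.
g = np.outer(np.arange(3), np.arange(3))
print(is_mtp2(np.exp(g)))
print(is_mtp2([[0.1, 0.4], [0.4, 0.1]]))
```

The check costs a number of comparisons quadratic in the number of grid points, so it is only meant for small examples.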

Example

For $d = 2$ and $x_1 \le x_2$, $y_1 \le y_2$, the condition for MTP$_2$ simply becomes
$$f(x_1, y_2) f(x_2, y_1) \le f(x_1, y_1) f(x_2, y_2),$$
or, alternatively,
$$\det \begin{pmatrix} f(x_1, y_1) & f(x_1, y_2) \\ f(x_2, y_1) & f(x_2, y_2) \end{pmatrix} \ge 0.$$

Slide 5/27

Multivariate total positivity for distributions

For $\mathcal{X} = \times_{v \in V} \mathcal{X}_v$ as before, we adopt a standard base measure $\mu = \otimes_{v \in V} \mu_v$, where $\mu_v$ is counting measure if $\mathcal{X}_v$ is discrete and Lebesgue measure if $\mathcal{X}_v$ is an open subset of $\mathbb{R}$. We then define:

Definition. A distribution $P$ is said to be multivariate totally positive of order two (MTP$_2$) if its density w.r.t. the standard base measure $\mu$ is MTP$_2$.

Introduced and studied by Karlin and Rinott (1980), using results (the FKG inequality) from the fundamental paper by Fortuin et al. (1971).

We shall occasionally say "$X$ is MTP$_2$" instead of "the distribution of $X$ is MTP$_2$".

Slide 6/27

Example

For $d = 2$, let $f$ be the density of a Gaussian distribution with mean zero and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix}.$$
Then
$$f(x_1, y_2) f(x_2, y_1) \le f(x_1, y_1) f(x_2, y_2)$$
if and only if $\sigma_{yx} \ge 0$, since this is equivalent to the mixed terms in the exponents satisfying
$$(x_1 y_2 + x_2 y_1)\, \sigma_{xy} / \det(\Sigma) \le (x_1 y_1 + x_2 y_2)\, \sigma_{xy} / \det(\Sigma),$$
and if $\sigma_{xy} > 0$ this is equivalent to
$$(x_1 y_1 + x_2 y_2) - (x_1 y_2 + x_2 y_1) = (x_2 - x_1)(y_2 - y_1) \ge 0.$$

Slide 7/27

Example

Consider binary $X$ and $Y$ with $p_{ij} = P(X = i, Y = j)$ for $i, j \in \{0, 1\}$. Then $P$ is MTP$_2$ if and only if
$$p_{01} p_{10} \le p_{00} p_{11},$$
i.e. iff the odds-ratio $\theta = p_{00} p_{11} / (p_{01} p_{10})$ satisfies $\theta \ge 1$.

For three MTP$_2$ binary variables $X, Y, Z$ we have, for example,
$$p_{01k}\, p_{10k} \le p_{00k}\, p_{11k}, \quad k = 0, 1,$$
and thus the conditional odds-ratios satisfy
$$\theta_k = p_{00k}\, p_{11k} / (p_{01k}\, p_{10k}) \ge 1.$$

Slide 8/27

Examples of MTP$_2$ distributions

Mostly from Karlin and Rinott (1980):

- Characteristic roots of a Wishart matrix $W$, or of $W_1 W_2^{-1}$, or $W_1 (W_1 + W_2)^{-1}$, where $W_1 \perp\!\!\!\perp W_2$ (Dykstra and Hewett, 1978);
- Ferromagnetic (attractive) Ising models (Lebowitz, 1972);
- Multivariate logistic density (Gumbel, 1961);
- Gaussian free fields (random height landscapes) (Dynkin, 1980);
- Markov chains with TP$_2$ transition densities;
- Order statistics $(X_{(1)}, \ldots, X_{(n)})$ if $X_1, \ldots, X_n$ are i.i.d. with density $f$;
- Gaussian latent tree models as in phylogenetics (Zwiernik, 2015);
- Many other examples...

Slide 9/27

Fundamental properties

A wealth of probability inequalities are satisfied for MTP$_2$ distributions (Karlin and Rinott, 1980). Also:

Proposition. Assume $X$ is MTP$_2$. Then

- if $A \subseteq V$, the marginal $X_A = (X_v)_{v \in A}$ is MTP$_2$;
- if $C \subseteq V$, the conditional distribution $\mathcal{L}(X_{V \setminus C} \mid X_C = x_C)$ is MTP$_2$ for almost all $x_C \in \mathcal{X}_C$;
- if $X$ is discrete and $Y$ is obtained from $X$ by collapsing neighboring states, then $Y$ is MTP$_2$;
- if $\varphi = (\varphi_v)_{v \in V}$ are non-decreasing, then $Y = \varphi(X)$ is MTP$_2$.

Slide 10/27

Positive association and MTP$_2$

Proposition. If $X$ is MTP$_2$, then $X$ is positively associated: if $f$ and $g$ are non-decreasing in each of their arguments, then
$$\mathrm{Cov}\{f(X), g(X)\} \ge 0.$$

Proof. Discrete case by Fortuin et al. (1971). General case by Sarkar (1969).

Slide 11/27

Covariance and independence

Proposition. If $X$ is positively associated and $A, B \subseteq V$ are disjoint, then
$$X_A \perp\!\!\!\perp X_B \iff \mathrm{Cov}(X_u, X_v) = 0 \text{ for all } u \in A, v \in B.$$

Proof. Shown in Lebowitz (1972).

A result of this kind is otherwise special to the Gaussian distribution. So learning MTP$_2$ structure may be based on correlation analysis.

Slide 12/27

Multivariate Gaussian MTP$_2$ distributions

Proposition. Let $X \sim N_V(0, \Sigma)$. Then $X$ is MTP$_2$ if and only if $K = \Sigma^{-1}$ is a positive definite Minkowski matrix (M-matrix), i.e. iff
$$k_{uv} \le 0 \quad \text{for } u \ne v \text{ and } u, v \in V.$$

Proof. See Bølviken (1982) and Karlin and Rinott (1983).

Since $k_{uv}$ is proportional to the negative partial correlation between $X_u$ and $X_v$, $X$ is MTP$_2$ if and only if all partial correlations are non-negative.

Note also that this is a convex restriction in $K$.

Slide 13/27
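The M-matrix criterion is easy to check numerically. The sketch below assumes a positive definite covariance matrix `sigma`; the function names and the tolerance are illustrative. The second example is a hypothetical covariance matrix with all correlations positive that nevertheless fails the criterion.

```python
import numpy as np

def gaussian_is_mtp2(sigma, tol=1e-10):
    """True if all off-diagonal entries of K = inv(sigma) are <= 0 (up to tol)."""
    k = np.linalg.inv(np.asarray(sigma, dtype=float))
    off_diag = k - np.diag(np.diag(k))
    return bool(np.all(off_diag <= tol))

def partial_correlations(sigma):
    """Partial correlations rho_uv = -k_uv / sqrt(k_uu * k_vv)."""
    k = np.linalg.inv(np.asarray(sigma, dtype=float))
    d = np.sqrt(np.diag(k))
    rho = -k / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

sigma_ok = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma_bad = np.array([[1.0, 0.5, 0.1],
                      [0.5, 1.0, 0.5],
                      [0.1, 0.5, 1.0]])   # positive correlations, k_13 > 0

print(gaussian_is_mtp2(sigma_ok), gaussian_is_mtp2(sigma_bad))
print(partial_correlations(sigma_bad)[0, 2])
```

In the second example the partial correlation between the first and third variable is negative, which is what the positive entry $k_{13}$ reflects.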

Mathematics marks

[Table: rows and columns Mechanics, Vectors, Algebra, Analysis, Statistics; the numerical entries did not survive transcription.]

Empirical partial correlations (below the diagonal) and concentrations ($\times 1000$, on and above the diagonal) for 88 examination marks in five mathematical subjects. Essentially MTP$_2$.

Slide 14/27

Mathematics marks under MTP$_2$

Fitting under the MTP$_2$ constraint yields $\hat{K}$, which conforms with the graphical model below.

[Figure: graph on the vertices Vectors, Analysis, Algebra, Mechanics, Statistics.]

Slide 15/27

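Abstract conditional independence

An independence model $\perp_\sigma$ is a ternary relation over subsets of $V$. It is a semi-graphoid if for disjoint subsets $A, B, C, D$:

(S1) if $A \perp_\sigma B \mid C$ then $B \perp_\sigma A \mid C$ (symmetry);
(S2) if $A \perp_\sigma (B \cup D) \mid C$ then $A \perp_\sigma B \mid C$ and $A \perp_\sigma D \mid C$ (decomposition);
(S3) if $A \perp_\sigma (B \cup C) \mid D$ then $A \perp_\sigma B \mid (C \cup D)$ (weak union);
(S4) if $A \perp_\sigma B \mid C$ and $A \perp_\sigma D \mid (B \cup C)$, then $A \perp_\sigma (B \cup D) \mid C$ (contraction).

Any probabilistic independence model $\perp\!\!\!\perp_P$ is a semi-graphoid.

It is a graphoid if (S1)-(S4) hold and

(S5) if $A \perp_\sigma B \mid (C \cup D)$ and $A \perp_\sigma C \mid (B \cup D)$ then $A \perp_\sigma (B \cup C) \mid D$ (intersection).

If $X$ has a density $f > 0$, its associated independence model $\perp\!\!\!\perp_P$ is a graphoid.

Slide 16/27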

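Conditional independence and total positivity

A probability distribution on $\mathcal{X}$ defines an independence model $\perp\!\!\!\perp_P$ by
$$A \perp\!\!\!\perp_P B \mid S \iff X_A \perp\!\!\!\perp_P X_B \mid X_S.$$

Proposition (Fallat et al. 2016). If $X$ is MTP$_2$, its independence model $\perp\!\!\!\perp_P$ satisfies

(S6) $(A \perp\!\!\!\perp_P B \mid C) \wedge (A \perp\!\!\!\perp_P D \mid C) \implies A \perp\!\!\!\perp_P (B \cup D) \mid C$ (composition);
(S7) $(u \perp\!\!\!\perp_P v \mid C) \wedge (u \perp\!\!\!\perp_P v \mid (C \cup w)) \implies (u \perp\!\!\!\perp_P w \mid C) \vee (v \perp\!\!\!\perp_P w \mid C)$ (singleton transitivity);
(S8) $(A \perp\!\!\!\perp_P B \mid C) \wedge D \subseteq V \setminus (A \cup B) \implies A \perp\!\!\!\perp_P B \mid (C \cup D)$ (upward stability).

These are all fulfilled for separation $\perp_G$ in undirected graphs, but not necessarily for any probabilistic independence model $\perp\!\!\!\perp_P$.

Slide 17/27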

Markov properties

Let $P$ be a probability distribution on $\mathcal{X} = \times_{v \in V} \mathcal{X}_v$. The pairwise independence graph $G(P) = (V, E)$ is defined through the relation
$$uv \notin E \iff u \perp\!\!\!\perp_P v \mid V \setminus \{u, v\}.$$
In other words, $G(P)$ is the smallest graph $G$ such that $P$ is pairwise Markov w.r.t. $G$.

We say that $P$ is globally Markov w.r.t. a graph $G$ if
$$A \perp_G B \mid S \implies A \perp\!\!\!\perp_P B \mid S,$$
where $\perp_G$ is separation in the graph $G$.

Further, we say that $P$ is faithful to $G$ if
$$A \perp_G B \mid S \iff A \perp\!\!\!\perp_P B \mid S,$$
i.e. if the independence models $\perp\!\!\!\perp_P$ and $\perp_G$ are identical.

Slide 18/27
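As a concrete illustration of these definitions, the sketch below specializes to a Gaussian distribution, where $u \perp\!\!\!\perp_P v \mid V \setminus \{u, v\}$ holds exactly when the entry $k_{uv}$ of the concentration matrix is zero. The function names, the tolerance, and the example matrix are assumptions made for illustration only.

```python
from collections import deque
import numpy as np

def pairwise_graph(k, tol=1e-10):
    """Edges uv with k_uv != 0, assuming a Gaussian with concentration matrix k."""
    d = k.shape[0]
    return {(u, v) for u in range(d) for v in range(u + 1, d)
            if abs(k[u, v]) > tol}

def separated(edges, a, b, s, d):
    """True if every path from a to b in the graph passes through s."""
    neighbours = {v: set() for v in range(d)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen = set(a) - set(s)
    queue = deque(seen)
    while queue:                      # breadth-first search avoiding s
        u = queue.popleft()
        if u in b:
            return False
        for v in neighbours[u] - set(s) - seen:
            seen.add(v)
            queue.append(v)
    return True

# Hypothetical concentration matrix of a three-variable chain 0 - 1 - 2.
k = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
edges = pairwise_graph(k)
print(edges, separated(edges, {0}, {2}, {1}, 3), separated(edges, {0}, {2}, set(), 3))
```

Under the global Markov property, each separation found by `separated` translates into a conditional independence statement for the distribution.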

A main result

Theorem (Fallat et al. 2016). Assume the distribution $P$ of $X$ is MTP$_2$ with strictly positive density $f > 0$. Then $P$ is faithful to $G(P)$.

In other words, for MTP$_2$ distributions, the pairwise independence graph yields a complete picture of the independence relations in $P$.

It also implies that if $P$ is faithful to a DAG $D$ and $P$ is MTP$_2$, then $D$ must be perfect, i.e. any two parents of a common vertex are adjacent.

So in this case, the undirected version of the DAG is chordal.

Slide 19/27

Graph decompositions and total positivity

Consider a chordal graph $G$ and an associated junction tree $T$ of cliques.

Theorem (Fallat et al. 2016). If all separators $S$ in $T$ are singletons, a distribution $P$ is MTP$_2$ if and only if all clique marginals $P_C$, $C \in \mathcal{C}$, are MTP$_2$.

Note that in particular this covers trees. If the separators are not singletons, it is easy to construct counterexamples.

And since the MTP$_2$ property is closed under marginalization, this implies that latent tree models with pairwise MTP$_2$ associations are MTP$_2$.

Slide 20/27

Pairwise interaction models

Theorem (Fallat et al. 2016). A distribution of the form
$$p(x) = \frac{1}{Z} \prod_{uv \in E} \psi_{uv}(x_u, x_v),$$
where $\psi_{uv}$ are positive functions and $Z$ is a normalizing constant, is MTP$_2$ if and only if each $\psi_{uv}$ is an MTP$_2$ function.

This covers, in particular, ferromagnetic Ising models.

Slide 21/27
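A small numerical illustration of the theorem, under assumptions not taken from the talk: binary variables, spins $s_v = 2 x_v - 1$, and potentials $\psi_{uv}(x_u, x_v) = \exp(J_{uv} s_u s_v)$, which are MTP$_2$ exactly when $J_{uv} \ge 0$. The coupling values are hypothetical and the MTP$_2$ check is plain brute force.

```python
import itertools
import numpy as np

def pairwise_model(couplings, d):
    """p(x) proportional to prod_uv exp(J_uv * s_u * s_v), s = 2x - 1, x in {0,1}^d."""
    p = np.zeros((2,) * d)
    for x in itertools.product((0, 1), repeat=d):
        s = [2 * xi - 1 for xi in x]
        p[x] = np.exp(sum(j * s[u] * s[v] for (u, v), j in couplings.items()))
    return p / p.sum()

def is_mtp2(p, tol=1e-12):
    """Brute-force check of p(x) p(y) <= p(x ∧ y) p(x ∨ y) over all pairs."""
    points = list(itertools.product(*(range(n) for n in p.shape)))
    for x, y in itertools.combinations(points, 2):
        meet = tuple(min(a, b) for a, b in zip(x, y))
        join = tuple(max(a, b) for a, b in zip(x, y))
        if p[x] * p[y] > p[meet] * p[join] + tol:
            return False
    return True

ferro = pairwise_model({(0, 1): 0.7, (1, 2): 0.3}, d=3)    # all J_uv >= 0
mixed = pairwise_model({(0, 1): 0.7, (1, 2): -0.3}, d=3)   # one J_uv < 0
print(is_mtp2(ferro), is_mtp2(mixed))
```

The chain with non-negative couplings passes the check, while flipping the sign of a single coupling breaks it, in line with the "if and only if" in the theorem.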

Higher order interactions

Let $X = (X_v)_{v \in V}$ take values in $\mathcal{X} = \times_{v \in V} \mathcal{X}_v$, where each $\mathcal{X}_v$ is finite. Let $\mathcal{D}$ denote the power set of $V$.

If $p(x) > 0$ for all $x$, we can expand
$$\log p(x) = \sum_{D \in \mathcal{D}} \theta_D(x), \qquad (1)$$
where the interactions $\theta_D$ depend on $x$ through $x_D$ only.

For uniqueness, we may w.l.o.g. assume $0 \in \mathcal{X}_v$ and require that $\theta_D(x) = 0$ whenever $x_d = 0$ for some $d \in D$.

In the binary case we may use simpler notation by letting $\theta_D(1_D) := \theta_D$ for all $D \in \mathcal{D}$.

Slide 22/27

Higher order interactions

For a fixed pair $u, w \in V$, we define $\gamma_{uw}$ on $\mathcal{X}$ by
$$\gamma_{uw}(x) = \sum_{D : \{u, w\} \subseteq D} \theta_D(x).$$

Proposition (Fallat et al. 2016). Let $P$ be strictly positive. Then $P$ is MTP$_2$ if and only if for all $A \subseteq V$ with $|A| \ge 2$ and any given $u, w \in V$ the function $\gamma_{uw}$ is non-negative, non-decreasing, and supermodular over $\mathcal{X}_A$, where $\mathcal{X}_A$ consists of those $x$ with support $A$.

Slide 23/27

Binary log-linear expansions

For the binary case, the previous result specializes:

Corollary (Bartolucci and Forcina, 2000). Let $P$ be a binary distribution with
$$\log p(x) = \sum_{D : x_D = 1_D} \theta_D.$$
Then $P$ is MTP$_2$ if and only if for all $A$ with $|A| \ge 2$ and all $\{u, w\} \subseteq V$ we have
$$\sum_{D : \{u, w\} \subseteq D \subseteq A} \theta_D \ge 0.$$

Slide 24/27
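The condition in the corollary can be evaluated without computing the individual $\theta_D$. Since $\log p(1_A) = \sum_{D \subseteq A} \theta_D$ in the binary case, inclusion-exclusion gives, for $u, w \in A$,
$$\sum_{D : \{u, w\} \subseteq D \subseteq A} \theta_D = \log p(1_A) - \log p(1_{A \setminus u}) - \log p(1_{A \setminus w}) + \log p(1_{A \setminus \{u, w\}}).$$
The sketch below uses this identity; it assumes a strictly positive binary distribution stored as an array of shape $(2, \ldots, 2)$, restricts to pairs $u, w \in A$ (for other pairs the sum is empty), and the names and example are illustrative.

```python
import itertools
import numpy as np

def interaction_sum(p, a, u, w):
    """Sum of theta_D over {u, w} ⊆ D ⊆ a, via log p at indicator vectors."""
    def log_p(subset):
        x = tuple(1 if v in subset else 0 for v in range(p.ndim))
        return np.log(p[x])
    a = set(a)
    return log_p(a) - log_p(a - {u}) - log_p(a - {w}) + log_p(a - {u, w})

def binary_is_mtp2(p, tol=1e-12):
    """Check the corollary's inequalities for every A and every pair in A."""
    p = np.asarray(p, dtype=float)
    for size in range(2, p.ndim + 1):
        for a in itertools.combinations(range(p.ndim), size):
            for u, w in itertools.combinations(a, 2):
                if interaction_sum(p, a, u, w) < -tol:
                    return False
    return True

# Hypothetical example: log p(x) = 0.5 * (x0 x1 + x1 x2) + const.
x = np.indices((2, 2, 2))
p3 = np.exp(0.5 * (x[0] * x[1] + x[1] * x[2]))
p3 /= p3.sum()
print(binary_is_mtp2(p3), binary_is_mtp2([[0.1, 0.4], [0.4, 0.1]]))
```

In the three-variable example the only non-zero interactions are $\theta_{\{0,1\}} = \theta_{\{1,2\}} = 0.5$, so all the sums are non-negative.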

Causal betweenness

Let $X = (X_1 = 1_A, X_2 = 1_B, X_3 = 1_C)$ be binary indicator functions of events $A$, $B$, $C$. Reichenbach (1956) says $B$ is causally between $A$ and $C$ if
$$P(C \mid B \cap A) = P(C \mid B)$$
and
$$1 > P(C \mid B) > P(C \mid A) > P(C) > 0, \qquad 1 > P(A \mid B) > P(A \mid C) > P(A) > 0.$$

In general, causal betweenness does not imply MTP$_2$: if we let $p_{101} = 0$, $p_{000} = 4/10$, and $p_{ijk} = 1/10$ for the remaining six possibilities, $B$ is causally between $A$ and $C$, but $X$ is not MTP$_2$ since
$$0 = p_{101}\, p_{000} < p_{100}\, p_{001}.$$

However, if $P(X = x) > 0$ for all $x$ and $B$ is causally between $A$ and $C$, then $P$ is MTP$_2$.

Conversely, if $P(X = x) > 0$ for all $x$, $P$ is MTP$_2$, and the independence graph of $P$ is the chain $A - B - C$, then $B$ is causally between $A$ and $C$. This follows from the faithfulness of $P$.

Slide 25/27
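The counterexample on this slide can be verified with exact arithmetic. The sketch below encodes $p_{ijk}$ with $i, j, k$ indexing $A$, $B$, $C$ as on the slide; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# p[(i, j, k)] = P(1_A = i, 1_B = j, 1_C = k), values from the slide.
p = {x: Fraction(1, 10) for x in product((0, 1), repeat=3)}
p[(1, 0, 1)] = Fraction(0)
p[(0, 0, 0)] = Fraction(4, 10)

def prob(event):
    return sum(v for x, v in p.items() if event(x))

def cond(event, given):
    return prob(lambda x: event(x) and given(x)) / prob(given)

A = lambda x: x[0] == 1
B = lambda x: x[1] == 1
C = lambda x: x[2] == 1
A_and_B = lambda x: A(x) and B(x)

screening_off = cond(C, A_and_B) == cond(C, B)
chain_c = 1 > cond(C, B) > cond(C, A) > prob(C) > 0
chain_a = 1 > cond(A, B) > cond(A, C) > prob(A) > 0
# MTP2 would require p_100 * p_001 <= p_000 * p_101.
mtp2_violated = p[(1, 0, 0)] * p[(0, 0, 1)] > p[(0, 0, 0)] * p[(1, 0, 1)]

print(screening_off, chain_c, chain_a, mtp2_violated)
```

All four flags come out true, confirming that $B$ is causally between $A$ and $C$ while the MTP$_2$ inequality fails for $x = (1, 0, 0)$ and $y = (0, 0, 1)$.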

Some implications for structural learning

- A distribution is signed MTP$_2$ if sign changes $\sigma_v \in \{-1, 1\}$ can be allocated to $X_v$ so that $Y_v = \sigma_v X_v$, $v \in V$, is MTP$_2$;
- The MTP$_2$ restriction is convex in $\log f$, hence lends itself to convex optimization;
- So a potential learning strategy first finds a Chow-Liu tree, then changes signs so that associations along edges are positive, and finally optimizes a scoring function (e.g. penalized likelihood) under MTP$_2$ constraints.

To be explored, so watch this space...

Slide 26/27
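The first two steps of the strategy above can be sketched as follows, under assumptions that go beyond the slide: the data are treated as Gaussian, so the Chow-Liu tree is the maximum-weight spanning tree with weights $|\rho_{uv}|$ (mutual information is increasing in $|\rho_{uv}|$), and the tree is grown with Prim's algorithm. The function names and the correlation matrix are hypothetical, and the final constrained optimization step is not shown.

```python
import numpy as np

def chow_liu_tree_gaussian(r):
    """Maximum spanning tree on |r_uv| via Prim's algorithm, rooted at vertex 0."""
    d = r.shape[0]
    w = np.abs(r)
    in_tree, edges = [0], []
    while len(in_tree) < d:
        best = None
        for u in in_tree:
            for v in range(d):
                if v not in in_tree and (best is None or w[u, v] > w[best]):
                    best = (u, v)
        edges.append(best)          # u is already in the tree, v is new
        in_tree.append(best[1])
    return edges

def signs_along_tree(r, edges):
    """Choose sigma_v in {-1, 1} so that sigma_u sigma_v r_uv >= 0 on tree edges."""
    signs = np.ones(r.shape[0], dtype=int)
    for u, v in edges:              # Prim order: sigma_u is fixed before sigma_v
        signs[v] = signs[u] * (1 if r[u, v] >= 0 else -1)
    return signs

# Hypothetical correlation matrix.
r = np.array([[1.0, -0.8, -0.5],
              [-0.8, 1.0, 0.6],
              [-0.5, 0.6, 1.0]])
edges = chow_liu_tree_gaussian(r)
signs = signs_along_tree(r, edges)
signed_r = r * np.outer(signs, signs)   # correlations of Y_v = sigma_v X_v
print(edges, signs, signed_r)
```

After the sign changes, the correlations along the tree edges are positive, which is the starting point for fitting under the MTP$_2$ constraint.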

There are many more things to be said... Thank you!

Slide 27/27

Bartolucci, F. and Forcina, A. (2000). A likelihood ratio test for MTP$_2$ within binary variables. Ann. Statist., 28(4):
Bølviken, E. (1982). Probability inequalities for the multivariate normal with non-negative partial correlations. Scand. J. Statist., 9:
Dykstra, R. L. and Hewett, J. E. (1978). Positive dependence of the roots of a Wishart matrix. Ann. Statist., 6(1):
Dynkin, E. (1980). Markov processes and random fields. Bull. Amer. Math. Soc., 3(3):
Fallat, S., Lauritzen, S., Sadeghi, K., Uhler, C., Wermuth, N., and Zwiernik, P. (2016). Total positivity in Markov structures. Ann. Statist., to appear. arXiv:
Fortuin, C. M., Kasteleyn, P. W., and Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys., 22(2):
Gumbel, E. J. (1961). Bivariate logistic distributions. J. Amer. Statist. Assoc., 56(294):
Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Multiv. Anal., 10(4):
Karlin, S. and Rinott, Y. (1983). M-matrices as covariance matrices of multinormal distributions. Linear Algebra Appl., 52:
Lebowitz, J. L. (1972). Bounds on the correlations and analyticity properties of ferromagnetic Ising spin systems. Comm. Math. Phys., 28(4):
Reichenbach, H. (1956). The Direction of Time. University of California Press, Berkeley, CA.
Sarkar, T. K. (1969). Some lower bounds of reliability. Tech. Report No. 124, Department of Operations Research and Department of Statistics, Stanford University.
Zwiernik, P. (2015). Semialgebraic Statistics and Latent Tree Models. Number 146 in Monographs on Statistics and Applied Probability. Chapman & Hall.

Likelihood Analysis of Gaussian Graphical Models

Likelihood Analysis of Gaussian Graphical Models Faculty of Science Likelihood Analysis of Gaussian Graphical Models Ste en Lauritzen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 2 Slide 1/43 Overview of lectures Lecture 1 Markov Properties

More information

Markov properties for undirected graphs

Markov properties for undirected graphs Graphical Models, Lecture 2, Michaelmas Term 2011 October 12, 2011 Formal definition Fundamental properties Random variables X and Y are conditionally independent given the random variable Z if L(X Y,

More information

Conditional Independence and Markov Properties

Conditional Independence and Markov Properties Conditional Independence and Markov Properties Lecture 1 Saint Flour Summerschool, July 5, 2006 Steffen L. Lauritzen, University of Oxford Overview of lectures 1. Conditional independence and Markov properties

More information

Markov properties for undirected graphs

Markov properties for undirected graphs Graphical Models, Lecture 2, Michaelmas Term 2009 October 15, 2009 Formal definition Fundamental properties Random variables X and Y are conditionally independent given the random variable Z if L(X Y,

More information

Faithfulness of Probability Distributions and Graphs

Faithfulness of Probability Distributions and Graphs Journal of Machine Learning Research 18 (2017) 1-29 Submitted 5/17; Revised 11/17; Published 12/17 Faithfulness of Probability Distributions and Graphs Kayvan Sadeghi Statistical Laboratory University

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Graphical Models and Independence Models

Graphical Models and Independence Models Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models I

MATH 829: Introduction to Data Mining and Analysis Graphical Models I MATH 829: Introduction to Data Mining and Analysis Graphical Models I Dominique Guillot Departments of Mathematical Sciences University of Delaware May 2, 2016 1/12 Independence and conditional independence:

More information

Learning Multivariate Regression Chain Graphs under Faithfulness

Learning Multivariate Regression Chain Graphs under Faithfulness Sixth European Workshop on Probabilistic Graphical Models, Granada, Spain, 2012 Learning Multivariate Regression Chain Graphs under Faithfulness Dag Sonntag ADIT, IDA, Linköping University, Sweden dag.sonntag@liu.se

More information

Tutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg

Tutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg Tutorial: Gaussian conditional independence and graphical models Thomas Kahle Otto-von-Guericke Universität Magdeburg The central dogma of algebraic statistics Statistical models are varieties The central

More information

Decomposable Graphical Gaussian Models

Decomposable Graphical Gaussian Models CIMPA Summerschool, Hammamet 2011, Tunisia September 12, 2011 Basic algorithm This simple algorithm has complexity O( V + E ): 1. Choose v 0 V arbitrary and let v 0 = 1; 2. When vertices {1, 2,..., j}

More information

Decomposable and Directed Graphical Gaussian Models

Decomposable and Directed Graphical Gaussian Models Decomposable Decomposable and Directed Graphical Gaussian Models Graphical Models and Inference, Lecture 13, Michaelmas Term 2009 November 26, 2009 Decomposable Definition Basic properties Wishart density

More information

Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)

Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks) (Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

Undirected Graphical Models: Markov Random Fields

Undirected Graphical Models: Markov Random Fields Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Total positivity order and the normal distribution

Total positivity order and the normal distribution Journal of Multivariate Analysis 97 (2006) 1251 1261 www.elsevier.com/locate/jmva Total positivity order and the normal distribution Yosef Rinott a,,1, Marco Scarsini b,2 a Department of Statistics, Hebrew

More information

Geometry of Gaussoids

Geometry of Gaussoids Geometry of Gaussoids Bernd Sturmfels MPI Leipzig and UC Berkeley p 3 p 13 p 23 a 12 3 p 123 a 23 a 13 2 a 23 1 a 13 p 2 p 12 a 12 p p 1 Figure 1: With The vertices Tobias andboege, 2-faces ofalessio the

More information

CSC 412 (Lecture 4): Undirected Graphical Models

CSC 412 (Lecture 4): Undirected Graphical Models CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:

More information

arxiv: v2 [stat.me] 5 May 2016

arxiv: v2 [stat.me] 5 May 2016 Palindromic Bernoulli distributions Giovanni M. Marchetti Dipartimento di Statistica, Informatica, Applicazioni G. Parenti, Florence, Italy e-mail: giovanni.marchetti@disia.unifi.it and Nanny Wermuth arxiv:1510.09072v2

More information

Review: Directed Models (Bayes Nets)

Review: Directed Models (Bayes Nets) X Review: Directed Models (Bayes Nets) Lecture 3: Undirected Graphical Models Sam Roweis January 2, 24 Semantics: x y z if z d-separates x and y d-separation: z d-separates x from y if along every undirected

More information

4.1 Notation and probability review

4.1 Notation and probability review Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

More information

Parameter estimation in linear Gaussian covariance models

Parameter estimation in linear Gaussian covariance models Parameter estimation in linear Gaussian covariance models Caroline Uhler (IST Austria) Joint work with Piotr Zwiernik (UC Berkeley) and Donald Richards (Penn State University) Big Data Reunion Workshop

More information

Markov properties for mixed graphs

Markov properties for mixed graphs Bernoulli 20(2), 2014, 676 696 DOI: 10.3150/12-BEJ502 Markov properties for mixed graphs KAYVAN SADEGHI 1 and STEFFEN LAURITZEN 2 1 Department of Statistics, Baker Hall, Carnegie Mellon University, Pittsburgh,

More information

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course Course on Bayesian Networks, winter term 2007 0/31 Bayesian Networks Bayesian Networks I. Bayesian Networks / 1. Probabilistic Independence and Separation in Graphs Prof. Dr. Lars Schmidt-Thieme, L. B.

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Causal Effect Identification in Alternative Acyclic Directed Mixed Graphs

Causal Effect Identification in Alternative Acyclic Directed Mixed Graphs Proceedings of Machine Learning Research vol 73:21-32, 2017 AMBN 2017 Causal Effect Identification in Alternative Acyclic Directed Mixed Graphs Jose M. Peña Linköping University Linköping (Sweden) jose.m.pena@liu.se

More information

Learning Marginal AMP Chain Graphs under Faithfulness

Learning Marginal AMP Chain Graphs under Faithfulness Learning Marginal AMP Chain Graphs under Faithfulness Jose M. Peña ADIT, IDA, Linköping University, SE-58183 Linköping, Sweden jose.m.pena@liu.se Abstract. Marginal AMP chain graphs are a recently introduced

More information

Log-Convexity Properties of Schur Functions and Generalized Hypergeometric Functions of Matrix Argument. Donald St. P. Richards.

Log-Convexity Properties of Schur Functions and Generalized Hypergeometric Functions of Matrix Argument. Donald St. P. Richards. Log-Convexity Properties of Schur Functions and Generalized Hypergeometric Functions of Matrix Argument Donald St. P. Richards August 22, 2009 Abstract We establish a positivity property for the difference

More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Structure estimation for Gaussian graphical models

Structure estimation for Gaussian graphical models Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Identifying the Graphs of Polynomial Functions

Identifying the Graphs of Polynomial Functions Identifying the Graphs of Polynomial Functions Many of the functions on the Math IIC are polynomial functions. Although they can be difficult to sketch and identify, there are a few tricks to make it easier.

More information

Estimating Latent Variable Graphical Models with Moments and Likelihoods

Estimating Latent Variable Graphical Models with Moments and Likelihoods Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Markov properties for directed graphs

Markov properties for directed graphs Graphical Models, Lecture 7, Michaelmas Term 2009 November 2, 2009 Definitions Structural relations among Markov properties Factorization G = (V, E) simple undirected graph; σ Say σ satisfies (P) the pairwise

More information

Graphical Gaussian models and their groups

Graphical Gaussian models and their groups Piotr Zwiernik TU Eindhoven (j.w. Jan Draisma, Sonja Kuhnt) Workshop on Graphical Models, Fields Institute, Toronto, 16 Apr 2012 1 / 23 Outline and references Outline: 1. Invariance of statistical models

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models STA 345: Multivariate Analysis Department of Statistical Science Duke University, Durham, NC, USA Robert L. Wolpert 1 Conditional Dependence Two real-valued or vector-valued

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning and Inference in DAGs We discussed learning in DAG models, log p(x W ) = n d log p(x i j x i pa(j),

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

Parametrizations of Discrete Graphical Models

Parametrizations of Discrete Graphical Models Parametrizations of Discrete Graphical Models Robin J. Evans www.stat.washington.edu/ rje42 10th August 2011 1/34 Outline 1 Introduction Graphical Models Acyclic Directed Mixed Graphs Two Problems 2 Ingenuous

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Algebraic methods toward higher-order probability inequalities

Algebraic methods toward higher-order probability inequalities Algebraic methods toward higher-orderprobability inequalities p. 1/3 Algebraic methods toward higher-order probability inequalities Donald Richards Penn State University and SAMSI Algebraic methods toward

More information
