Decomposable Graphical Gaussian Models

1 CIMPA Summer School, Hammamet 2011, Tunisia. September 12, 2011.

2 Basic algorithm (maximum cardinality search)

This simple algorithm has complexity $O(|V| + |E|)$:

1. Choose $v_0 \in V$ arbitrarily and let $v_0 = 1$;
2. When vertices $\{1, 2, \dots, j\}$ have been numbered, choose $v = j + 1$ among $V \setminus \{1, 2, \dots, j\}$ with the highest cardinality of numbered neighbours;
3. If $\mathrm{bd}(j + 1) \cap \{1, 2, \dots, j\}$ is not complete, $G$ is not chordal;
4. Repeat from 2;
5. If the algorithm continues until no vertices are left, the graph is chordal and the numbering is perfect.
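To make the steps concrete, here is a minimal Python sketch of maximum cardinality search (not from the slides; it assumes the graph is given as a dict mapping each vertex to its set of neighbours):

```python
def mcs_perfect_numbering(adj):
    """Return a perfect numbering (list of vertices in MCS order)
    if the graph is chordal, otherwise None."""
    unnumbered, numbered, order = set(adj), set(), []
    while unnumbered:
        # step 2: pick the vertex with most already-numbered neighbours
        v = max(unnumbered, key=lambda u: len(adj[u] & numbered))
        B = adj[v] & numbered
        # step 3: the numbered boundary must induce a complete subgraph
        if any(b2 not in adj[b1] for b1 in B for b2 in B if b1 != b2):
            return None  # G is not chordal
        order.append(v)
        numbered.add(v)
        unnumbered.remove(v)
    return order

# a 4-cycle with a chord is chordal; the bare 4-cycle is not
chordal = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(mcs_perfect_numbering(chordal))  # e.g. [1, 2, 4, 3]
print(mcs_perfect_numbering(cycle4))   # None
```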

3 Finding the cliques of a chordal graph

From an MCS numbering $V = \{1, \dots, |V|\}$, let $B_\lambda = \mathrm{bd}(\lambda) \cap \{1, \dots, \lambda - 1\}$ and $\pi_\lambda = |B_\lambda|$. Call $\lambda$ a ladder vertex if $\lambda = |V|$ or if $\pi_{\lambda+1} < \pi_\lambda + 1$, and let $\Lambda$ be the set of ladder vertices (in the example from the slides, the sequence $\pi_\lambda$ is $0, 1, 2, 2, 2, 1, 1$). The cliques are $C_\lambda = \{\lambda\} \cup B_\lambda$, $\lambda \in \Lambda$.
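A sketch of the clique extraction, reusing mcs_perfect_numbering and the example graph from the previous snippet; vertices are relabelled by their MCS numbers:

```python
def mcs_cliques(adj):
    """Cliques of a chordal graph, returned in MCS-number labels."""
    order = mcs_perfect_numbering(adj)
    if order is None:
        raise ValueError("graph is not chordal")
    number = {v: i + 1 for i, v in enumerate(order)}
    n = len(order)
    # B[lam] = neighbours of lam with a smaller MCS number
    B = {number[v]: {number[u] for u in adj[v] if number[u] < number[v]}
         for v in order}
    pi = {lam: len(B[lam]) for lam in B}
    ladder = [lam for lam in range(1, n + 1)
              if lam == n or pi[lam + 1] < pi[lam] + 1]
    return [B[lam] | {lam} for lam in ladder]

print(mcs_cliques(chordal))  # e.g. [{1, 2, 3}, {2, 3, 4}]
```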

4 Junction trees

Let $\mathcal{A}$ be a collection of finite subsets of a set $V$. A junction tree $T$ of the sets in $\mathcal{A}$ is an undirected tree with $\mathcal{A}$ as its vertex set, satisfying the junction tree property: if $A, B \in \mathcal{A}$ and $C$ is on the unique path in $T$ between $A$ and $B$, then $A \cap B \subseteq C$. If the sets in an arbitrary $\mathcal{A}$ are pairwise incomparable, they can be arranged in a junction tree if and only if $\mathcal{A} = \mathcal{C}$, where $\mathcal{C}$ is the set of cliques of a chordal graph.
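The defining property is easy to check mechanically. A small sketch under an assumed representation: the tree is a dict mapping each set (as a frozenset) to its set of neighbouring frozensets:

```python
from itertools import combinations

def is_junction_tree(tree):
    """Check A & B <= C for every C on the unique path between A and B."""
    def path(A, B):
        # unique path in a tree via DFS with parent tracking
        stack, prev = [A], {A: None}
        while stack:
            node = stack.pop()
            if node == B:
                out = []
                while node is not None:
                    out.append(node)
                    node = prev[node]
                return out
            for nb in tree[node]:
                if nb not in prev:
                    prev[nb] = node
                    stack.append(nb)
        return []
    return all(A & B <= C
               for A, B in combinations(tree, 2)
               for C in path(A, B))

T = {frozenset({1, 2, 3}): {frozenset({2, 3, 4})},
     frozenset({2, 3, 4}): {frozenset({1, 2, 3})}}
print(is_junction_tree(T))  # True
```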

5 Characterizing chordal graphs

The following are equivalent for any undirected graph $G$:
(i) $G$ is chordal;
(ii) $G$ is decomposable;
(iii) all prime components of $G$ are cliques;
(iv) $G$ admits a perfect numbering;
(v) every minimal $(\alpha, \beta)$-separator is complete;
(vi) the cliques of $G$ can be arranged in a junction tree.

6 Construction of the junction tree

The junction tree can be constructed directly from the MCS ordering of the cliques $C_\lambda$, $\lambda \in \Lambda$: since the MCS numbering is perfect, every $C_\lambda$ with $\lambda > \lambda_{\min}$ satisfies, for some $\lambda' < \lambda$,
$$C_\lambda \cap \Bigl( \bigcup_{\lambda'' < \lambda} C_{\lambda''} \Bigr) = C_\lambda \cap C_{\lambda'} = S_\lambda.$$
A junction tree is now easily constructed by attaching $C_\lambda$ to any $C_{\lambda'}$ satisfying the above. Although $\lambda'$ may not be uniquely determined, $S_\lambda$ is. Indeed, the sets $S_\lambda$ are the minimal complete separators, and the multiplicities $\nu(S)$ are $\nu(S) = |\{\lambda \in \Lambda : S_\lambda = S\}|$.
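A sketch of this attachment step, continuing the snippets above (cliques are in MCS-number labels, so max(C) is the ladder vertex of C):

```python
def junction_tree(cliques):
    """Edges (parent, separator, clique) of a junction tree."""
    cliques = sorted(cliques, key=max)            # order by ladder vertex
    placed, seen, edges = [cliques[0]], set(cliques[0]), []
    for C in cliques[1:]:
        S = C & seen                              # the separator S_lam
        # attach C to any earlier clique containing S
        parent = next(D for D in placed if S <= D)
        edges.append((parent, S, C))
        placed.append(C)
        seen |= C
    return edges

for D, S, C in junction_tree(mcs_cliques(chordal)):
    print(sorted(D), "-", sorted(S), "-", sorted(C))
```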

7 A chordal graph

[Figure: an example chordal graph, with vertices numbered by maximum cardinality search.]

8 Junction tree

[Figure: junction tree for the example graph.] The cliques of the graph are arranged into a tree with $C_1 \cap C_2 \subseteq D$ for all cliques $D$ on the path between $C_1$ and $C_2$.

9 Junction trees of prime components

In general, the prime components of any undirected graph can be arranged in a junction tree in a similar way. Every pair of neighbours $(C, D)$ in the junction tree then represents a decomposition of $G$ into $G_{\tilde C}$ and $G_{\tilde D}$, where $\tilde C$ is the set of vertices in prime components connected to $C$ but separated from $D$ in the junction tree, and similarly for $\tilde D$. The corresponding algorithm is based on a slightly more sophisticated method known as lexicographic search (LEX), which runs in $O(|V|^2)$ time.

10 Basic factorizations

If the graph $G$ is chordal, we say that the graphical model is decomposable. In this case, the IPS algorithm converges in a finite number of steps. We also have the familiar factorization of densities
$$f(x \mid \Sigma) = \frac{\prod_{C \in \mathcal{C}} f(x_C \mid \Sigma_C)}{\prod_{S \in \mathcal{S}} f(x_S \mid \Sigma_S)^{\nu(S)}} \qquad (1)$$
where $\nu(S)$ is the number of times $S$ appears as an intersection between neighbouring cliques of a junction tree for $\mathcal{C}$.

11 Relations for trace and determinant

Using the factorization (1) we can, for example, match the expressions for the trace and determinant of $\Sigma$:
$$\mathrm{tr}(KW) = \sum_{C \in \mathcal{C}} \mathrm{tr}(K_C W_C) - \sum_{S \in \mathcal{S}} \nu(S)\, \mathrm{tr}(K_S W_S)$$
and further
$$\det \Sigma = \{\det(K)\}^{-1} = \frac{\prod_{C \in \mathcal{C}} \det(\Sigma_C)}{\prod_{S \in \mathcal{S}} \{\det(\Sigma_S)\}^{\nu(S)}}.$$
These are some of many relations that can be derived using the decomposition property of chordal graphs.
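A quick numerical sanity check of the determinant identity, assuming numpy. We build a covariance matrix that is Markov w.r.t. the two-clique graph with cliques {0, 1, 2} and {2, 3, 4} (0-based indices) by combining clique-marginal inverses, then compare the two sides:

```python
import numpy as np

def embed(M, idx, p):
    """Pad M with zeros to a p x p matrix supported on idx ([A]^V)."""
    out = np.zeros((p, p))
    out[np.ix_(idx, idx)] = M
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
full = A @ A.T + 5 * np.eye(5)        # an arbitrary positive definite matrix

C1, C2, S = [0, 1, 2], [2, 3, 4], [2]
K = (embed(np.linalg.inv(full[np.ix_(C1, C1)]), C1, 5)
     + embed(np.linalg.inv(full[np.ix_(C2, C2)]), C2, 5)
     - embed(np.linalg.inv(full[np.ix_(S, S)]), S, 5))
Sigma = np.linalg.inv(K)              # covariance Markov w.r.t. the graph

lhs = np.linalg.det(Sigma)
rhs = (np.linalg.det(Sigma[np.ix_(C1, C1)])
       * np.linalg.det(Sigma[np.ix_(C2, C2)])
       / np.linalg.det(Sigma[np.ix_(S, S)]))
print(np.isclose(lhs, rhs))           # True
```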

12 Maximum likelihood estimates

The same factorization clearly holds for the maximum likelihood estimates:
$$f(x \mid \hat\Sigma) = \frac{\prod_{C \in \mathcal{C}} f(x_C \mid \hat\Sigma_C)}{\prod_{S \in \mathcal{S}} f(x_S \mid \hat\Sigma_S)^{\nu(S)}} \qquad (2)$$
Moreover, it follows from the general likelihood equations that $\hat\Sigma_A = W_A / n$ whenever $A$ is complete. Exploiting this, we can obtain an explicit formula for the maximum likelihood estimate in the case of a chordal graph.

13 For a $d \times e$ matrix $A = \{a_{\gamma\mu}\}_{\gamma \in d, \mu \in e}$, we let $[A]^V$ denote the matrix obtained from $A$ by filling up with zero entries to obtain full dimension $|V| \times |V|$, i.e.
$$\bigl([A]^V\bigr)_{\gamma\mu} = \begin{cases} a_{\gamma\mu} & \text{if } \gamma \in d,\ \mu \in e \\ 0 & \text{otherwise.} \end{cases}$$
The maximum likelihood estimate exists if and only if $n \geq |C|$ for all $C \in \mathcal{C}$. Then the following simple formula holds for the maximum likelihood estimate of $K$:
$$\hat K = n \left\{ \sum_{C \in \mathcal{C}} \bigl[(w_C)^{-1}\bigr]^V - \sum_{S \in \mathcal{S}} \nu(S) \bigl[(w_S)^{-1}\bigr]^V \right\}.$$
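This formula translates directly into code. A self-contained sketch, assuming numpy, with W the matrix of sums of squares, cliques given as index lists, and separators as (index list, multiplicity) pairs:

```python
import numpy as np

def mle_concentration(W, n, cliques, separators):
    """Explicit MLE of K in a decomposable Gaussian graphical model."""
    p = W.shape[0]
    K_hat = np.zeros((p, p))
    for C in cliques:
        K_hat[np.ix_(C, C)] += np.linalg.inv(W[np.ix_(C, C)])
    for S, nu in separators:
        K_hat[np.ix_(S, S)] -= nu * np.linalg.inv(W[np.ix_(S, S)])
    return n * K_hat

# For the mathematics-marks graph below (0-based indices):
# K_hat = mle_concentration(W, n=87, cliques=[[0, 1, 2], [2, 3, 4]],
#                           separators=[([2], 1)])
```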

14 An example: mathematics marks

[Figure: graph on the five variables 1: Mechanics, 2: Vectors, 3: Algebra, 4: Analysis, 5: Statistics.] This graph is chordal with cliques $\{1, 2, 3\}$ and $\{3, 4, 5\}$, with separator $S = \{3\}$ having $\nu(\{3\}) = 1$.

15 Since one degree of freedom is lost by subtracting the average, we get in this example
$$\hat K = 87 \begin{pmatrix}
w^{11}_{[123]} & w^{12}_{[123]} & w^{13}_{[123]} & 0 & 0 \\
w^{21}_{[123]} & w^{22}_{[123]} & w^{23}_{[123]} & 0 & 0 \\
w^{31}_{[123]} & w^{32}_{[123]} & w^{33}_{[123]} + w^{33}_{[345]} - 1/w_{33} & w^{34}_{[345]} & w^{35}_{[345]} \\
0 & 0 & w^{43}_{[345]} & w^{44}_{[345]} & w^{45}_{[345]} \\
0 & 0 & w^{53}_{[345]} & w^{54}_{[345]} & w^{55}_{[345]}
\end{pmatrix}$$
where $w^{ij}_{[123]}$ is the $ij$th element of the inverse of
$$W_{[123]} = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix}$$
and so on.

16 Distribution of the MLE in a Gaussian graphical model

The formula
$$\hat\Sigma = n^{-1} \left\{ \sum_{C \in \mathcal{C}} \bigl[(W_C)^{-1}\bigr]^V - \sum_{S \in \mathcal{S}} \nu(S) \bigl[(W_S)^{-1}\bigr]^V \right\}^{-1}$$
specifies $\hat\Sigma$ as a random matrix. The distribution of this random Wishart-type matrix partly reflects the Markov properties of the graph $G$. This is also true of the distribution of $\hat\Sigma$ for a non-chordal graph $G$, but not to the same degree. Before we delve further into this, we shall need some more terminology.

17 Laws and distributions

Generally we think of $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ and sometimes identify $\Theta$ with $\mathcal{P}$, which is justified when the parametrization $\theta \mapsto P_\theta$ is one-to-one and onto. In a Gaussian graphical model, $\theta = K \in \mathcal{S}^+(G)$ uniquely identifies any regular Gaussian distribution satisfying the Markov properties w.r.t. $G$.

18 In any case, any probability measure on $\mathcal{P}$ (or on $\Theta$) represents a random element of $\mathcal{P}$, i.e. a random distribution. The sampling distribution of the MLE $\hat\Sigma$ is an example of such a measure. To keep heads straight, we refer to a probability measure on $\mathcal{P}$ as a law, whereas a distribution is a probability measure on $\mathcal{X}$. Thus we shall, for example, speak of the Wishart law as we think of it as specifying a distribution of $f(\cdot \mid \Sigma)$.

19 We identify $\theta \in \Theta$ and $P_\theta \in \mathcal{P}$, so that for example $\theta_A$ for $A \subseteq V$ denotes the distribution of $X_A$ under $P_\theta$, and $\theta_{A \mid B}$ the family of conditional distributions of $X_A$ given $X_B$, etc. For a law $\mathcal{L}$ on $\Theta$ we write
$$A \perp_{\mathcal{L}} B \mid S \iff \theta_{A \cup S} \perp\!\!\perp \theta_{B \cup S} \mid \theta_S.$$
A law $\mathcal{L}$ on $\Theta$ is hyper Markov w.r.t. $G$ if
(i) all $\theta \in \Theta$ are globally Markov w.r.t. $G$;
(ii) $A \perp_{\mathcal{L}} B \mid S$ whenever $S$ is complete and $A \perp_G B \mid S$.
Note that the conditional independence is only required to hold for graph decompositions.

20 Hyper Markov property

[Figure: example graph omitted.] If $\theta$ follows a hyper Markov law for this graph, it holds for example that $\theta_{1235}$ is conditionally independent of the margin on the other side of the separator given $\theta_{25}$. This is indeed true for $\hat\Sigma$ in the graphical model with this graph, i.e. the corresponding conditional independence holds for $\hat\Sigma_{1235}$ given $\hat\Sigma_{25}$.

21 Consequences of the hyper Markov property

Clearly, if $A \perp_{\mathcal{L}} B \mid S$, we also have, for example (using property (C2) of conditional independence), $\theta_A \perp\!\!\perp \theta_B \mid \theta_S$, since $\theta_A$ and $\theta_B$ are functions of $\theta_{A \cup S}$ and $\theta_{B \cup S}$ respectively. But the converse is false: $\theta_A \perp\!\!\perp \theta_B \mid \theta_S$ does not imply $\theta_{A \cup S} \perp\!\!\perp \theta_{B \cup S} \mid \theta_S$, since $\theta_{A \cup S}$ is not a function of $(\theta_A, \theta_S)$. In contrast, $X_{A \cup B}$ is indeed a (one-to-one) function of $(X_A, X_B)$. However, it generally holds that
$$A \perp_{\mathcal{L}} B \mid S \iff \theta_{A \mid S} \perp\!\!\perp \theta_{B \mid S} \mid \theta_S.$$

22 Chordal graphs

If $G$ is chordal and $\mathcal{L}$ is hyper Markov on $G$, it holds that
$$A \perp_G B \mid S \implies A \perp_{\mathcal{L}} B \mid S,$$
i.e. it is not necessary to require that $S$ be a complete separator to obtain the relevant conditional independence. This follows essentially because for a chordal graph,
$$A \perp_G B \mid S \implies \exists\, S' \subseteq S : A \perp_G B \mid S' \text{ with } S' \text{ complete.}$$
If $G$ is not chordal, we can form $G'$ by completing all prime components of $G$. If $\mathcal{L}$ is hyper Markov on $G$ it is also hyper Markov on $G'$, and thus $A \perp_{G'} B \mid S \implies A \perp_{\mathcal{L}} B \mid S$. But the similar result is false for an arbitrary chordal cover of $G$.

23 Directed hyper Markov property

We have similar notions and results in the directed case. Say $\mathcal{L} = \mathcal{L}(\theta)$ is directed hyper Markov w.r.t. a DAG $D$ if $\theta$ is directed Markov on $D$ for all $\theta \in \Theta$ and
$$\theta_{v \mid \mathrm{pa}(v)} \perp\!\!\perp \theta_{\mathrm{nd}(v)} \mid \theta_{\mathrm{pa}(v)},$$
or equivalently $\theta_{v \cup \mathrm{pa}(v)} \perp\!\!\perp \theta_{\mathrm{nd}(v)} \mid \theta_{\mathrm{pa}(v)}$, or equivalently, for a well-ordering,
$$\theta_{v \mid \mathrm{pa}(v)} \perp\!\!\perp \theta_{\mathrm{pr}(v)} \mid \theta_{\mathrm{pa}(v)}.$$
In general there is no similar statement corresponding to the global property and d-separation. However, if $D$ is perfect, $\mathcal{L}$ is directed hyper Markov w.r.t. $D$ if and only if $\mathcal{L}$ is hyper Markov w.r.t. $G = \sigma(D) = D^m$.

24 Canonical construction of hyper Markov laws

The distributions of maximum likelihood estimators are important examples of hyper Markov laws, but for chordal graphs there is also a canonical construction of such laws. Let $\mathcal{C}$ be the cliques of a chordal graph $G$ and let $\mathcal{L}_C$, $C \in \mathcal{C}$, be a family of laws over $\Theta_C \subseteq \mathcal{P}(\mathcal{X}_C)$. The family of laws is hyperconsistent if for any $C$ and $D$ with $C \cap D = S$, $\mathcal{L}_C$ and $\mathcal{L}_D$ induce the same law for $\theta_S$. If $\mathcal{L}_C$, $C \in \mathcal{C}$, are hyperconsistent, there is a unique hyper Markov law $\mathcal{L}$ over $G$ with $\mathcal{L}(\theta_C) = \mathcal{L}_C$, $C \in \mathcal{C}$.

25 Strong hyper Markov laws

In some cases it is of interest to consider a stronger version of the hyper Markov property. A hyper Markov law is strongly hyper Markov if
$$\theta_{A \mid S} \perp\!\!\perp \theta_S$$
for all complete separators $S$. A directed hyper Markov law is strongly directed hyper Markov if $\theta_{v \mid \mathrm{pa}(v)} \perp\!\!\perp \theta_{\mathrm{pa}(v)}$ for all $v \in V$.

26 Basic setup

Parameter $\theta \in \Theta$, data $X = x$, likelihood
$$L(\theta \mid x) \propto p(x \mid \theta) = \frac{dP_\theta}{d\mu}(x).$$
Express knowledge about $\theta$ through a prior law $\pi$ on $\Theta$; we also use $\pi$ to denote the density of the prior law w.r.t. some measure $\nu$ on $\Theta$. Inference about $\theta$ from $x$ is then represented through the posterior law $\pi^*(\theta) = p(\theta \mid x)$. From Bayes' formula,
$$\pi^*(\theta) = p(x \mid \theta)\pi(\theta)/p(x) \propto L(\theta \mid x)\pi(\theta),$$
so the likelihood function is equal to the density of the posterior w.r.t. the prior, modulo a constant.

27 Conjugate families

A family $\mathcal{P}$ of laws on $\Theta$ is said to be stable under sampling from $x$ if $\pi \in \mathcal{P} \implies \pi^* \in \mathcal{P}$. If the family of priors is parametrised, $\mathcal{P} = \{P_\alpha, \alpha \in A\}$, we sometimes say that $\alpha$ is a hyperparameter. Bayesian inference can then be made by just updating hyperparameters.

28 Stability of hyper Markov properties

If $\mathcal{L}$ is a prior law over $\Theta$ and $X = x$ is an observation from $\theta$, let $\mathcal{L}^* = \mathcal{L}(\theta \mid X = x)$ denote the posterior law over $\Theta$. If $\mathcal{L}$ is hyper Markov w.r.t. $G$, so is $\mathcal{L}^*$; if $\mathcal{L}$ is strongly hyper Markov w.r.t. $G$, so is $\mathcal{L}^*$. In the latter case, the update of $\mathcal{L}$ is local to prime components, i.e.
$$\mathcal{L}^*(\theta_Q) = \mathcal{L}_Q^*(\theta_Q) = \mathcal{L}_Q(\theta_Q \mid X_Q = x_Q),$$
and the marginal distribution $p$ of $X$ is globally Markov w.r.t. $G$, where
$$p(x) = \int_\Theta P(X = x \mid \theta)\, \mathcal{L}(d\theta).$$

29 Conjugate exponential families

For a $k$-dimensional exponential family
$$p(x \mid \theta) = b(x)\, e^{\theta^\top t(x) - \psi(\theta)},$$
the standard conjugate family is given as
$$\pi(\theta \mid a, \kappa) \propto e^{\theta^\top a - \kappa \psi(\theta)}$$
for $(a, \kappa) \in A \subseteq \mathbb{R}^k \times \mathbb{R}_+$, where $A$ is determined so that the normalisation constant is finite. The conjugate family is stable under sampling, and posterior updating from $(x_1, \dots, x_n)$ with $t = \sum_i t(x_i)$ is then made as
$$(a^*, \kappa^*) = (a + t, \kappa + n).$$
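The update rule is a one-liner. A sketch using the Poisson family as an illustration (an assumption, not from the slides): with $\theta = \log\lambda$, $t(x) = x$ and $\psi(\theta) = e^\theta$, the conjugate prior with hyperparameters $(a, \kappa)$ corresponds to a Gamma$(a, \kappa)$ law for $\lambda$:

```python
def conjugate_update(a, kappa, data, t=lambda x: x):
    """Posterior hyperparameters (a + sum_i t(x_i), kappa + n)."""
    return a + sum(t(x) for x in data), kappa + len(data)

a, kappa = 2.0, 1.0                  # prior: lambda ~ Gamma(2, 1)
data = [3, 0, 2, 4, 1]
a_post, kappa_post = conjugate_update(a, kappa, data)
print(a_post, kappa_post)            # 12.0 6.0 : posterior Gamma(12, 6)
```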

30 Hyper inverse Wishart laws

Gaussian graphical models are canonical exponential families. The standard family of conjugate priors has densities
$$\pi(K \mid \Phi, \delta) \propto (\det K)^{\delta/2}\, e^{-\mathrm{tr}(K\Phi)}, \qquad K \in \mathcal{S}^+(G).$$
These laws are termed hyper inverse Wishart laws, as $\Sigma$ follows an inverse Wishart law for complete graphs. For chordal graphs, each marginal law $\mathcal{L}_C$ of $\Sigma_C$ is inverse Wishart.
