Decomposable and Directed Graphical Gaussian Models

Size: px
Start display at page:

Download "Decomposable and Directed Graphical Gaussian Models"

Transcription

1 Decomposable Decomposable and Directed Graphical Gaussian Models Graphical Models and Inference, Lecture 13, Michaelmas Term 2009 November 26, 2009

2 Decomposable Definition Basic properties Wishart density The Wishart distribution is the sampling distribution of the matrix of sums of squares and products. More precisely: A random d d matrix W has a d-dimensional Wishart distribution with parameter Σ and n degrees of freedom if W D = n X ν (X ν ) i=1 where X ν N d (0, Σ). We then write W W d (n, Σ). The Wishart is the multivariate analogue to the χ 2 : W 1 (n, σ 2 ) = σ 2 χ 2 (n). If W W d (n, Σ) its mean is E(W ) = nσ.

3 Decomposable Definition Basic properties Wishart density If W 1 and W 2 are independent with W i W d (n i, Σ), then W 1 + W 2 W d (n 1 + n 2, Σ). If A is an r d matrix and W W d (n, Σ), then AWA W r (n, AΣA ). For r = 1 we get that when W W d (n, Σ) and λ R d, λ W λ σλ 2 χ2 (n), where σλ 2 = λ Σλ.

4 Decomposable Definition Basic properties Wishart density If W W d (n, Σ), where Σ is regular, then W is regular with probability one if and only if n d. When n d the Wishart distribution has density f d (w n, Σ) = c(d, n) 1 (det Σ) n/2 (det w) (n d 1)/2 e tr(σ 1 w)/2 for w positive definite, and 0 otherwise. The Wishart constant c(d, n) is c(d, n) = 2 nd/2 (2π) d(d 1)/4 d Γ{(n + 1 i)/2}. i=1

5 Decomposable Definition and likelihood function Iterative Proportional Scaling Consider X = (X v, v V ) N V (0, Σ) with Σ regular and K = Σ 1. The concentration matrix of the conditional distribution of (X α, X β ) given X V \{α,β} is Hence ( kαα k K {α,β} = αβ k βα k ββ ). α β V \ {α, β} k αβ = 0. Thus the dependence graph G(K) of a regular Gaussian distribution is given by α β k αβ = 0.

6 Decomposable Definition and likelihood function Iterative Proportional Scaling S(G) denotes the symmetric matrices A with a αβ = 0 unless α β and S + (G) their positive definite elements. A Gaussian graphical model for X specifies X as multivariate normal with K S + (G) and otherwise unknown. The likelihood function based on a sample of size n is L(K) (det K) n/2 e tr(kw )/2, where W is the Wishart matrix of sums of squares and products, W W V (n, Σ) with Σ 1 = K S + (G).

7 Decomposable Definition and likelihood function Iterative Proportional Scaling Define the matrices A u, u V E as those with elements 1 if u V and i = j = u aij u = 1 if u E and u = {i, j}. 0 otherwise. Then, as K S(G), K = v V k v A v + e E k e A e (1) and hence tr(kw ) = k v tr(a v W ) + k e tr(a e W ) v V e E

8 Decomposable Definition and likelihood function Iterative Proportional Scaling Hence we can identify the family as a (regular and canonical) exponential family with tr(a u W )/2, u V E as canonical sufficient statistics. This yields the likelihood equations tr(a u W ) = n tr(a u Σ), u V E. which can also be expressed as or, equivalently nˆσ vv = w vv, nˆσ αβ = w αβ, v V, {α, β} E. n ˆΣ cc = w cc for all cliques c C(G), We should remember the model restriction Σ 1 S + (G).

9 Decomposable Definition and likelihood function Iterative Proportional Scaling For K S + (G) and c C, define the operation of adjusting the c-marginal as follows. Let a = V \ c and ( n(wcc ) T c K = 1 + K ca (K aa ) 1 ) K ac K ca. (2) K ac K aa The C-marginal covariance Σ cc corresponding to the adjusted concentration matrix becomes Σ cc = {(T c K) 1 } cc = { n(w cc ) 1 + K ca (K aa ) 1 K ac K ca (K aa ) 1 K ac } 1 = w cc /n, hence T c K does indeed adjust the marginals. From (2) it is seen that the pattern of zeros in K is preserved under the operation T c, and it can also be seen to stay positive definite.

10 Decomposable Definition and likelihood function Iterative Proportional Scaling Next we choose any ordering (c 1,..., c k ) of the cliques in G. Choose further K 0 = I and define for r = 0, 1,... K r+1 = (T c1 T ck )K r. Then we have: Consider a sample from a covariance selection model with graph G. Then ˆK = lim r K r, provided the maximum likelihood estimate ˆK of K exists.

11 Decomposable Basic factorizations Maximum likelihood estimates An example If the graph G is chordal, we say that the graphical model is decomposable. In this case, the IPS-algorithm converges in a finite number of steps, as in the discrete case. We also have the familiar factorization of densities C C f (x Σ) = f (x C Σ C ) S S f (x S Σ S ) ν(s) (3) where ν(s) is the number of times S appear as intersection between neighbouring cliques of a junction tree for C.

12 Decomposable Relations for trace and determinant Basic factorizations Maximum likelihood estimates An example Using the factorization (3) we can for example match the expressions for the trace and determinant of Σ tr(kw ) = C C tr(k C W C ) S S ν(s) tr(k S W S ) and further det Σ = {det(k)} 1 = C C det{σ C } S S {det(σ S)} ν(s) These are some of many relations that can be derived using the decomposition property of chordal graphs.

13 Decomposable Basic factorizations Maximum likelihood estimates An example The same factorization clearly holds for the maximum likelihood estimates: C C f (x ˆΣ) = f (x C ˆΣ C ) S S f (x (4) S ˆΣ S ) ν(s) Moreover, it follows from the general likelihood equations that ˆΣ A = W A /n whenever A is complete. Exploiting this, we can obtain an explicit formula for the maximum likelihood estimate in the case of a chordal graph.

14 Decomposable Basic factorizations Maximum likelihood estimates An example For a d e matrix A = {a γµ } γ d,µ e we let [A] V denote the matrix obtained from A by filling up with zero entries to obtain full dimension V V, i.e. ( ) { [A] V = aγµ if γ d, µ e γµ 0 otherwise. The maximum likelihood estimates exists if and only if n C for all C C. Then the following simple formula holds for the maximum likelihood estimate of K: { } ˆK = n [(w C ) 1] V ν(s) [(w S ) 1] V. C C S S

15 Decomposable Mathematics marks Basic factorizations Maximum likelihood estimates An example 2:Vectors 4:Analysis 3:Algebra 1:Mechanics 5:Statistics This graph is chordal with cliques {1, 2, 3}, {3, 4, 5} with separator S = {3} having ν({3}) = 1.

16 Decomposable Basic factorizations Maximum likelihood estimates An example Since one degree of freedom is lost by subtracting the average, we get in this example ˆK = 87 w 11 [123] w 12 [123] w 13 [123] 0 0 w 21 [123] w 22 [123] w 23 [123] 0 0 w 31 [123] w 32 [123] w 33 [123] + w 33 [345] 1/w 33 w 34 [345] w 35 [345] 0 0 w 43 [345] w 44 [345] w 45 [345] 0 0 w 53 [345] w 54 [345] w 55 [345] where w ij [123] is the ijth element of the inverse of and so on. W [123] = w 11 w 12 w 13 w 21 w 22 w 23 w 31 w 32 w 33

17 Decomposable Definition A simple example More general systems Maximum likelihood estimation Consider a directed acyclic graph D and associate for every vertex a random variable X v. Consider now the equation system X v α v X pa(v) + β v + U v, v V (5) where U v, v V are independent random disturbances with U v N (0, σ 2 v ). Such an equation system is known as a recursive structural equation system. Structural equation systems are used heavily in social sciences and in economics. The term structural refers to the fact that the equations are assumed to be stable under intervention so that fixing a value of x v would change the system only by removing the line in the equation system (5) defining x v.

18 Decomposable Definition A simple example More general systems Maximum likelihood estimation A recursive structural equation system defines a multivariate Gaussian distribution which satisfies the directed Markov property of D since the joint density becomes f (x α, σ) = v (2π) 1/2 σv 1 e = (2π) V /2 ( v e v σ 1 v (xv α v x pa(v) βv )2 ) (xv α v x pa(v) βv )2 2σv 2, from which the joint concentration matrix K can easily be derived. 2σ 2 v

19 Decomposable Definition A simple example More general systems Maximum likelihood estimation Consider the system X 1 U 1 X 2 U 2 X 3 α 31 X 1 + U 3 X 4 α 42 X 2 + α 43 X 3 + U 4. The quadratic expression in the exponent becomes x1 2 σ1 2 + x 2 2 σ (x 3 α 31 x 1 ) 2 σ (x 4 α 42 x 2 α 43 x 3 ) 2 σ4 2.

20 Decomposable Definition A simple example More general systems Maximum likelihood estimation Expanding the squares and identifying terms yields the concentration matrix 1 + α2 σ 2 31 α σ3 2 σ α2 σ α 42 α 43 α 42 σ4 2 σ4 2 σ4 2. α 31 α 42 α α2 σ3 2 σ4 2 σ α 43 σ4 2 σ4 2 0 α 42 σ 2 4 α 43 σ σ 2 4

21 Decomposable Definition A simple example More general systems Maximum likelihood estimation The covariance matrix can in principle be found by inverting the above. However, it is easier to express X in terms of the Us as X 1 = U 1 X 2 = U 2 X 3 = α 31 U 1 + U 3 X 4 = α 43 α 31 U 1 + α 42 U 2 + α 43 U 3 + U 4 and then calculate the covariances directly to obtain σ1 2 0 α 31 σ1 2 α 43 α 31 σ1 2 0 σ2 2 0 α 42 σ2 2 α 31 σ1 2 0 σ3 2 + α2 31 σ2 1 α 43 α31 2 σ2 1 + α 43σ3 2, α 43 α 31 σ1 2 α 42 σ2 2 α 43 α31 2 σ2 1 + α 43σ3 2 ω4 2 where ω 2 4 = α 2 43α 2 31σ α 2 42σ α 2 43σ σ 2 4.

22 Decomposable Definition A simple example More general systems Maximum likelihood estimation Systems of structural equations of the type considered are called recursive in contrast to feedback systems of equations where directed cycles in the corresponding graph are allowed. It is also customary to allow correlations between the disturbance terms. This leads to violation of the directed Markov property, so other types of graph and Markov properties must be considered to deal correctly with such models. This type of model and analysis goes back to the geneticist Sewall Wright who coined the term path analysis to the calculus of effects based on this kind of models. This is one of the early precursors for modern graphical modelling.

23 Decomposable Definition A simple example More general systems Maximum likelihood estimation A directed graphical Gaussian model generated by a linear recursive structural equation system is equivalent to a corresponding undirected graphical model if and only if D is perfect, in which case its skeleton σ(d) is chordal. Still, even for non-perfect DAGs the MLE is easy to find for linear recursive structural equation systems: For each v, linear regression of data X ν v, ν = 1,... N onto parents X pa(v) is performed and corresponding regression coefficients ˆα v and residual variance estimates ˆσ 2 v are calculated in the usual way.

Decomposable Graphical Gaussian Models

Decomposable Graphical Gaussian Models CIMPA Summerschool, Hammamet 2011, Tunisia September 12, 2011 Basic algorithm This simple algorithm has complexity O( V + E ): 1. Choose v 0 V arbitrary and let v 0 = 1; 2. When vertices {1, 2,..., j}

More information

Likelihood Analysis of Gaussian Graphical Models

Likelihood Analysis of Gaussian Graphical Models Faculty of Science Likelihood Analysis of Gaussian Graphical Models Ste en Lauritzen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 2 Slide 1/43 Overview of lectures Lecture 1 Markov Properties

More information

Multivariate Gaussian Analysis

Multivariate Gaussian Analysis BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density

More information

Structure estimation for Gaussian graphical models

Structure estimation for Gaussian graphical models Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of

More information

Wilks Λ and Hotelling s T 2.

Wilks Λ and Hotelling s T 2. Wilks Λ and. Steffen Lauritzen, University of Oxford BS2 Statistical Inference, Lecture 13, Hilary Term 2008 March 2, 2008 If X and Y are independent, X Γ(α x, γ), and Y Γ(α y, γ), then the ratio X /(X

More information

Probability Propagation

Probability Propagation Graphical Models, Lectures 9 and 10, Michaelmas Term 2009 November 13, 2009 Characterizing chordal graphs The following are equivalent for any undirected graph G. (i) G is chordal; (ii) G is decomposable;

More information

Probability Propagation

Probability Propagation Graphical Models, Lecture 12, Michaelmas Term 2010 November 19, 2010 Characterizing chordal graphs The following are equivalent for any undirected graph G. (i) G is chordal; (ii) G is decomposable; (iii)

More information

Graphical Models with Symmetry

Graphical Models with Symmetry Wald Lecture, World Meeting on Probability and Statistics Istanbul 2012 Sparse graphical models with few parameters can describe complex phenomena. Introduce symmetry to obtain further parsimony so models

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 3 1 Gaussian Graphical Models: Schur s Complement Consider

More information

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12 Partitioned Covariance Matrices and Partial Correlations Proposition 1 Let the (p + q (p + q covariance matrix C > 0 be partitioned as ( C11 C C = 12 C 21 C 22 Then the symmetric matrix C > 0 has the following

More information

CS281A/Stat241A Lecture 19

CS281A/Stat241A Lecture 19 CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723

More information

Total positivity in Markov structures

Total positivity in Markov structures 1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Markov properties for directed graphs

Markov properties for directed graphs Graphical Models, Lecture 7, Michaelmas Term 2009 November 2, 2009 Definitions Structural relations among Markov properties Factorization G = (V, E) simple undirected graph; σ Say σ satisfies (P) the pairwise

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Elements of Graphical Models DRAFT.

Elements of Graphical Models DRAFT. Steffen L. Lauritzen Elements of Graphical Models DRAFT. Lectures from the XXXVIth International Probability Summer School in Saint-Flour, France, 2006 December 2, 2009 Springer Contents 1 Introduction...................................................

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Lecture 17: May 29, 2002

Lecture 17: May 29, 2002 EE596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 2000 Dept. of Electrical Engineering Lecture 17: May 29, 2002 Lecturer: Jeff ilmes Scribe: Kurt Partridge, Salvador

More information

CS Lecture 3. More Bayesian Networks

CS Lecture 3. More Bayesian Networks CS 6347 Lecture 3 More Bayesian Networks Recap Last time: Complexity challenges Representing distributions Computing probabilities/doing inference Introduction to Bayesian networks Today: D-separation,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

Gaussian Graphical Models: An Algebraic and Geometric Perspective

Gaussian Graphical Models: An Algebraic and Geometric Perspective Gaussian Graphical Models: An Algebraic and Geometric Perspective Caroline Uhler arxiv:707.04345v [math.st] 3 Jul 07 Abstract Gaussian graphical models are used throughout the natural sciences, social

More information

Linear-Time Inverse Covariance Matrix Estimation in Gaussian Processes

Linear-Time Inverse Covariance Matrix Estimation in Gaussian Processes Linear-Time Inverse Covariance Matrix Estimation in Gaussian Processes Joseph Gonzalez Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 jegonzal@cs.cmu.edu Sue Ann Hong Computer

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models 1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Exact Inference I. Mark Peot. In this lecture we will look at issues associated with exact inference. = =

Exact Inference I. Mark Peot. In this lecture we will look at issues associated with exact inference. = = Exact Inference I Mark Peot In this lecture we will look at issues associated with exact inference 10 Queries The objective of probabilistic inference is to compute a joint distribution of a set of query

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Inverse Wishart Distribution and Conjugate Bayesian Analysis

Inverse Wishart Distribution and Conjugate Bayesian Analysis Inverse Wishart Distribution and Conjugate Bayesian Analysis BS2 Statistical Inference, Lecture 14, Hilary Term 2008 March 2, 2008 Definition Testing for independence Hotelling s T 2 If W 1 W d (f 1, Σ)

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Asymptotic standard errors of MLE

Asymptotic standard errors of MLE Asymptotic standard errors of MLE Suppose, in the previous example of Carbon and Nitrogen in soil data, that we get the parameter estimates For maximum likelihood estimation, we can use Hessian matrix

More information

Tutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg

Tutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg Tutorial: Gaussian conditional independence and graphical models Thomas Kahle Otto-von-Guericke Universität Magdeburg The central dogma of algebraic statistics Statistical models are varieties The central

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models I

MATH 829: Introduction to Data Mining and Analysis Graphical Models I MATH 829: Introduction to Data Mining and Analysis Graphical Models I Dominique Guillot Departments of Mathematical Sciences University of Delaware May 2, 2016 1/12 Independence and conditional independence:

More information

WISHART/RIETZ DISTRIBUTIONS AND DECOMPOSABLE UNDIRECTED GRAPHS BY STEEN ANDERSSON INDIANA UNIVERSITY AND THOMAS KLEIN AUGSBURG UNIVERSITÄET

WISHART/RIETZ DISTRIBUTIONS AND DECOMPOSABLE UNDIRECTED GRAPHS BY STEEN ANDERSSON INDIANA UNIVERSITY AND THOMAS KLEIN AUGSBURG UNIVERSITÄET . WISHART/RIETZ DISTRIBUTIONS AND DECOMPOSABLE UNDIRECTED GRAPHS BY STEEN ANDERSSON INDIANA UNIVERSITY AND THOMAS KLEIN AUGSBURG UNIVERSITÄET 1 TARTU 6JUN007 Classical Wishart distributions. Let V be a

More information

Maximum likelihood in log-linear models

Maximum likelihood in log-linear models Graphical Models, Lecture 4, Michaelmas Term 2010 October 22, 2010 Generating class Dependence graph of log-linear model Conformal graphical models Factor graphs Let A denote an arbitrary set of subsets

More information

Graphical Gaussian Models with Edge and Vertex Symmetries

Graphical Gaussian Models with Edge and Vertex Symmetries Graphical Gaussian Models with Edge and Vertex Symmetries Søren Højsgaard Aarhus University, Denmark Steffen L. Lauritzen University of Oxford, United Kingdom Summary. In this paper we introduce new types

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Conditional Independence and Markov Properties

Conditional Independence and Markov Properties Conditional Independence and Markov Properties Lecture 1 Saint Flour Summerschool, July 5, 2006 Steffen L. Lauritzen, University of Oxford Overview of lectures 1. Conditional independence and Markov properties

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2 Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution Defn: Z R 1 N(0,1) iff f Z (z) = 1 2π e z2 /2 Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) (a column

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

Algebraic Representations of Gaussian Markov Combinations

Algebraic Representations of Gaussian Markov Combinations Submitted to the Bernoulli Algebraic Representations of Gaussian Markov Combinations M. SOFIA MASSA 1 and EVA RICCOMAGNO 2 1 Department of Statistics, University of Oxford, 1 South Parks Road, Oxford,

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information

Fisher Information in Gaussian Graphical Models

Fisher Information in Gaussian Graphical Models Fisher Information in Gaussian Graphical Models Jason K. Johnson September 21, 2006 Abstract This note summarizes various derivations, formulas and computational algorithms relevant to the Fisher information

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

Parametrizations of Discrete Graphical Models

Parametrizations of Discrete Graphical Models Parametrizations of Discrete Graphical Models Robin J. Evans www.stat.washington.edu/ rje42 10th August 2011 1/34 Outline 1 Introduction Graphical Models Acyclic Directed Mixed Graphs Two Problems 2 Ingenuous

More information

Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4

Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4 Inference: Exploiting Local Structure aphne Koller Stanford University CS228 Handout #4 We have seen that N inference exploits the network structure, in particular the conditional independence and the

More information

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

More information

A Bayesian Treatment of Linear Gaussian Regression

A Bayesian Treatment of Linear Gaussian Regression A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 10 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due this Wednesday (Nov 4) in class Project milestones due next Monday (Nov 9) About half

More information

Iterative Conditional Fitting for Gaussian Ancestral Graph Models

Iterative Conditional Fitting for Gaussian Ancestral Graph Models Iterative Conditional Fitting for Gaussian Ancestral Graph Models Mathias Drton Department of Statistics University of Washington Seattle, WA 98195-322 Thomas S. Richardson Department of Statistics University

More information

Parameter estimation in linear Gaussian covariance models

Parameter estimation in linear Gaussian covariance models Parameter estimation in linear Gaussian covariance models Caroline Uhler (IST Austria) Joint work with Piotr Zwiernik (UC Berkeley) and Donald Richards (Penn State University) Big Data Reunion Workshop

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Markov Chains and Hidden Markov Models

Markov Chains and Hidden Markov Models Chapter 1 Markov Chains and Hidden Markov Models In this chapter, we will introduce the concept of Markov chains, and show how Markov chains can be used to model signals using structures such as hidden

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models STA 345: Multivariate Analysis Department of Statistical Science Duke University, Durham, NC, USA Robert L. Wolpert 1 Conditional Dependence Two real-valued or vector-valued

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

1 Cricket chirps: an example

1 Cricket chirps: an example Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number

More information

STAT 350: Geometry of Least Squares

STAT 350: Geometry of Least Squares The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Markov Independence (Continued)

Markov Independence (Continued) Markov Independence (Continued) As an Undirected Graph Basic idea: Each variable V is represented as a vertex in an undirected graph G = (V(G), E(G)), with set of vertices V(G) and set of edges E(G) the

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

arxiv: v3 [stat.me] 10 Mar 2016

arxiv: v3 [stat.me] 10 Mar 2016 Submitted to the Annals of Statistics ESTIMATING THE EFFECT OF JOINT INTERVENTIONS FROM OBSERVATIONAL DATA IN SPARSE HIGH-DIMENSIONAL SETTINGS arxiv:1407.2451v3 [stat.me] 10 Mar 2016 By Preetam Nandy,,

More information

evaluate functions, expressed in function notation, given one or more elements in their domains

evaluate functions, expressed in function notation, given one or more elements in their domains Describing Linear Functions A.3 Linear functions, equations, and inequalities. The student writes and represents linear functions in multiple ways, with and without technology. The student demonstrates

More information

FLEXIBLE COVARIANCE ESTIMATION IN GRAPHICAL GAUSSIAN MODELS

FLEXIBLE COVARIANCE ESTIMATION IN GRAPHICAL GAUSSIAN MODELS Submitted to the Annals of Statistics FLEXIBLE COVARIANCE ESTIMATION IN GRAPHICAL GAUSSIAN MODELS By Bala Rajaratnam, Hélène Massam and Carlos M. Carvalho Stanford University, York University, The University

More information

Graphical Models and Independence Models

Graphical Models and Independence Models Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern

More information

CS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes

CS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes 1 CS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Example: multivariate Gaussian Distribution

Example: multivariate Gaussian Distribution School of omputer Science Probabilistic Graphical Models Representation of undirected GM (continued) Eric Xing Lecture 3, September 16, 2009 Reading: KF-chap4 Eric Xing @ MU, 2005-2009 1 Example: multivariate

More information

Dynamic Matrix-Variate Graphical Models A Synopsis 1

Dynamic Matrix-Variate Graphical Models A Synopsis 1 Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Lecture 20: November 1st

Lecture 20: November 1st 10-725: Optimization Fall 2012 Lecture 20: November 1st Lecturer: Geoff Gordon Scribes: Xiaolong Shen, Alex Beutel Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

More information

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11 Wishart Priors Patrick Breheny March 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11 Introduction When more than two coefficients vary, it becomes difficult to directly model each element

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb

More information

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with

More information

Math, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Math, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Math, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Outline These preliminaries serve to signal to students what tools they need to know to succeed in ECON 360 and refresh their

More information

Multivariate Bayesian Linear Regression MLAI Lecture 11

Multivariate Bayesian Linear Regression MLAI Lecture 11 Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate

More information

Maths for Signals and Systems Linear Algebra in Engineering

Maths for Signals and Systems Linear Algebra in Engineering Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models arxiv:1811.03735v1 [math.st] 9 Nov 2018 Lu Zhang UCLA Department of Biostatistics Lu.Zhang@ucla.edu Sudipto Banerjee UCLA

More information