Chapter 17: Undirected Graphical Models


1 Chapter 17: Undirected Graphical Models
The Elements of Statistical Learning
Biaobin Jiang, Department of Biological Sciences, Purdue University
October 30, 2014

2 Overview
1. Introduction: Probabilistic Graphical Models; Review: Multivariate Statistics; Review: Matrix Operations
2. Undirected Graphical Models for Continuous Variables: Connection with Multiple Linear Regression; Estimation of Parameters with Known Structure; Estimation of Graph Structure
3. Undirected Graphical Models for Discrete Variables

3 What Are Probabilistic Graphical Models? (Introduction, Probabilistic Graphical Models)
A graph consists of a set of vertices (nodes), along with a set of edges joining some pairs of the vertices. In graphical models, each vertex represents a random variable, and the graph gives a visual way of understanding the joint distribution of the entire set of random variables.

4 How It Works (Introduction, Probabilistic Graphical Models)
Categories of PGMs:
- Directed graphical models, a.k.a. Bayesian networks
- Undirected graphical models, a.k.a. Markov random fields
Computational tasks of PGMs:
- Structuring: choosing the structure of the graph;
- Learning: estimating the edge parameters from data; and
- Inference: computing marginal vertex probabilities and expectations from the joint distribution.

5 Multivariate Normal Distribution (Introduction, Review: Multivariate Statistics)
The multivariate normal (MVN) distribution is a generalization of the univariate normal distribution, whose density function (p.d.f.) is
$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, $$
where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance. In $p$ dimensions the density becomes
$$ f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}, $$
where $\mu$ is a $p$-dimensional mean vector and $\Sigma$ is a symmetric covariance matrix.
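As a quick numerical check, the $p$-dimensional density above can be evaluated directly with a few lines of linear algebra and compared against SciPy's reference implementation (a minimal sketch; the example mean and covariance are made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                   # p-dimensional mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])              # symmetric covariance matrix
x = np.array([0.3, 0.8])

p = len(mu)
diff = x - mu
# f(x) = (2 pi)^{-p/2} |Sigma|^{-1/2} exp{-(1/2)(x-mu)^T Sigma^{-1} (x-mu)}
dens = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) \
       / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

assert np.isclose(dens, multivariate_normal(mu, Sigma).pdf(x))
```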

6 Conditional Probability of the MVN (Introduction, Review: Multivariate Statistics)
Let $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ be a partitioned MVN random $p$-vector, with mean $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and covariance matrix
$$ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. $$
The conditional distribution of $X_2$ given $X_1 = x_1$ is MVN with
$$ E(X_2 \mid X_1 = x_1) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1), $$
$$ \mathrm{Cov}(X_2 \mid X_1 = x_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. $$
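These two formulas translate directly into code. Below is a small helper (the function name and the "first $k$ coordinates" partitioning convention are our own) that returns the conditional mean and covariance:

```python
import numpy as np

def conditional_mvn(mu, Sigma, k, x1):
    """Mean and covariance of X2 | X1 = x1, where X1 is the first k
    coordinates of a partitioned MVN with parameters (mu, Sigma)."""
    mu1, mu2 = mu[:k], mu[k:]
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    cond_mean = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
    cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return cond_mean, cond_cov

# example with a made-up 3-dimensional MVN, conditioning on X_1 = 0.5
mu = np.array([0.0, 0.0, 1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
m, C = conditional_mvn(mu, Sigma, k=1, x1=np.array([0.5]))
```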

7 Matrix Trace (Introduction, Review: Matrix Operations)
In linear algebra, the trace of an $n \times n$ square matrix $A$ is defined to be the sum of the elements on the main diagonal of $A$, i.e.,
$$ \mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}. $$
The matrix trace has several basic properties:
$$ \mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B), \qquad \mathrm{tr}(AB) = \mathrm{tr}(BA). $$
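Both properties are easy to sanity-check numerically, e.g.:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # linearity
assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # cyclic property
```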

8 Estimation of Parameters with Known Graph Structure (Undirected Graphical Models for Continuous Variables)

9 What Is Parameter Estimation? (Undirected Graphical Models for Continuous Variables, Connection with Multiple Linear Regression)
Given the empirical covariance matrix $S$, find the optimal estimate $\hat\Sigma = W$ and its inverse $\hat\Sigma^{-1} = \Theta$. In particular, if the $ij$th component of $\Theta$ is zero, then variables $i$ and $j$ are conditionally independent, given the other variables. In other words, there is no edge between vertices $i$ and $j$.

10 Conditional Mean and Multiple Linear Regression (Undirected Graphical Models for Continuous Variables, Connection with Multiple Linear Regression)
Suppose we partition $X = (Z, Y)$, where $Z = (X_1, \ldots, X_{p-1})$ and $Y = X_p$. Then the conditional distribution of $Y$ given $Z$ is (Eq. (17.6))
$$ (Y \mid Z = z) \sim N\left(\mu_Y + (z - \mu_Z)^T \Sigma_{ZZ}^{-1}\sigma_{ZY},\ \ \sigma_{YY} - \sigma_{ZY}^T \Sigma_{ZZ}^{-1}\sigma_{ZY}\right), $$
where we have partitioned $\Sigma$ as (Eq. (17.7))
$$ \Sigma = \begin{pmatrix} \Sigma_{ZZ} & \sigma_{ZY} \\ \sigma_{ZY}^T & \sigma_{YY} \end{pmatrix}. $$
The conditional mean in Eq. (17.6) has exactly the same form as the population multiple linear regression of $Y$ on $Z$, with regression coefficient $\beta = \Sigma_{ZZ}^{-1}\sigma_{ZY}$. (Proof on the next slide.)

11 Proof (Undirected Graphical Models for Continuous Variables, Connection with Multiple Linear Regression)
Given Eq. (2.9), the expected prediction error is
$$ \mathrm{EPE}(f) = E(y - f(z))^2 = E(y - z^T\beta)^2 = E\left[y^2 - 2yz^T\beta + \beta^T zz^T\beta\right]. $$
Differentiating with respect to $\beta$ and setting the derivative to zero,
$$ \frac{d\,\mathrm{EPE}(f)}{d\beta} = E\left[\frac{d(y^2 - 2yz^T\beta + \beta^T zz^T\beta)}{d\beta}\right] = E\left[-2yz + 2zz^T\beta\right] = 0, $$
we derive
$$ \beta = E(zz^T)^{-1}E[yz] = \Sigma_{ZZ}^{-1}\sigma_{ZY}, $$
where the last equality takes $Z$ and $Y$ to be centered, so that $E(zz^T) = \Sigma_{ZZ}$ and $E[yz] = \sigma_{ZY}$.
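The identity $\beta = \Sigma_{ZZ}^{-1}\sigma_{ZY}$ can be checked on simulated data: the coefficient computed from the empirical covariance blocks coincides with ordinary least squares on centered data (a sketch; the generating coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
Z = rng.standard_normal((N, 3))
y = Z @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.standard_normal(N)
X = np.column_stack([Z, y])                       # X = (Z, Y)

S = np.cov(X, rowvar=False)                       # empirical covariance
beta_cov = np.linalg.solve(S[:3, :3], S[:3, 3])   # Sigma_ZZ^{-1} sigma_ZY

Zc, yc = Z - Z.mean(0), y - y.mean()              # center, then least squares
beta_ls, *_ = np.linalg.lstsq(Zc, yc, rcond=None)

assert np.allclose(beta_cov, beta_ls)             # same coefficients
```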

12 How to Solve for the Inverse Θ (Undirected Graphical Models for Continuous Variables, Connection with Multiple Linear Regression)
The standard formulas for partitioned inverses give $\Sigma\Theta = I$, i.e.,
$$ \begin{pmatrix} \Sigma_{ZZ} & \sigma_{ZY} \\ \sigma_{ZY}^T & \sigma_{YY} \end{pmatrix} \begin{pmatrix} \Theta_{ZZ} & \theta_{ZY} \\ \theta_{ZY}^T & \theta_{YY} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix}. $$
Reading off the last column, we derive
$$ \Sigma_{ZZ}\theta_{ZY} + \sigma_{ZY}\theta_{YY} = 0, \qquad \sigma_{ZY}^T\theta_{ZY} + \sigma_{YY}\theta_{YY} = 1. $$
Solving these two equations gives Eq. (17.8):
$$ \theta_{ZY} = -\theta_{YY}\,\Sigma_{ZZ}^{-1}\sigma_{ZY}, \quad \text{where } 1/\theta_{YY} = \sigma_{YY} - \sigma_{ZY}^T\Sigma_{ZZ}^{-1}\sigma_{ZY} > 0. $$
Hence we have Eq. (17.9):
$$ \beta = \Sigma_{ZZ}^{-1}\sigma_{ZY} = -\theta_{ZY}/\theta_{YY}. $$
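Eqs. (17.8) and (17.9) are ordinary partitioned-inverse identities, so they can be verified numerically for any positive definite $\Sigma$ (a minimal sketch with a made-up 3x3 matrix, taking $Y$ as the last coordinate):

```python
import numpy as np

Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
Theta = np.linalg.inv(Sigma)

Sigma_ZZ, sigma_ZY = Sigma[:2, :2], Sigma[:2, 2]
beta = np.linalg.solve(Sigma_ZZ, sigma_ZY)

theta_ZY, theta_YY = Theta[:2, 2], Theta[2, 2]
assert np.allclose(beta, -theta_ZY / theta_YY)       # Eq. (17.9)
assert np.isclose(1 / theta_YY,
                  Sigma[2, 2] - sigma_ZY @ beta)     # Eq. (17.8)
```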

13 What We Have Learned (Undirected Graphical Models for Continuous Variables, Connection with Multiple Linear Regression)
The dependence of $Y$ on $Z$ in Eq. (17.6) is in the mean term alone. Here we see exactly that zero elements in $\beta$, and hence in $\theta_{ZY}$, mean that the corresponding elements of $Z$ are conditionally independent of $Y$. We can therefore learn this dependence structure through multiple linear regression.

14 Maximum Likelihood Estimation of the MVN (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
Let $X^T = (x_1, \ldots, x_N)$ be sampled from $N_p(\mu, \Sigma)$. The MLEs of $\mu$ and $\Sigma$ are the sample mean and empirical covariance (Eq. (17.10)):
$$ \hat\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat\Sigma = S = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^T. $$
The likelihood function is a function of the parameters $\mu$ and $\Sigma$ given the data $X$:
$$ L(\mu, \Sigma \mid X) = \prod_{i=1}^{N} f(x_i \mid \mu, \Sigma) = (2\pi)^{-\frac{Np}{2}}\,|\Sigma|^{-\frac{N}{2}} \exp\left\{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)^T\Sigma^{-1}(x_i - \mu)\right\}. $$

15 Log-Likelihood (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
Twice the negative log-likelihood of the data can be written as
$$ \ell(\mu, \Sigma) = -2\log L(\mu, \Sigma \mid X) = N\log|\Sigma| + \sum_{i=1}^{N}(x_i - \mu)^T\Sigma^{-1}(x_i - \mu) + C. $$
Up to constants and a factor of $N$, minimizing this (with $\mu$ fixed at $\hat\mu = \bar{x}$) is equivalent to maximizing Eq. (17.11), since
$$ \ell(\Theta) = \log\det\Theta - \mathrm{tr}(S\Theta) $$
$$ = -\log|\Sigma| - \mathrm{tr}\left(\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^T\,\Theta\right) $$
$$ = -\log|\Sigma| - \frac{1}{N}\sum_{i=1}^{N}\mathrm{tr}\left((x_i - \bar{x})^T\,\Theta\,(x_i - \bar{x})\right) $$
$$ = -\log|\Sigma| - \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^T\Sigma^{-1}(x_i - \bar{x}). $$
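Eq. (17.11) is cheap to evaluate; using a log-determinant avoids overflow for larger $p$ (a minimal sketch):

```python
import numpy as np

def gauss_loglik(Theta, S):
    """log det(Theta) - tr(S Theta), i.e., Eq. (17.11) up to constants."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return logdet - np.trace(S @ Theta)
```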

16 Missing Edges: Equality Constraints (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
Now we would like to maximize the log-likelihood under the constraint that some pre-defined subset of the parameters is zero:
$$ \max_{\Theta}\ \ell_C(\Theta) = \log\det\Theta - \mathrm{tr}(S\Theta) \quad \text{subject to}\quad \theta_{jk} = 0 \ \text{ for all } (j,k) \notin E. $$
Adding Lagrange multipliers, we derive Eq. (17.12):
$$ \max_{\Theta}\ \ell_C(\Theta) = \log\det\Theta - \mathrm{tr}(S\Theta) - \sum_{(j,k)\notin E}\gamma_{jk}\theta_{jk}. $$
Taking the derivative, we have Eq. (17.13):
$$ \Theta^{-1} - S - \Gamma = 0, $$
where $\Gamma$ is a matrix of Lagrange parameters with nonzero values for all missing edges.

17 Solving (17.13) by Multiple Linear Regression (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
Step 1: Partition $W$ and derive Eq. (17.14): $w_{12} - s_{12} - \gamma_{12} = 0$.
Step 2: Connect $w_{12}$ with $\beta$ via Eq. (17.16):
$$ \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix} \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix}. $$
This implies Eq. (17.17):
$$ w_{12} = -W_{11}\theta_{12}/\theta_{22} = W_{11}\beta. $$

18 Solving (17.13) by Multiple Linear Regression (Cont.) (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
Step 3: Use simple subset regression to solve Eq. (17.18):
$$ W_{11}\beta - s_{12} - \gamma_{12} = 0. $$
Where $\gamma_j \ne 0$ (i.e., the edge is missing and the corresponding $\beta_j = 0$), we remove the $j$th row and column and derive the reduced system of equations, Eq. (17.19):
$$ W_{11}^*\beta^* - s_{12}^* = 0. $$
Step 4: Update $\theta_{22}$ and $\theta_{12}$ (Eq. (17.20)):
$$ 1/\theta_{22} = s_{22} - w_{12}^T\hat\beta, \qquad \theta_{12} = -\hat\beta\,\theta_{22}. $$

19 Summary: Algorithm 17.1 (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
[Slide showed Algorithm 17.1 from ESL, the modified regression procedure for estimating a Gaussian graphical model with known structure; the algorithm box is not preserved in this transcription. A minimal sketch follows.]
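Since the algorithm box did not survive transcription, here is a minimal sketch of the procedure from the preceding two slides, assuming the known structure is supplied as a boolean adjacency matrix `adj` (the function name, padding scheme, and convergence test are our own, not the book's):

```python
import numpy as np

def mle_known_structure(S, adj, max_sweeps=100, tol=1e-8):
    """Sketch of ESL Algorithm 17.1: MLE of Theta when adj[j, k] = True
    marks the allowed edges (missing edges force theta_jk = 0)."""
    p = S.shape[0]
    W = S.copy()                               # initialize W = S
    for _ in range(max_sweeps):
        W_old = W.copy()
        for j in range(p):
            rest = np.delete(np.arange(p), j)
            nbrs = rest[adj[rest, j]]          # predictors with an edge to j
            beta = np.zeros(p - 1)             # beta_k = 0 for missing edges
            if nbrs.size:
                # reduced system W11* beta* = s12*, Eq. (17.19)
                beta_star = np.linalg.solve(W[np.ix_(nbrs, nbrs)], S[nbrs, j])
                beta[np.searchsorted(rest, nbrs)] = beta_star
            W[rest, j] = W[np.ix_(rest, rest)] @ beta   # w12 = W11 beta, (17.17)
            W[j, rest] = W[rest, j]
        if np.abs(W - W_old).max() < tol:
            break
    # at convergence W = Sigma-hat; Theta could also be recovered
    # column by column via Eq. (17.20)
    return W, np.linalg.inv(W)
```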

20 A Case Study: Figure 17.4 (Undirected Graphical Models for Continuous Variables, Estimation of Parameters with Known Structure)
[Slide showed Figure 17.4 from ESL; the figure is not preserved in this transcription.]

21 Estimation of the Graph Structure: Graph Lasso (Undirected Graphical Models for Continuous Variables)

22 Graph Lasso (Undirected Graphical Models for Continuous Variables, Estimation of Graph Structure)
The graphical lasso fits a lasso regression using each variable as the response and the others as predictors. Consider maximizing the penalized log-likelihood, Eq. (17.21):
$$ \log\det\Theta - \mathrm{tr}(S\Theta) - \lambda\|\Theta\|_1, $$
where $\|\Theta\|_1$ is the $L_1$ norm, i.e., the sum of the absolute values of the elements of $\Theta$. Taking the derivative as before, we reach the analog of Eq. (17.18), Eq. (17.23):
$$ W_{11}\beta - s_{12} + \lambda\,\mathrm{Sign}(\beta) = 0. $$
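In practice one rarely codes this from scratch: scikit-learn's GraphicalLasso maximizes the penalized log-likelihood of Eq. (17.21), with `alpha` playing the role of $\lambda$ (applied to the off-diagonal elements of $\Theta$). A small sketch on simulated data with a made-up tridiagonal true covariance:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.5, 1.0, 0.5, 0.0],
                     [0.0, 0.5, 1.0, 0.5],
                     [0.0, 0.0, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(4), true_cov, size=500)

model = GraphicalLasso(alpha=0.1).fit(X)
print(np.round(model.precision_, 2))   # exact zeros mark the missing edges
```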

23 Cyclical Coordinate Descent Algorithm (Undirected Graphical Models for Continuous Variables, Estimation of Graph Structure)
Let us rewrite the equation
$$ W_{11}\beta - s_{12} + \lambda\,\mathrm{Sign}(\beta) = 0 $$
as a $(p-1) \times (p-1)$ linear system in $A$, $x$, and $b$:
$$ Ax - b + \lambda\,\mathrm{Sign}(x) = 0. $$
Cycling through $i = 1, 2, \ldots, p-1, 1, 2, \ldots, p-1, \ldots$, we update (Eq. (17.26))
$$ \hat{x}_i \leftarrow \mathrm{St}\Big(b_i - \sum_{k \ne i} A_{ki}\hat{x}_k,\ \lambda\Big)\Big/ A_{ii}, $$
where $\mathrm{St}(x, t)$ is the soft-threshold operator (Eq. (17.27)):
$$ \mathrm{St}(x, t) = \mathrm{sign}(x)\,(|x| - t)_+. $$
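Eqs. (17.26) and (17.27) translate almost line for line into code (a minimal sketch; the fixed sweep count is a stand-in for a proper convergence test):

```python
import numpy as np

def soft_threshold(x, t):
    """St(x, t) = sign(x) (|x| - t)_+, Eq. (17.27)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_coord_descent(A, b, lam, n_sweeps=200):
    """Solve Ax - b + lam * Sign(x) = 0 by cyclic updates, Eq. (17.26)."""
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            r = b[i] - A[i] @ x + A[i, i] * x[i]   # leave out the x_i term
            x[i] = soft_threshold(r, lam) / A[i, i]
    return x
```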

24 Summary: Graphical Lasso Algorithm (Undirected Graphical Models for Continuous Variables, Estimation of Graph Structure)
[Slide showed the graphical lasso algorithm summary from ESL; the algorithm box is not preserved in this transcription.]

25 A Case Study: Flow-Cytometry Data (Undirected Graphical Models for Continuous Variables, Estimation of Graph Structure)
[Slide showed the flow-cytometry network estimates from ESL; the figure is not preserved in this transcription.]

26 Missing/Hidden Node Values: EM (Undirected Graphical Models for Continuous Variables, Estimation of Graph Structure)
Note that the values at some of the nodes in a graphical model can be unobserved, i.e., missing or hidden. The EM algorithm can be used to impute the missing values, with the E step (Eq. (17.43)) imputing the missing values from the current estimates of $\mu$ and $\Sigma$,
$$ \hat{x}_{i,m_i} = E(x_{i,m_i} \mid x_{i,o_i}, \theta) = \hat\mu_{m_i} + \hat\Sigma_{m_i,o_i}\hat\Sigma_{o_i,o_i}^{-1}(x_{i,o_i} - \hat\mu_{o_i}), $$
and the M step (Eq. (17.44)) re-estimating $\mu$ and $\Sigma$ from the empirical mean and (modified) covariance of the imputed data,
$$ \hat\mu_j = \frac{1}{N}\sum_{i=1}^{N}\hat{x}_{ij}, \qquad \hat\Sigma_{jj'} = \frac{1}{N}\sum_{i=1}^{N}(\hat{x}_{ij} - \hat\mu_j)(\hat{x}_{ij'} - \hat\mu_{j'}) + c_{i,jj'}, $$
where $c_{i,jj'}$ is the conditional covariance correction, nonzero only when both $j$ and $j'$ are missing for observation $i$.
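The E step is just the conditional-mean formula from the MVN review, applied observation by observation (a sketch; `observed` is a boolean mask over the coordinates of one observation, a convention of ours):

```python
import numpy as np

def e_step_impute(x, observed, mu, Sigma):
    """Eq. (17.43): fill missing coordinates with their conditional mean
    given the observed ones, under the current (mu, Sigma)."""
    o = np.where(observed)[0]
    m = np.where(~observed)[0]
    x_hat = x.copy()
    x_hat[m] = mu[m] + Sigma[np.ix_(m, o)] @ np.linalg.solve(
        Sigma[np.ix_(o, o)], x[o] - mu[o])
    return x_hat
```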

27 Ising Models / Boltzmann Machines (Undirected Graphical Models for Discrete Variables)
Pairwise Markov networks with binary variables are called Ising models in statistical mechanics, and Boltzmann machines in machine learning. The joint probabilities of the Ising model are given by Eqs. (17.28)–(17.29):
$$ P(X, \Theta) = \exp\Big(\sum_{(j,k)\in E}\theta_{jk}X_jX_k - \Phi(\Theta)\Big), \qquad \Phi(\Theta) = \log\sum_{x\in\mathcal{X}}\exp\Big(\sum_{(j,k)\in E}\theta_{jk}x_jx_k\Big), $$
where $\Phi(\Theta)$ is the log partition function, which normalizes the probabilities so they sum to one. The Ising model implies a logistic form for each node conditional on the others (Eq. (17.30)):
$$ P(X_j = 1 \mid X_{-j} = x_{-j}) = \frac{1}{1 + \exp\big(-\theta_{j0} - \sum_{(j,k)\in E}\theta_{jk}x_k\big)}, $$
where $X_{-j}$ denotes all of the nodes except $j$.
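Eq. (17.30) in code, assuming $\Theta$ is stored as a symmetric matrix with zeros where there is no edge and the $\theta_{j0}$ terms kept in a separate bias vector (our storage convention, not the book's):

```python
import numpy as np

def ising_conditional(j, x, theta0, Theta):
    """P(X_j = 1 | X_{-j} = x_{-j}), the logistic form of Eq. (17.30).
    Theta[j, k] = theta_jk (zero for missing edges); theta0[j] = theta_j0."""
    eta = theta0[j] + Theta[j] @ x - Theta[j, j] * x[j]  # exclude the j term
    return 1.0 / (1.0 + np.exp(-eta))
```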

28 Estimation of Parameters with Known Graph Structure (Undirected Graphical Models for Discrete Variables)
Given $X$, find $\Theta$. The log-likelihood is Eq. (17.31):
$$ \ell(\Theta) = \sum_{i=1}^{N}\log P_\Theta(X_i = x_i) = \sum_{i=1}^{N}\Big[\sum_{(j,k)\in E}\theta_{jk}x_{ij}x_{ik} - \Phi(\Theta)\Big]. $$
The gradient of the log-likelihood is (Eqs. (17.32)–(17.34))
$$ \frac{\partial\ell(\Theta)}{\partial\theta_{jk}} = \sum_{i=1}^{N}x_{ij}x_{ik} - N\sum_{x\in\mathcal{X}}x_jx_k\,p(x, \Theta), $$
and setting it to zero matches the empirical and model second moments:
$$ \hat{E}(X_jX_k) - E_\Theta(X_jX_k) = 0. $$
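The model moment $E_\Theta(X_jX_k)$ requires a sum over all $2^p$ binary states, which is feasible only for tiny $p$; a brute-force sketch (using the $\{0,1\}$ coding of the logistic form above) makes the moment-matching condition concrete:

```python
import numpy as np
from itertools import product

def model_expectation(Theta, j, k):
    """E_Theta(X_j X_k) by enumerating all 2^p binary states (tiny p only)."""
    p = Theta.shape[0]
    states = np.array(list(product([0, 1], repeat=p)))
    # unnormalized log-probability: sum over edges (j < k) of theta_jk x_j x_k
    energy = np.einsum('ij,jk,ik->i', states, np.triu(Theta, 1), states)
    w = np.exp(energy)
    w /= w.sum()                         # divide by exp(Phi(Theta))
    return np.sum(w * states[:, j] * states[:, k])
```

At the MLE, this quantity equals the empirical moment $\hat{E}(X_jX_k) = \frac{1}{N}\sum_i x_{ij}x_{ik}$ for every edge $(j,k) \in E$.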

29 References
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

30 The End
