Chapter 17: Undirected Graphical Models
1 Chapter 17: Undirected Graphical Models. The Elements of Statistical Learning. Biaobin Jiang, Department of Biological Sciences, Purdue University. October 30, 2014.
2 Overview
1. Introduction
   - Probabilistic Graphical Models
   - Review: Multivariate Statistics
   - Review: Matrix Operations
2. Undirected Graphical Models for Continuous Variables
   - Connection with Multiple Linear Regression
   - Estimation of Parameters with Known Structure
   - Estimation of Graph Structure
3. Undirected Graphical Models for Discrete Variables
3 What is a Probabilistic Graphical Model? A graph consists of a set of vertices (nodes), along with a set of edges joining some pairs of the vertices. In graphical models, each vertex represents a random variable, and the graph gives a visual way of understanding the joint distribution of the entire set of random variables.
4 How it works
Categories of PGM:
- Directed graphical models, a.k.a. Bayesian networks
- Undirected graphical models, a.k.a. Markov random fields
Computational tasks of PGM:
- Structuring: choosing the structure of the graph;
- Learning: estimating the edge parameters from data; and
- Inference: computing marginal vertex probabilities and expectations from the joint distribution.
5 Review of Multivariate Statistics: The Multivariate Normal (MVN) Distribution
The MVN distribution is a generalization of the univariate normal distribution, whose density function (p.d.f.) is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance. In $p$ dimensions the density becomes
$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$
where $\mu$ is a $p$-dimensional mean vector and $\Sigma$ is a symmetric, positive definite covariance matrix.
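As a quick check of the $p$-dimensional density above, here is a minimal Python sketch (my own illustration, not part of the slides) that evaluates the formula by hand and compares it against `scipy.stats.multivariate_normal`:

```python
# Evaluate the p-dimensional MVN density manually and via scipy.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
p = 3
mu = rng.normal(size=p)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)      # symmetric positive definite covariance
x = rng.normal(size=p)

diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)       # (x-mu)^T Sigma^{-1} (x-mu)
norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
pdf_manual = np.exp(-0.5 * quad) / norm_const

pdf_scipy = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
assert np.isclose(pdf_manual, pdf_scipy)
```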
6 Review of Multivariate Statistics: Conditional Distribution of the MVN
Let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ be a partitioned MVN random $p$-vector, with mean
$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and covariance matrix
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
The conditional distribution of $X_2$ given $X_1 = x_1$ is MVN with
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$$
$$\operatorname{Cov}(X_2 \mid X_1 = x_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
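A minimal Python sketch (my own, not from the slides) computing these conditional parameters from the blocks of $\Sigma$; the final assertion uses the standard fact that the conditional covariance equals the inverse of the corresponding block of the precision matrix $\Theta = \Sigma^{-1}$, foreshadowing the role $\Theta$ plays below:

```python
# Conditional mean/covariance of a partitioned MVN, checked against
# Cov(X2 | X1) = (Theta_22)^{-1}, where Theta = Sigma^{-1}.
import numpy as np

rng = np.random.default_rng(1)
p, k = 5, 2                      # X1 has k components, X2 has p - k
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
mu = rng.normal(size=p)
x1 = rng.normal(size=k)

S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

cond_mean = mu[k:] + S21 @ np.linalg.solve(S11, x1 - mu[:k])
cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)

Theta = np.linalg.inv(Sigma)
assert np.allclose(cond_cov, np.linalg.inv(Theta[k:, k:]))
```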
7 Review of Matrix Operations: The Matrix Trace
In linear algebra, the trace of an $n$-by-$n$ square matrix $A$ is defined to be the sum of the elements on the main diagonal of $A$, i.e.,
$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii}.$$
The matrix trace has several basic properties:
$$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B), \qquad \operatorname{tr}(AB) = \operatorname{tr}(BA).$$
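These two identities are exactly what the log-likelihood manipulation on slide 15 relies on; a quick numerical check (my own addition):

```python
# Verify the trace identities used later in the log-likelihood derivation.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```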
8 Undirected Graphical Models for Continuous Variables: Estimation of Parameters with Known Graph Structure
9 What is Parameter Estimation? (Connection with Multiple Linear Regression)
Given the empirical covariance matrix $S$, find the optimal estimate $\hat\Sigma = W$ and its inverse $\hat\Sigma^{-1} = \Theta$. In particular, if the $ij$th component of $\Theta$ is zero, then variables $i$ and $j$ are conditionally independent, given the other variables. In other words, there is no edge between vertices $i$ and $j$.
10 Conditional Mean and Multiple Linear Regression
Suppose we partition $X = (Z, Y)$ where $Z = (X_1, \ldots, X_{p-1})$ and $Y = X_p$. Then the conditional distribution of $Y$ given $Z$ is (Eq. 17.6)
$$(Y \mid Z = z) \sim \mathcal{N}\big(\mu_Y + (z - \mu_Z)^T\Sigma_{ZZ}^{-1}\sigma_{ZY},\;\; \sigma_{YY} - \sigma_{ZY}^T\Sigma_{ZZ}^{-1}\sigma_{ZY}\big)$$
where we have partitioned $\Sigma$ as (Eq. 17.7)
$$\Sigma = \begin{bmatrix} \Sigma_{ZZ} & \sigma_{ZY} \\ \sigma_{ZY}^T & \sigma_{YY} \end{bmatrix}.$$
The conditional mean in Eq. 17.6 has exactly the same form as the population multiple linear regression of $Y$ on $Z$, with regression coefficient $\beta = \Sigma_{ZZ}^{-1}\sigma_{ZY}$. (Proof on next page.)
11 Proof
Given Eq. 2.9, the expected prediction error is
$$\operatorname{EPE}(f) = E(y - f(z))^2 = E(y - z^T\beta)^2 = E\big[y^2 - 2yz^T\beta + \beta^Tzz^T\beta\big].$$
Differentiating with respect to $\beta$, we have
$$\frac{d\operatorname{EPE}(f)}{d\beta} = E\left[\frac{d(y^2 - 2yz^T\beta + \beta^Tzz^T\beta)}{d\beta}\right] = E\big[-2yz + 2zz^T\beta\big] = 0.$$
Hence (assuming $z$ and $y$ are centered, so that $E(zz^T) = \Sigma_{ZZ}$ and $E[yz] = \sigma_{ZY}$) we derive
$$\beta = E(zz^T)^{-1}E[yz] = \Sigma_{ZZ}^{-1}\sigma_{ZY}.$$
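A numerical sanity check of this proof (my own addition, not from the slides): on a large sample from an MVN, the least-squares coefficient of $Y$ on $Z$ should match $\Sigma_{ZZ}^{-1}\sigma_{ZY}$ computed from the sample covariance:

```python
# Least-squares regression of Y on Z vs. the covariance formula for beta.
import numpy as np

rng = np.random.default_rng(3)
p, N = 4, 200_000
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)

Z, y = X[:, :-1], X[:, -1]
beta_ls = np.linalg.lstsq(Z, y, rcond=None)[0]       # regression of Y on Z

S = np.cov(X, rowvar=False)
beta_cov = np.linalg.solve(S[:-1, :-1], S[:-1, -1])  # Sigma_ZZ^{-1} sigma_ZY
assert np.allclose(beta_ls, beta_cov, atol=1e-2)
```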
12 How to Solve for the Inverse Θ
The standard formulas for partitioned inverses give $\Sigma\Theta = I$, i.e.,
$$\begin{bmatrix} \Sigma_{ZZ} & \sigma_{ZY} \\ \sigma_{ZY}^T & \sigma_{YY} \end{bmatrix}
  \begin{bmatrix} \Theta_{ZZ} & \theta_{ZY} \\ \theta_{ZY}^T & \theta_{YY} \end{bmatrix}
  = \begin{bmatrix} I & 0 \\ 0^T & 1 \end{bmatrix}.$$
From this we read off
$$\Sigma_{ZZ}\theta_{ZY} + \sigma_{ZY}\theta_{YY} = 0$$
$$\sigma_{ZY}^T\theta_{ZY} + \sigma_{YY}\theta_{YY} = 1$$
Solving these two equations gives Eq. 17.8,
$$\theta_{ZY} = -\theta_{YY}\,\Sigma_{ZZ}^{-1}\sigma_{ZY}, \quad\text{where}\quad 1/\theta_{YY} = \sigma_{YY} - \sigma_{ZY}^T\Sigma_{ZZ}^{-1}\sigma_{ZY} > 0.$$
And hence we have Eq. 17.9:
$$\beta = \Sigma_{ZZ}^{-1}\sigma_{ZY} = -\theta_{ZY}/\theta_{YY}.$$
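A small numerical check of Eqs. 17.8 and 17.9 (my own addition): the last column of $\Theta = \Sigma^{-1}$ encodes the regression coefficients of $X_p$ on the rest, up to sign and the scaling by $\theta_{YY}$:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 5
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
Theta = np.linalg.inv(Sigma)

beta = np.linalg.solve(Sigma[:-1, :-1], Sigma[:-1, -1])  # Sigma_ZZ^{-1} sigma_ZY
theta_ZY, theta_YY = Theta[:-1, -1], Theta[-1, -1]

assert np.allclose(beta, -theta_ZY / theta_YY)            # Eq. 17.9
assert np.isclose(1 / theta_YY,
                  Sigma[-1, -1] - Sigma[:-1, -1] @ beta)  # Eq. 17.8
```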
13 What We Have Learned
The dependence of $Y$ on $Z$ in Eq. 17.6 is in the mean term alone. Here we see exactly that zero elements in $\beta$, and hence in $\theta_{ZY}$, mean that the corresponding elements of $Z$ are conditionally independent of $Y$, given the rest. We can therefore learn this dependence structure through multiple linear regression.
14 Maximum Likelihood Estimation for the MVN (Estimation of Parameters with Known Structure)
Let $X^T = (x_1, \ldots, x_N)$ be sampled from $N_p(\mu, \Sigma)$. The MLEs of $\mu$ and $\Sigma$ are the sample mean and empirical covariance (Eq. 17.10):
$$\hat\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^N x_i, \qquad \hat\Sigma = S = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})(x_i - \bar{x})^T.$$
The likelihood is a function of the parameters $\mu$ and $\Sigma$ given the data $X$:
$$L(\mu, \Sigma \mid X) = \prod_{i=1}^N f(x_i \mid \mu, \Sigma) = (2\pi)^{-\frac{Np}{2}}\,|\Sigma|^{-\frac{N}{2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^N (x_i - \mu)^T\Sigma^{-1}(x_i - \mu)\right\}.$$
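A minimal Python sketch of Eq. 17.10 (my own illustration): note that the MLE uses the $1/N$ covariance, so `np.cov` needs `bias=True` (its default divides by $N-1$):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal(mean=[0.0, 1.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=1000)

mu_hat = X.mean(axis=0)                  # Eq. 17.10, sample mean
S = np.cov(X, rowvar=False, bias=True)   # Eq. 17.10, 1/N covariance

centered = X - mu_hat
assert np.allclose(S, centered.T @ centered / len(X))
```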
15 Log-likelihood
The (doubled, negated) log-likelihood of the data can be written as
$$\ell(\mu, \Sigma) = -2\log L(\mu, \Sigma \mid X) = N\log|\Sigma| + \sum_{i=1}^N (x_i - \mu)^T\Sigma^{-1}(x_i - \mu) + C.$$
Minimizing this is equivalent to maximizing Eq. 17.11, $\ell(\Theta) = \log\det\Theta - \operatorname{tr}(S\Theta)$, since (taking $\mu = \bar{x}$ and using $\operatorname{tr}(AB) = \operatorname{tr}(BA)$)
$$\begin{aligned}
\ell(\Theta) &= \log\det\Theta - \operatorname{tr}(S\Theta) \\
&= -\log|\Sigma| - \frac{1}{N}\sum_{i=1}^N \operatorname{tr}\big((x_i - \mu)(x_i - \mu)^T\,\Theta\big) \\
&= -\log|\Sigma| - \frac{1}{N}\sum_{i=1}^N \operatorname{tr}\big((x_i - \mu)^T\,\Theta\,(x_i - \mu)\big) \\
&= -\log|\Sigma| - \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^T\Sigma^{-1}(x_i - \mu),
\end{aligned}$$
which is $-\ell(\mu, \Sigma)/N$ up to constants.
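A quick numerical check of the key identity above (my own addition): with $\mu = \bar{x}$, $\operatorname{tr}(S\Theta)$ equals the average quadratic form over the sample:

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 500, 3
X = rng.normal(size=(N, p))
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)

Theta = np.linalg.inv(S + 0.1 * np.eye(p))   # any positive definite Theta
quad = np.mean([(x - xbar) @ Theta @ (x - xbar) for x in X])
assert np.isclose(np.trace(S @ Theta), quad)
```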
16 Missing Edges: Equality Constraints
Now we would like to maximize the log-likelihood under the constraint that some pre-defined subset of the parameters is zero:
$$\underset{\Theta}{\text{maximize}}\;\; \ell_C(\Theta) = \log\det\Theta - \operatorname{tr}(S\Theta) \quad \text{subject to} \quad \theta_{jk} = 0 \;\;\forall\,(j,k)\notin E.$$
Adding Lagrange multipliers, we derive Eq. 17.12:
$$\underset{\Theta}{\text{maximize}}\;\; \ell_C(\Theta) = \log\det\Theta - \operatorname{tr}(S\Theta) - \sum_{(j,k)\notin E}\gamma_{jk}\theta_{jk}.$$
Setting the derivative to zero, we have Eq. 17.13:
$$\Theta^{-1} - S - \Gamma = 0,$$
where $\Gamma$ is a matrix of Lagrange parameters with nonzero values for all missing edges.
17 Solving (17.13) by Multiple Linear Regression
Step 1: Partition $W$ and derive Eq. 17.14: $w_{12} - s_{12} - \gamma_{12} = 0$.
Step 2: Connect $w_{12}$ with $\beta$. From Eq. 17.16,
$$\begin{bmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{bmatrix}
  \begin{bmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{bmatrix}
  = \begin{bmatrix} I & 0 \\ 0^T & 1 \end{bmatrix}.$$
This implies Eq. 17.17:
$$w_{12} = -W_{11}\theta_{12}/\theta_{22} = W_{11}\beta.$$
18 Solving (17.13) by Multiple Linear Regression (cont.)
Step 3: Use simple subset regression to solve Eq. 17.18:
$$W_{11}\beta - s_{12} - \gamma_{12} = 0.$$
If $\gamma_j \neq 0$ (a missing edge, forcing $\beta_j = 0$), we remove the $j$th row and column and solve the reduced system of equations, Eq. 17.19:
$$W_{11}^*\beta^* - s_{12}^* = 0.$$
Step 4: Update $\theta_{22}$ and $\theta_{12}$ (Eq. 17.20):
$$1/\theta_{22} = s_{22} - w_{12}^T\hat\beta, \qquad \theta_{12} = -\hat\beta\,\theta_{22}.$$
19 Summary: Algorithm 17.1 (Estimation of Parameters with Known Structure). [The slide shows ESL's Algorithm 17.1; the algorithm box itself was not transcribed. A code sketch follows.]
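Since the algorithm box was not transcribed, here is a minimal Python sketch of Algorithm 17.1 as I read it from Steps 1-4 on the previous two slides; the function name `mle_known_structure` and the boolean adjacency-matrix interface are my own choices, not ESL's:

```python
import numpy as np

def mle_known_structure(S, edges, n_iter=200, tol=1e-8):
    """Estimate W = Sigma-hat and Theta = W^{-1}, forcing Theta[j, k] = 0
    wherever edges[j, k] is False (j != k). `edges` is symmetric boolean."""
    p = S.shape[0]
    W = S.copy()                                   # diagonal of W stays S's
    for _ in range(n_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.array([k for k in range(p) if k != j])
            nbrs = rest[edges[rest, j]]            # connected partners of j
            beta = np.zeros(p - 1)
            if nbrs.size:
                beta[np.isin(rest, nbrs)] = np.linalg.solve(
                    W[np.ix_(nbrs, nbrs)], S[nbrs, j])   # Eq. 17.19
            w12 = W[np.ix_(rest, rest)] @ beta           # Eq. 17.17
            W[rest, j] = W[j, rest] = w12
        if np.abs(W - W_old).max() < tol:
            break
    # Recover Theta column by column from Eq. 17.20.
    Theta = np.zeros((p, p))
    for j in range(p):
        rest = np.array([k for k in range(p) if k != j])
        nbrs = rest[edges[rest, j]]
        beta = np.zeros(p - 1)
        if nbrs.size:
            beta[np.isin(rest, nbrs)] = np.linalg.solve(
                W[np.ix_(nbrs, nbrs)], S[nbrs, j])
        theta_jj = 1.0 / (S[j, j] - W[rest, j] @ beta)
        Theta[j, j] = theta_jj
        Theta[rest, j] = -beta * theta_jj
    return W, Theta

# Example: 4 variables with edges (0,1), (1,2), (2,3) only.
rng = np.random.default_rng(9)
A = rng.normal(size=(4, 4))
S = A @ A.T / 4 + np.eye(4)
edges = np.zeros((4, 4), dtype=bool)
for j, k in [(0, 1), (1, 2), (2, 3)]:
    edges[j, k] = edges[k, j] = True
W, Theta = mle_known_structure(S, edges)
print(np.round(Theta, 3))    # exact zeros appear at non-edges, e.g. (0, 2)
```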
20 A Case Study: Figure 17.4 (Estimation of Parameters with Known Structure). [Figure not transcribed.]
21 Estimation of the Graph Structure: Graph Lasso
22 Graph Lasso (Estimation of Graph Structure)
The graph lasso fits a lasso regression using each variable as the response and the others as predictors. Consider maximizing the penalized log-likelihood, Eq. 17.21:
$$\log\det\Theta - \operatorname{tr}(S\Theta) - \lambda\|\Theta\|_1,$$
where $\|\Theta\|_1$ is the $L_1$ norm, i.e., the sum of the absolute values of the elements of $\Theta$. Differentiating as before, we reach the analog of Eq. 17.18 as Eq. 17.23:
$$W_{11}\beta - s_{12} + \lambda\,\operatorname{Sign}(\beta) = 0.$$
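For practical use, a minimal sketch (my own addition, not from the slides) of scikit-learn's graphical lasso, which solves the penalized problem in Eq. 17.21; the class name `GraphicalLasso` and its `alpha` penalty parameter are scikit-learn's, and the toy precision matrix is made up for illustration:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(7)
# True precision with a sparse structure: only edges (0,1) and (1,2).
Theta_true = np.array([[2.0, 0.6, 0.0],
                       [0.6, 2.0, 0.6],
                       [0.0, 0.6, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta_true), size=2000)

model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))    # estimated Theta
print(np.round(model.covariance_, 2))   # estimated W; larger alpha -> more zeros
```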
23 Cyclical Coordinate Descent Algorithm
Rewrite the equation
$$W_{11}\beta - s_{12} + \lambda\,\operatorname{Sign}(\beta) = 0$$
as a $(p-1)$-by-$(p-1)$ linear system in $A$, $x$, and $b$:
$$Ax - b + \lambda\,\operatorname{Sign}(x) = 0.$$
For $i = 1, 2, \ldots, p-1, 1, 2, \ldots, p-1, \ldots$, we cyclically update (Eq. 17.26)
$$x_i \leftarrow \operatorname{St}\Big(b_i - \sum_{k\neq i} A_{ki}x_k,\; \lambda\Big)\Big/ A_{ii},$$
where $\operatorname{St}(x, t)$ is the soft-threshold operator (Eq. 17.27):
$$\operatorname{St}(x, t) = \operatorname{sign}(x)\,(|x| - t)_+.$$
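A minimal Python implementation of Eqs. 17.26-17.27 (my own sketch, assuming $A$ symmetric positive definite as $W_{11}$ is):

```python
import numpy as np

def soft_threshold(x, t):
    """St(x, t) = sign(x) * (|x| - t)_+  (Eq. 17.27)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def coord_descent(A, b, lam, n_iter=500, tol=1e-10):
    """Cyclical coordinate descent for A x - b + lam * Sign(x) = 0."""
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x_old = x.copy()
        for i in range(len(b)):
            r = b[i] - A[:, i] @ x + A[i, i] * x[i]   # excludes the k = i term
            x[i] = soft_threshold(r, lam) / A[i, i]   # Eq. 17.26
        if np.abs(x - x_old).max() < tol:
            break
    return x

# Tiny check: with lam = 0 this reduces to Gauss-Seidel and solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
assert np.allclose(coord_descent(A, b, lam=0.0), np.linalg.solve(A, b))
```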
24 Summary: Graph Lasso Algorithm (Estimation of Graph Structure). [Algorithm box not transcribed.]
25 A Case Study: Flow-Cytometry Data (Estimation of Graph Structure). [Figure not transcribed.]
26 Missing/Hidden Node Values: EM
Note that the values at some of the nodes in a graphical model can be unobserved, i.e., missing or hidden. The EM algorithm can be used to impute the missing values, with:
E step (Eq. 17.43): impute the missing values from the current estimates of $\mu$ and $\Sigma$,
$$\hat x_{i,m_i} = E(x_{i,m_i} \mid x_{i,o_i}, \theta) = \hat\mu_{m_i} + \hat\Sigma_{m_i,o_i}\hat\Sigma_{o_i,o_i}^{-1}(x_{i,o_i} - \hat\mu_{o_i});$$
M step (Eq. 17.44): re-estimate $\mu$ and $\Sigma$ from the empirical mean and (modified) covariance of the imputed data,
$$\hat\mu_j = \frac{1}{N}\sum_{i=1}^N \hat x_{ij}, \qquad \hat\Sigma_{jj'} = \frac{1}{N}\sum_{i=1}^N \big[(\hat x_{ij} - \hat\mu_j)(\hat x_{ij'} - \hat\mu_{j'}) + c_{i,jj'}\big],$$
where $c_{i,jj'}$ is the conditional covariance correction for coordinates that are both missing in observation $i$.
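A minimal sketch of the E step, Eq. 17.43 (my own illustration, with made-up parameter values): impute the missing coordinates of one observation from the current MVN estimates, using the conditional-mean formula from slide 6:

```python
import numpy as np

def impute_row(x, missing, mu, Sigma):
    """Conditional-mean imputation of x[missing] given the observed entries."""
    o = ~missing
    x_hat = x.copy()
    x_hat[missing] = mu[missing] + Sigma[np.ix_(missing, o)] @ np.linalg.solve(
        Sigma[np.ix_(o, o)], x[o] - mu[o])
    return x_hat

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
x = np.array([0.4, np.nan, 2.5])
print(impute_row(x, np.isnan(x), mu, Sigma))
```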
27 Ising Models / Boltzmann Machines (Undirected Graphical Models for Discrete Variables)
Pairwise Markov networks with binary variables are called Ising models in statistical mechanics, and Boltzmann machines in machine learning. The joint probabilities of the Ising model are given by Eqs. 17.28-17.29:
$$p(X, \Theta) = \exp\Big[\sum_{(j,k)\in E}\theta_{jk}X_jX_k - \Phi(\Theta)\Big], \qquad
\Phi(\Theta) = \log\sum_{x\in\mathcal{X}}\exp\Big[\sum_{(j,k)\in E}\theta_{jk}x_jx_k\Big],$$
where $\Phi(\Theta)$ is the log partition function, normalizing over all $2^p$ binary configurations $\mathcal{X}$.
The Ising model implies a logistic form for each node conditional on the others (Eq. 17.30):
$$P(X_j = 1 \mid X_{-j} = x_{-j}) = \frac{1}{1 + \exp\big(-\theta_{j0} - \sum_{(j,k)\in E}\theta_{jk}x_k\big)},$$
where $X_{-j}$ denotes all of the nodes except $j$.
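For tiny $p$, the joint can be computed by brute force, which lets us verify the logistic conditional numerically. A minimal sketch (my own, assuming binary states in {0, 1}, random made-up parameters, and no intercept terms $\theta_{j0}$):

```python
import itertools
import numpy as np

p = 4
rng = np.random.default_rng(8)
Theta = np.triu(rng.normal(scale=0.5, size=(p, p)), k=1)  # theta_jk for j < k

def energy(x):
    return sum(Theta[j, k] * x[j] * x[k]
               for j in range(p) for k in range(j + 1, p))

states = [np.array(s) for s in itertools.product([0, 1], repeat=p)]
log_phi = np.log(sum(np.exp(energy(s)) for s in states))   # Phi(Theta)
probs = {tuple(s): np.exp(energy(s) - log_phi) for s in states}

# Conditional of node j from the joint vs. the logistic form (Eq. 17.30):
j, x = 0, np.array([0, 1, 0, 1])
x0, x1 = x.copy(), x.copy()
x0[j], x1[j] = 0, 1
p_joint = probs[tuple(x1)] / (probs[tuple(x0)] + probs[tuple(x1)])
eta = sum((Theta[j, k] + Theta[k, j]) * x[k] for k in range(p) if k != j)
assert np.isclose(p_joint, 1 / (1 + np.exp(-eta)))
```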
28 Estimation of Parameters with Known Graph Structure (Undirected Graphical Models for Discrete Variables)
Given $X$, find $\Theta$. The log-likelihood is (Eq. 17.31)
$$\ell(\Theta) = \sum_{i=1}^N \log P_\Theta(X_i = x_i) = \sum_{i=1}^N\Big[\sum_{(j,k)\in E}\theta_{jk}x_{ij}x_{ik} - \Phi(\Theta)\Big].$$
The gradient of the log-likelihood is (Eqs. 17.32-17.33)
$$\frac{\partial\ell(\Theta)}{\partial\theta_{jk}} = \sum_{i=1}^N x_{ij}x_{ik} - N\sum_{x\in\mathcal{X}} x_jx_k\,p(x, \Theta).$$
Dividing by $N$ and setting the gradient to zero yields the moment-matching condition (Eq. 17.34):
$$\hat E(X_jX_k) - E_\Theta(X_jX_k) = 0,$$
i.e., the empirical and model second moments must agree.
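For small $p$, this moment-matching condition can be solved directly by gradient ascent, with the exact model moments computed by enumeration. A sketch (my own, with an illustrative step size, no intercepts, and a brute-force partition function, so feasible only for tiny graphs):

```python
import itertools
import numpy as np

def fit_ising(X, lr=0.5, n_iter=5000):
    """Exact gradient ascent on the Ising log-likelihood (Eqs. 17.32-17.34)."""
    N, p = X.shape
    states = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
    emp = X.T @ X / N                          # empirical second moments
    Theta = np.zeros((p, p))                   # symmetric, zero diagonal
    for _ in range(n_iter):
        logits = 0.5 * np.einsum('sj,jk,sk->s', states, Theta, states)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                   # p(x, Theta) by enumeration
        model = np.einsum('s,sj,sk->jk', probs, states, states)
        grad = emp - model                     # Eq. 17.34: match moments
        np.fill_diagonal(grad, 0.0)            # only edge parameters
        Theta += lr * grad
        if np.abs(grad).max() < 1e-6:
            break
    return Theta

# Sanity check: independent fair coins should yield Theta close to zero.
X = (np.random.default_rng(10).random((2000, 3)) < 0.5).astype(float)
print(np.round(fit_ising(X), 2))
```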
29 References
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
30 The End