CPSC 540: Machine Learning
1 CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018
2 Last Time: Learning and Inference in DAGs
We discussed learning in DAG models,
log p(X | W) = Σ_{i=1}^n Σ_{j=1}^d log p(x^i_j | x^i_{pa(j)}, w_j),
which becomes a supervised learning problem for each feature j.
- The tabular parameterization is common but requires a small number of parents.
- Gaussian belief networks use least squares (this defines a multivariate Gaussian).
- Sigmoid belief networks use logistic regression.
For inference in DAGs (decoding, computing marginals, computing conditionals):
- We can use ancestral sampling to compute Monte Carlo approximations.
- We can apply message passing, but messages may be huge. The O(dk^2) cost is only guaranteed if each node has at most one parent (a tree or forest).
3 Conditional Sampling in DAGs
What about conditional sampling in DAGs? It can be easy or hard depending on what we condition on:
- Easy if we condition on the first variables in the order: just fix these and run ancestral sampling.
- Hard if we condition on the last variables in the order: conditioning on a descendant makes its ancestors dependent.
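The easy case can be sketched in code. This is a minimal illustration (not from the slides) using a 3-variable binary chain DAG x1 → x2 → x3, where the probabilities are made-up values; conditioning on x1, the first variable in the order, just fixes it before sampling the rest.

```python
import random

# Illustrative conditionals for the chain x1 -> x2 -> x3.
p_x1 = 0.6                         # p(x1 = 1)
p_x2_given = {0: 0.2, 1: 0.8}      # p(x2 = 1 | x1)
p_x3_given = {0: 0.3, 1: 0.7}      # p(x3 = 1 | x2)

def ancestral_sample(x1=None):
    """Sample (x1, x2, x3) in topological order.

    Conditioning on x1 (the first variable) is easy: fix it and sample
    the rest. Conditioning on x3 would NOT work this way, since it
    makes the ancestors x1, x2 dependent.
    """
    if x1 is None:
        x1 = int(random.random() < p_x1)
    x2 = int(random.random() < p_x2_given[x1])
    x3 = int(random.random() < p_x3_given[x2])
    return x1, x2, x3

random.seed(0)
samples = [ancestral_sample(x1=1) for _ in range(10000)]
# Monte Carlo estimate of p(x2 = 1 | x1 = 1); should be close to 0.8.
est = sum(s[1] for s in samples) / len(samples)
```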
4 DAG Structure Learning
Structure learning is the problem of choosing the graph: the input is data X, and the output is a graph G.
The easy case is when we're given an ordering of the variables, so the parents of node j must be chosen from {1, 2, ..., j−1}.
- Given the ordering, structure learning reduces to feature selection: select the features among {x_1, x_2, ..., x_{j−1}} that best predict the "label" x_j.
- We can use any feature selection method to solve these d problems.
5 Example: Structure Learning in Rain Data Given Ordering
Structure learning on the rain data using L1-regularized logistic regression, for different λ values, assuming a chronological ordering.
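The "feature selection per node" reduction can be sketched as follows. This is a hypothetical example (the data, the regularization strength C, and the solver choice are all illustrative): for each variable j, fit an L1-regularized logistic regression on the earlier variables, and read the parents off the nonzero coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with a known chain-like structure.
rng = np.random.default_rng(0)
n, d = 500, 4
X = np.zeros((n, d), dtype=int)
X[:, 0] = rng.random(n) < 0.5
X[:, 1] = X[:, 0] ^ (rng.random(n) < 0.1)   # x2 depends on x1
X[:, 2] = rng.random(n) < 0.5               # x3 independent of the rest
X[:, 3] = X[:, 1] ^ (rng.random(n) < 0.1)   # x4 depends on x2

# Given the ordering, candidate parents of j are {0, ..., j-1}.
parents = {}
for j in range(1, d):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X[:, :j], X[:, j])
    # Nonzero coefficients select the parents of variable j.
    parents[j] = [i for i in range(j) if abs(clf.coef_[0][i]) > 1e-4]
```

With strong enough regularization, the recovered parent sets should match the true dependencies (e.g., variable 1's parent set is {0}).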
6 DAG Structure Learning without an Ordering
Without an ordering, a common approach is "search and score":
- Define a score for a particular graph structure (like BIC).
- Search through the space of possible DAGs (greedily add/remove/reverse edges).
Another common approach is constraint-based methods:
- Perform a sequence of conditional independence tests.
- Prune the edge between x_i and x_j if you find variables S making them independent, x_i ⊥ x_j | x_S.
- This assumes "faithfulness" (all independences are reflected in the graph); otherwise it's weird (a duplicated feature would be disconnected from everything).
Structure learning is NP-hard in general, but finding the optimal tree is poly-time:
- For symmetric scores, it can be done by minimum spanning tree.
- For asymmetric scores, it can be done by minimum spanning arborescence.
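The optimal-tree case can be sketched concretely (this is the Chow-Liu idea, with made-up data): weight each pair of variables by its empirical mutual information, then take a maximum-weight spanning tree, here via Kruskal's algorithm with a union-find.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information between two binary vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Synthetic chain-structured data: x0 -- x1 -- x2.
rng = np.random.default_rng(1)
n, d = 2000, 3
X = np.zeros((n, d), dtype=int)
X[:, 0] = rng.random(n) < 0.5
X[:, 1] = X[:, 0] ^ (rng.random(n) < 0.05)
X[:, 2] = X[:, 1] ^ (rng.random(n) < 0.05)

# Kruskal's algorithm on pairs sorted by decreasing mutual information
# gives the maximum-weight spanning tree.
pairs = sorted(((mutual_info(X[:, i], X[:, j]), i, j)
                for i in range(d) for j in range(i + 1, d)), reverse=True)
parent = list(range(d))
def find(a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]   # path compression
        a = parent[a]
    return a
edges = []
for w, i, j in pairs:
    ri, rj = find(i), find(j)
    if ri != rj:                        # skip edges that create a cycle
        parent[ri] = rj
        edges.append((i, j))
edges.sort()
```

On this data the direct pairs (0,1) and (1,2) have higher mutual information than the indirect pair (0,2), so the recovered tree is the true chain.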
7 Structure Learning on USPS Digits
Optimal tree on USPS digits. [Figure: the learned spanning tree over the 16×16 pixel grid, with nodes labeled by their (row, column) pixel coordinates.]
8 20 Newsgroups Data
Data containing the presence of 100 words from newsgroup posts (e.g., "car", "drive", "files", "hockey", "mac", "league", "pc", "win"). Structure learning should give relationships between words.
9 Structure Learning on News Words
Optimal tree on the newsgroups data. [Figure: the learned tree over the 100 words, grouping related vocabulary such as computing terms (files, ftp, windows, drive), sports terms (hockey, league, players, nhl), and religion terms (god, bible, christian, jesus).]
10 Outline
1. Structure Learning
2. Undirected Graphical Models
11 Directed vs. Undirected Models
In some applications we have a natural ordering of the x_j: in the rain data, the past affects the future. In other applications we don't have a natural order, e.g., the pixels in an image. In these settings we often use undirected graphical models (UGMs), also known as Markov random fields (MRFs), which originally come from statistical physics.
12 Directed vs. Undirected Models
Undirected graphical models are based on undirected graphs. They are a classic way to model dependencies in images: they can capture dependencies between neighbours without imposing an ordering.
13 Ising Models from Statistical Physics
The Ising model for binary x_i is defined by
p(x_1, x_2, ..., x_d) ∝ exp( Σ_{i=1}^d x_i w_i + Σ_{(i,j)∈E} x_i x_j w_ij ),
where E is the set of edges in an undirected graph.
This is called a log-linear model, because log p(x) is linear plus a constant.
Consider using x_i ∈ {−1, +1}:
- If w_i > 0, it encourages x_i = 1.
- If w_ij > 0, it encourages neighbours i and j to have the same value, e.g., neighbouring pixels in an image receive the same label (an "attractive" model).
We're modeling dependencies, but haven't assumed an ordering.
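A minimal numerical sketch of an Ising model on a tiny 2×2 grid (the weights are illustrative, not from the slides): the normalizing constant is computed by brute force, which is only feasible because d is tiny.

```python
import itertools
import math

nodes = [0, 1, 2, 3]
E = [(0, 1), (2, 3), (0, 2), (1, 3)]   # edges of a 2x2 grid
w_node = {i: 0.0 for i in nodes}       # no bias toward +1 or -1
w_edge = {e: 1.0 for e in E}           # w_ij > 0: attractive model

def unnorm(x):
    """Unnormalized probability: exp of the log-linear score."""
    score = sum(w_node[i] * x[i] for i in nodes)
    score += sum(w_edge[(i, j)] * x[i] * x[j] for (i, j) in E)
    return math.exp(score)

# Brute-force normalization over all 2^4 configurations.
configs = list(itertools.product([-1, 1], repeat=len(nodes)))
Z = sum(unnorm(x) for x in configs)
p = {x: unnorm(x) / Z for x in configs}
```

With these attractive weights and no node biases, the two all-equal configurations (all +1 and all −1) are tied as the most probable states.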
14 Pairwise undirected graphical models (UGMs) assume p(x) has the form
p(x) ∝ Π_{j=1}^d φ_j(x_j) Π_{(i,j)∈E} φ_ij(x_i, x_j).
The φ_j and φ_ij functions are called potential functions:
- They can be any non-negative functions.
- Ordering doesn't matter: this is more natural for things like the pixels of an image.
The Ising model is the special case where φ_i(x_i) = exp(x_i w_i) and φ_ij(x_i, x_j) = exp(x_i x_j w_ij).
Bonus slides generalize the Ising model to the non-binary case.
15 Label Propagation as a UGM
Consider modeling the probability of a vector of labels ȳ ∈ R^t using
p(ȳ_1, ȳ_2, ..., ȳ_t) ∝ exp( −Σ_{i=1}^n Σ_{j=1}^t w_ij (y_i − ȳ_j)^2 − (1/2) Σ_{i=1}^t Σ_{j=1}^t w_ij (ȳ_i − ȳ_j)^2 ).
Decoding in this model is equivalent to the label propagation problem.
This is a pairwise UGM with
φ_j(ȳ_j) = exp( −Σ_{i=1}^n w_ij (y_i − ȳ_j)^2 ),  φ_ij(ȳ_i, ȳ_j) = exp( −(1/2) w_ij (ȳ_i − ȳ_j)^2 ).
16 Conditional Independence in UGMs
It's easy to check conditional independence in UGMs: A ⊥ B | C if C blocks all paths from any node in A to any node in B.
[Figure: example graph.] The examples on the slide check statements of this form, such as A ⊥ C, A ⊥ C | B, and A, B ⊥ F | C, against the graph.
17 Multivariate Gaussian and Pairwise Graphical Models
Independence in the multivariate Gaussian:
- Marginal independence is determined by the covariance: x_i ⊥ x_j ⟺ Σ_ij = 0.
- But how can we determine conditional independence?
The multivariate Gaussian is a special case of a pairwise UGM, so we can just use graph separation. In particular, the edges of the UGM are the (i, j) values where Θ_ij ≠ 0 (with Θ = Σ^{−1}).
We use the term Gaussian graphical model (GGM) in this context, or Gaussian Markov random field (GMRF).
18 Digression: Gaussian Graphical Models
The multivariate Gaussian can be written as
p(x) ∝ exp( −(1/2)(x − μ)^T Σ^{−1} (x − μ) ) ∝ exp( −(1/2) x^T Σ^{−1} x + x^T v ),  where v = Σ^{−1} μ,
and writing it in summation notation we can see that it's a pairwise UGM:
p(x) ∝ exp( −(1/2) Σ_{i=1}^d Σ_{j=1}^d x_i x_j (Σ^{−1})_ij + Σ_{i=1}^d x_i v_i )
     = Π_{i=1}^d Π_{j=1}^d exp( −(1/2) x_i x_j (Σ^{−1})_ij ) Π_{i=1}^d exp( x_i v_i ),
where the first factors play the role of φ_ij(x_i, x_j) and the second play the role of φ_i(x_i).
19 Independence in GGMs
So Gaussians are pairwise UGMs with φ_ij(x_i, x_j) = exp( −(1/2) x_i x_j Θ_ij ), where Θ_ij is element (i, j) of Σ^{−1}.
Consider setting Θ_ij = 0: for all (x_i, x_j) we have φ_ij(x_i, x_j) = 1, which is equivalent to not having the edge (i, j). So setting Θ_ij = 0 is equivalent to removing φ_ij(x_i, x_j) from the UGM.
Gaussian conditional independence is determined by the sparsity of the precision matrix:
- A diagonal Θ gives a disconnected graph: all variables are independent.
- A full Θ gives a fully-connected graph: there are no independences.
20 Independence in GGMs
Consider a Gaussian with a covariance matrix Σ whose entries are all nonzero [matrix values lost in transcription]. Since Σ_ij ≠ 0 for all (i, j), all variables are dependent (x_1 depends on x_2, on x_5, and so on). This would show up in the graph: you would be able to reach any x_i from any x_j.
The inverse Σ^{−1}, however, is a tri-diagonal matrix [values lost], so conditional independence is described by a Markov chain: for example, p(x_1 | x_2, x_3, x_4, x_5) = p(x_1 | x_2).
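Since the matrix entries were lost in transcription, here is a numerical sketch with illustrative values: a chain-structured GGM has a tridiagonal precision matrix Θ = Σ^{−1}, yet its covariance Σ is completely dense.

```python
import numpy as np

# Tridiagonal precision matrix: edges only between chain neighbours.
d = 5
Theta = 2.0 * np.eye(d)
for i in range(d - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -1.0
Sigma = np.linalg.inv(Theta)

# Marginally, everything is dependent: every entry of Sigma is nonzero.
all_dependent = bool(np.all(np.abs(Sigma) > 1e-12))
# Conditionally, it's a Markov chain: Theta[0, j] = 0 for j >= 2
# encodes p(x1 | x2, x3, x4, x5) = p(x1 | x2).
```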
21 Graphical Lasso
Conditional independence in GGMs is described by sparsity in Θ: setting a Θ_ij to 0 removes an edge from the graph.
Recall fitting a multivariate Gaussian with L1-regularization,
argmin_{Θ ≻ 0} Tr(SΘ) − log|Θ| + λ‖Θ‖_1,
which is called the graphical Lasso because it encourages a sparse graph. The graphical Lasso is a convex approach to structure learning for GGMs.
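A hedged sketch of the graphical Lasso in practice, using scikit-learn's `GraphicalLasso` estimator (the data, the true precision matrix, and the regularization value alpha are all illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Ground truth: a 4-variable GGM with a single edge (0, 1).
rng = np.random.default_rng(0)
d = 4
Theta_true = 2.0 * np.eye(d)
Theta_true[0, 1] = Theta_true[1, 0] = -1.0
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(Theta_true), size=2000)

# L1-regularized precision estimation encourages a sparse graph.
model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_
# Edges of the estimated GGM: off-diagonal entries not shrunk to zero.
est_edges = [(i, j) for i in range(d) for j in range(i + 1, d)
             if abs(Theta_hat[i, j]) > 1e-6]
```

With enough data, the strongly-supported edge (0, 1) survives the L1 penalty and appears in the estimated graph.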
22 Higher-Order UGMs
In UGMs, we can also define potentials on higher-order interactions. A three-variable generalization of the Ising potentials is
φ_ijk(x_i, x_j, x_k) = exp( w_ijk x_i x_j x_k ).
- If w_ijk > 0 and x_i ∈ {0, 1}, it encourages setting all three variables to 1.
- If w_ijk > 0 and x_i ∈ {−1, 1}, it encourages an odd number of positives.
In the general case, a UGM just assumes p(x) factorizes over subsets c,
p(x_1, x_2, ..., x_d) ∝ Π_{c∈C} φ_c(x_c),
from among a collection C of subsets. In this case, the graph has an edge (i, j) if i and j appear together in at least one c. Conditional independences are still given by graph separation.
23 Tractability of UGMs
Without using the proportionality sign, we write the UGM probability as
p(x) = (1/Z) Π_{c∈C} φ_c(x_c),
where Z is the constant that makes the probabilities sum to 1:
Z = Σ_{x_1} Σ_{x_2} ... Σ_{x_d} Π_{c∈C} φ_c(x_c)  (discrete case), or
Z = ∫_{x_1} ∫_{x_2} ... ∫_{x_d} Π_{c∈C} φ_c(x_c) dx_d dx_{d−1} ... dx_1  (continuous case).
Whether you can compute Z depends on the choice of the φ_c:
- Gaussian case: O(d^3) in general, but O(d) for forests (no loops).
- Continuous non-Gaussian: usually requires numerical integration.
- Discrete case: #P-hard in general, but O(dk^2) for forests (no loops).
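The gap between the general case and the forest case can be illustrated on a chain UGM with random (illustrative) potentials: brute force sums k^d terms, while a forward message-passing recursion computes the same Z in O(dk^2).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3
phi = rng.random((d, k)) + 0.1          # unary potentials phi_j(x_j)
psi = rng.random((d - 1, k, k)) + 0.1   # pairwise potentials on chain edges

# Brute force: sum over all k^d configurations.
Z_brute = 0.0
for x in itertools.product(range(k), repeat=d):
    p = 1.0
    for j in range(d):
        p *= phi[j, x[j]]
    for j in range(d - 1):
        p *= psi[j, x[j], x[j + 1]]
    Z_brute += p

# Forward recursion (message passing): O(d k^2).
m = phi[0].copy()
for j in range(1, d):
    # m_new[b] = phi_j(b) * sum_a m[a] * psi_{j-1}(a, b)
    m = phi[j] * (m @ psi[j - 1])
Z_chain = m.sum()
```

Both computations agree, but only the recursion scales to long chains.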
24 Summary
- Structure learning is the problem of learning the graph structure. It is hard in general, but easy for trees, and L1-regularization gives a fast heuristic.
- Undirected graphical models factorize the probability into non-negative potentials.
  - Gaussians are a special case, and log-linear models (like the Ising model) are a common choice.
  - They have simple conditional independence properties (graph separation).
Next time: our first visit to the wild world of approximate inference.
25 General Pairwise UGM
For general discrete x_i, a generalization of the Ising model is
p(x_1, x_2, ..., x_d) = (1/Z) exp( Σ_{i=1}^d w_{i,x_i} + Σ_{(i,j)∈E} w_{i,j,x_i,x_j} ),
which can represent any "positive" pairwise UGM (meaning p(x) > 0 for all x).
Interpretation of the weights for this UGM:
- If w_{i,1} > w_{i,2}, then we prefer x_i = 1 to x_i = 2.
- If w_{i,j,1,1} > w_{i,j,2,2}, then we prefer (x_i = 1, x_j = 1) to (x_i = 2, x_j = 2).
As before, we can use parameter tying: we could use the same w_{i,x_i} for all positions i, and the Ising model corresponds to a particular parameter tying of the w_{i,j,x_i,x_j}.
26 Factor Graphs
Factor graphs are a way to visualize UGMs that distinguishes potentials of different orders. We use circles for variables and squares to represent dependencies (factors).
[Figure: factor graph for p(x_1, x_2, x_3) ∝ φ_12(x_1, x_2) φ_13(x_1, x_3) φ_23(x_2, x_3).]
[Figure: factor graph for p(x_1, x_2, x_3) ∝ φ_123(x_1, x_2, x_3).]
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationTotal positivity in Markov structures
1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen
More informationDirected and Undirected Graphical Models
Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Matrix Notation Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them at end of class, pick them up end of next class. I need
More informationMarkov Random Fields for Computer Vision (Part 1)
Markov Random Fields for Computer Vision (Part 1) Machine Learning Summer School (MLSS 2011) Stephen Gould stephen.gould@anu.edu.au Australian National University 13 17 June, 2011 Stephen Gould 1/23 Pixel
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationStat 521A Lecture 18 1
Stat 521A Lecture 18 1 Outline Cts and discrete variables (14.1) Gaussian networks (14.2) Conditional Gaussian networks (14.3) Non-linear Gaussian networks (14.4) Sampling (14.5) 2 Hybrid networks A hybrid
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationBayesian Networks to design optimal experiments. Davide De March
Bayesian Networks to design optimal experiments Davide De March davidedemarch@gmail.com 1 Outline evolutionary experimental design in high-dimensional space and costly experimentation the microwell mixture
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More information