Chapter 16. Structured Probabilistic Models for Deep Learning
Peng et al., Deep Learning and Practice
Structured Probabilistic Models
A structured probabilistic model is a way of using graphs to describe a probability distribution, with an emphasis on visualizing which random variables interact with each other directly.
Each node represents a random variable.
Each edge represents a direct interaction.
[Figure: two graphs over x_0, ..., x_5: a directed model (Bayesian net) and an undirected model (Markov net)]
Structured probabilistic models are also known as probabilistic graphical models, or simply graphical models.
Learning, Sampling, and Inference
Things we will be concerned with around graphical models:
Learning the model structure p(x) and parameters θ:
θ* = arg max_θ p(x; θ)
Drawing samples from the learned model:
x ~ p(x; θ*) or x_2 ~ p(x_2 | x_1; θ*)
Doing approximate or exact inference:
arg max_{x_2} p(x_2 | x_1; θ*) or arg max_{x_2} q(x_2 | x_1; w)
Directed Graphical Models
A directed model defined on x is specified by
1. a directed acyclic graph G with nodes denoting the elements x_i of x, and
2. a set of local conditional probability distributions p(x_i | Pa_G(x_i)), where Pa_G(x_i) denotes the parent nodes of x_i in G,
and it factorizes the joint distribution of the node variables as
p(x) = ∏_i p(x_i | Pa_G(x_i))
Such graphical models are also known as Bayesian (belief) networks.
They are most naturally applicable in situations where there is a clear causal relationship between variables.
For convenience, we sometimes introduce plate notation, which compactly represents repeated substructures of the graph.
As an example, for the following graph
[Figure: directed graph with edges x_0 → x_1, x_1 → x_2, x_0 → x_3, x_0 → x_4, x_1 → x_4, and x_4 → x_5]
p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) p(x_1 | x_0) p(x_2 | x_1) p(x_3 | x_0) p(x_4 | x_1, x_0) p(x_5 | x_4)
When compared with the chain rule of probability,
p(x) = ∏_i p(x_i | x_{i-1}, x_{i-2}, ..., x_0),
the graph factorization implies certain conditional independencies, e.g.
p(x_2 | x_1, x_0) = p(x_2 | x_1) and p(x_3 | x_2, x_1, x_0) = p(x_3 | x_0)
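To make the factorization concrete, the following is a minimal sketch, not taken from the slides: it assumes binary variables and arbitrary example values for the local conditional distributions, builds the joint by multiplying the factors above, and checks numerically that p(x_2 | x_1, x_0) = p(x_2 | x_1).

```python
# Minimal sketch (assumed binary variables, arbitrary example CPTs): build the
# joint from the graph factorization and verify one implied conditional independence.
import itertools

import numpy as np

rng = np.random.default_rng(0)

def random_cpd(n_parents):
    """Return a table p[child_value, parent_values...] for a binary child."""
    t = rng.random(size=(2,) + (2,) * n_parents)
    return t / t.sum(axis=0, keepdims=True)   # normalize over the child value

p0 = random_cpd(0)          # p(x0)
p1 = random_cpd(1)          # p(x1 | x0)
p2 = random_cpd(1)          # p(x2 | x1)
p3 = random_cpd(1)          # p(x3 | x0)
p4 = random_cpd(2)          # p(x4 | x1, x0)
p5 = random_cpd(1)          # p(x5 | x4)

joint = {}
for x in itertools.product([0, 1], repeat=6):
    x0, x1, x2, x3, x4, x5 = x
    joint[x] = (p0[x0] * p1[x1, x0] * p2[x2, x1] * p3[x3, x0]
                * p4[x4, x1, x0] * p5[x5, x4])

def marginal(keep):
    """Marginal over the variable indices in `keep`, by brute-force summation."""
    m = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        m[key] = m.get(key, 0.0) + p
    return m

# p(x2 | x1, x0) should not depend on x0
m210 = marginal([2, 1, 0]); m10 = marginal([1, 0])
m21 = marginal([2, 1]);     m1 = marginal([1])
for x2, x1, x0 in itertools.product([0, 1], repeat=3):
    lhs = m210[(x2, x1, x0)] / m10[(x1, x0)]   # p(x2 | x1, x0)
    rhs = m21[(x2, x1)] / m1[(x1,)]            # p(x2 | x1)
    assert abs(lhs - rhs) < 1e-12
print("p(x2 | x1, x0) = p(x2 | x1) holds for this factorized joint")
```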
Note, however, that the graph only specifies which variables are allowed to appear in the arguments of each factor; there is no constraint on how we define each conditional probability distribution. In the present example, we may as well specify
p(x_1 | x_0) = f_1(x_1, x_0) = p(x_1)
p(x_2 | x_1) = f_2(x_2, x_1) = p(x_2)
p(x_3 | x_0) = f_3(x_3, x_0) = p(x_3)
p(x_4 | x_1, x_0) = f_4(x_4, x_1, x_0) = p(x_4)
p(x_5 | x_4) = f_5(x_5, x_4) = p(x_5)
to arrive at a fully factorized distribution
p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) p(x_1) p(x_2) p(x_3) p(x_4) p(x_5)
As such, there can be several distributions that satisfy the graph factorization; it is helpful to think of a directed graph as a filter, where DF denotes the set of distributions that satisfy the factorization described by the graph.
To be precise, for any given graph, DF will also include any distribution that has additional independence properties beyond those described by the graph.
Extreme case I: a fully connected graph will accept any possible distribution over the given variables.
[Figure: fully connected directed graph over x_0, x_1, x_2, x_3]
p(x_0, x_1, x_2, x_3) = p(x_0) p(x_1 | x_0) p(x_2 | x_1, x_0) p(x_3 | x_2, x_1, x_0)
(simply the chain rule of probability)
Extreme case II: a fully disconnected graph will only accept a fully factorized distribution.
[Figure: fully disconnected graph over x_0, x_1, x_2, x_3 with no edges]
p(x_0, x_1, x_2, x_3) = p(x_0) p(x_1) p(x_2) p(x_3)
It is also straightforward to see that a fully factorized distribution will pass through any graph.
In general, to model n discrete variables each taking k values, we need a table of size O(k^n); the conditional independence implied by the graph reduces the table size to O(k^m), where m is the maximum number of variables appearing together in a single conditional distribution (a child plus its parents).
This suggests that as long as each variable has few parents in the graph, the distribution can be represented with very few parameters, as the count below illustrates.
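A small worked count, assuming binary variables and the six-node example graph above (the parent sets are read off the earlier factorization); the numbers are illustrative only.

```python
# Parameter count for the six-node example graph with binary variables (k = 2),
# compared with a full joint table over all 2^6 configurations.
k, n = 2, 6
parents = {0: [], 1: [0], 2: [1], 3: [0], 4: [1, 0], 5: [4]}   # from the earlier factorization

full_table = k**n - 1                                    # free parameters of the full joint
graph_tables = sum((k - 1) * k**len(pa) for pa in parents.values())
print(full_table, graph_tables)                          # 63 vs. 13
```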
Undirected Graphical Models
An undirected graphical model is defined on an undirected graph G and factorizes the joint distribution of its node variables as a product of potential functions φ over the maximal cliques C of the graph:
p(x) = (1/Z) ∏_{C∈G} φ(x_C) = (1/Z) p̃(x)
where
p̃(x) = ∏_{C∈G} φ(x_C) is an unnormalized distribution,
Z = Σ_x p̃(x) is a normalization constant (called the partition function), and
each clique potential φ(x_C) is non-negative.
Undirected graphical models are also known as Markov random fields or Markov networks.
A clique is a subset of the nodes in a graph G in which there exists a link between every pair of nodes in the subset.
A maximal clique C is a clique to which no other node of the graph can be added without it ceasing to be a clique.
As an example, for the following graph
[Figure: undirected graph over x_0, ..., x_5 with maximal cliques {x_0, x_3}, {x_0, x_1, x_4}, {x_1, x_2}, and {x_4, x_5}]
p(x) = (1/Z) φ_a(x_0, x_3) φ_b(x_0, x_1, x_4) φ_c(x_1, x_2) φ_d(x_4, x_5)
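A minimal sketch, not from the slides, assuming binary variables and arbitrary positive potential tables: it forms the unnormalized distribution p̃(x) for the four maximal cliques above and computes the partition function Z by brute-force enumeration (feasible here only because there are just 2^6 configurations).

```python
# Unnormalized distribution and partition function for the example undirected graph.
import itertools

import numpy as np

rng = np.random.default_rng(1)
phi_a = rng.random((2, 2)) + 0.1          # phi_a(x0, x3)
phi_b = rng.random((2, 2, 2)) + 0.1       # phi_b(x0, x1, x4)
phi_c = rng.random((2, 2)) + 0.1          # phi_c(x1, x2)
phi_d = rng.random((2, 2)) + 0.1          # phi_d(x4, x5)

def p_tilde(x):
    x0, x1, x2, x3, x4, x5 = x
    return phi_a[x0, x3] * phi_b[x0, x1, x4] * phi_c[x1, x2] * phi_d[x4, x5]

Z = sum(p_tilde(x) for x in itertools.product([0, 1], repeat=6))
p = {x: p_tilde(x) / Z for x in itertools.product([0, 1], repeat=6)}
assert abs(sum(p.values()) - 1.0) < 1e-12   # normalized distribution
print("partition function Z =", Z)
```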
The clique potential φ measures the affinity of its member variables in each of their possible joint states.
One common choice for φ is the energy-based form (Boltzmann distribution)
φ(x_C) = exp(-E(x_C))
where x_C denotes the variables in that clique.
The choice of φ needs some attention; not every choice results in a legitimate probability distribution, e.g. φ(x) = exp(-βx^2) with x ∈ R and β < 0 cannot be normalized.
In the present case, the unnormalized joint distribution is also a Boltzmann distribution, with a total energy given by the sum of the
energies of all the maximal cliques:
p̃(x) = exp(-E(x)), with E(x) = Σ_{C∈G} E(x_C)
Each energy term imposes a particular soft constraint on the variables.
Example: image de-noising
y_i ∈ {-1, +1}: observed (noisy) image pixels
x_i ∈ {-1, +1}: hidden noise-free image pixels
The maximal cliques of the graph are seen to be {x_i, y_i} and {x_i, x_j} for neighbouring pixels i, j.
The joint distribution is given by
p(x, y) = (1/Z) exp(-E(x, y))
The (complete) energy function is assumed to be
E(x, y) = Σ_i E(x_i, y_i) + Σ_{i,j} E(x_i, x_j) = -η Σ_i x_i y_i - β Σ_{i,j} x_i x_j + h Σ_i x_i
Z is an (intractable) function of the model parameters η, β, and h:
Z = Σ_{x,y} exp(-E(x, y))
De-noising can then be cast as an inference problem:
arg max_x p(x | y)
As shown, the partition function Z often does not have a tractable form; approximate algorithms are therefore needed when estimating the model parameters, e.g., under the maximum likelihood principle.
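The slides do not fix an inference algorithm, so the sketch below uses iterated conditional modes (ICM), a simple greedy scheme: it repeatedly sets each hidden pixel to the value that minimizes the local energy, which suffices for arg max_x p(x | y) because Z cancels in the maximization. The parameter values η, β, h and the toy image are arbitrary assumptions.

```python
# Illustrative ICM de-noising on a tiny binary image; parameters are made-up examples.
import numpy as np

def icm_denoise(y, eta=2.1, beta=1.0, h=0.0, n_sweeps=5):
    """y: 2-D array with entries in {-1, +1}; returns the inferred clean image x."""
    x = y.copy()
    H, W = y.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # neighbours of pixel (i, j) on the 4-connected grid
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # local energy of setting x_ij = s: h*s - beta*s*nb - eta*s*y_ij
                energies = {s: h * s - beta * s * nb - eta * s * y[i, j] for s in (-1, +1)}
                x[i, j] = min(energies, key=energies.get)
    return x

# toy usage: a 16x16 image of -1s with a +1 square, corrupted by flipping 10% of pixels
rng = np.random.default_rng(0)
clean = -np.ones((16, 16), dtype=int); clean[4:12, 4:12] = 1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)
restored = icm_denoise(noisy)
print("pixels still wrong:", int((restored != clean).sum()))
```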
D-Separation
We often want to know which subsets of variables are conditionally independent given the values of other sets of variables.
[Figure: the directed graph over x_0, ..., x_5 from the earlier example]
Is the set of variables {x_1, x_2} conditionally independent of the variable x_5, given the values of {x_0, x_4}? Or, equivalently, do the following hold?
p(x_1, x_2, x_5 | x_0, x_4) = p(x_1, x_2 | x_0, x_4) p(x_5 | x_0, x_4)
p(x_1, x_2 | x_0, x_4, x_5) = p(x_1, x_2 | x_0, x_4)
The key rules can be deduced by observing three simple examples:
[Figure: three 3-node graphs through s: head-to-tail (a → s → b), tail-to-tail (a ← s → b), and head-to-head (a → s ← b)]
Head-to-tail: a and b are independent (d-separated) given s,
p(a, b | s) = p(a) p(s | a) p(b | s) / p(s) = p(a | s) p(b | s)
Tail-to-tail: a and b are independent (d-separated) given s,
p(a, b | s) = p(s) p(a | s) p(b | s) / p(s) = p(a | s) p(b | s)
Head-to-head: a and b are in general dependent given s,
p(a, b | s) = p(a) p(b) p(s | a, b) / p(s) ≠ p(a | s) p(b | s) in general
The head-to-head rule generalizes to the case where a descendant of s is observed:
[Figure: a → s ← b, with c a descendant of s]
p(a, b | c) ≠ p(a | c) p(b | c) in general
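A numerical illustration of the head-to-head case with made-up numbers (this example is not in the slides): two independent binary causes a and b, with s the logical OR of the two. Marginally a and b are independent, but once s is observed they become dependent ("explaining away").

```python
# Explaining away in a head-to-head structure a -> s <- b, with s = OR(a, b).
import itertools

p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
p_s_given_ab = lambda s, a, b: 1.0 if s == (a or b) else 0.0

joint = {(a, b, s): p_a[a] * p_b[b] * p_s_given_ab(s, a, b)
         for a, b, s in itertools.product([0, 1], repeat=3)}

def cond(a, b, s):
    """p(a, b | s) from the joint table."""
    p_s = sum(p for k, p in joint.items() if k[2] == s)
    return joint[(a, b, s)] / p_s

def cond_marg(var, val, s):
    """p(a = val | s) or p(b = val | s)."""
    idx = 0 if var == "a" else 1
    p_s = sum(p for k, p in joint.items() if k[2] == s)
    return sum(p for k, p in joint.items() if k[2] == s and k[idx] == val) / p_s

# given s = 1 (the effect happened), a and b are no longer independent:
print(cond(1, 1, 1), cond_marg("a", 1, 1) * cond_marg("b", 1, 1))   # 0.207 vs. 0.357
```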
To summarize, given three non-intersecting sets of nodes A, B, and C, A and B are conditionally independent given C if every path from any node in A to any node in B satisfies one of the following:
the path meets head-to-tail or tail-to-tail at a node in C, or
the path meets head-to-head at a node, and neither that node nor any of its descendants is in C.
In other words, all such paths are blocked (inactive).
These rules tell us only the independencies implied by the graph; recall, however, that not all independencies of a distribution are captured by the graph (cf. the filter interpretation).
Separation
Separation refers to the conditional independencies implied by an undirected graph.
Given three non-intersecting sets of nodes A, B, and C, A and B are conditionally independent (separated) given C if every path from any node in A to any node in B passes through one or more nodes in C, as the check below illustrates.
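A minimal sketch (not from the slides): separation can be tested by deleting the nodes in C and asking whether A can still reach B; the graph below is the undirected six-node example used earlier.

```python
# Check separation in an undirected graph via breadth-first search that avoids C.
from collections import deque

edges = [(0, 3), (0, 1), (0, 4), (1, 4), (1, 2), (4, 5)]   # the earlier undirected example
adj = {i: set() for i in range(6)}
for u, v in edges:
    adj[u].add(v); adj[v].add(u)

def separated(A, B, C):
    """True if every path from A to B passes through C (i.e. A is separated from B given C)."""
    seen = set(A) - set(C)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in C or v in seen:
                continue
            if v in B:
                return False        # found a path that avoids C
            seen.add(v); queue.append(v)
    return True

print(separated({1, 2}, {5}, {0, 4}))   # True: {x1, x2} separated from x5 given {x0, x4}
print(separated({1, 2}, {5}, {0}))      # False: the path x1 - x4 - x5 avoids C
```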
Conversion between Directed and Undirected Models
Some independencies can be represented by only one of the two model types.
Conversion from a directed model D to an undirected model U:
1. Add an edge to U for any pair of nodes a, b if there is a directed edge between them in D.
2. Add an edge to U for any pair of nodes a, b if they are both parents of a third node in D.
[Figure: the directed graph a → c ← b and its moralized undirected graph, which adds the edge a - b]
The resulting graph is called the moralized graph.
In the present case, the potential function φ is given by
φ(a, b, c) = p(a) p(b) p(c | a, b)
Conversion from an undirected model U to a directed model D is much less common and, in general, presents problems due to the normalization constraints (study by yourself).
Restricted Boltzmann Machines (RBM)
An energy-based model with binary visible and hidden units.
[Figure: bipartite graph with hidden units h_1, ..., h_4 fully connected to visible units v_1, v_2, v_3]
E(v, h) = -b^T v - c^T h - v^T W h
There is no direct interaction between visible units or between hidden units (essentially, a bipartite graph).
From the separation rules, we have
p(h | v) = ∏_i p(h_i | v)
p(v | h) = ∏_i p(v_i | h)
which are both factorial.
By the definition of E(v, h), p(h_i = 1 | v) and p(v_i = 1 | h) evaluate to
p(h_i = 1 | v) = σ(v^T W_{:,i} + c_i)
p(v_i = 1 | h) = σ(W_{i,:} h + b_i)
The hidden units h, although not interpretable, denote features that describe the visible units v and can be inferred via p(h_i = 1 | v).
Samples of the visible units v can be generated by alternately sampling all of h given v and all of v given h via block Gibbs sampling.
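A minimal sketch of these conditionals and one block Gibbs step, assuming small, randomly initialized parameters; the sizes and values are illustrative only.

```python
# RBM conditionals p(h|v), p(v|h) and one block Gibbs sweep for a tiny example model.
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 3, 4
W = 0.1 * rng.standard_normal((n_v, n_h))   # visible-hidden weights
b = np.zeros(n_v)                           # visible biases
c = np.zeros(n_h)                           # hidden biases

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v):
    p = sigmoid(v @ W + c)                  # p(h_i = 1 | v), factorial over i
    return (rng.random(n_h) < p).astype(float), p

def sample_v_given_h(h):
    p = sigmoid(W @ h + b)                  # p(v_i = 1 | h), factorial over i
    return (rng.random(n_v) < p).astype(float), p

# one block Gibbs step starting from a random visible configuration
v = rng.integers(0, 2, size=n_v).astype(float)
h, _ = sample_h_given_v(v)
v_new, _ = sample_v_given_h(h)
print(v, "->", v_new)
```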
It is also possible to sample part of v given the values of the others, for applications such as image completion (essentially, an RBM is a fully probabilistic model).
[Figure: training inputs and results of image completion]
Estimating the model parameters W, b, c is achieved with the maximum likelihood principle:
arg max_{W,b,c} p(v; W, b, c)
where the marginal distribution of the visible units is given by
p(v; W, b, c) = (1/Z) Σ_h exp(-E(v, h))
The partition function Z, however, is intractable:
Z = Σ_{v,h} exp(-E(v, h))
and is itself a function of the model parameters W, b, c.
Specialized training techniques involving sampling are therefore needed.
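The slides do not name a specific technique; one standard sampling-based approximation to the maximum-likelihood gradient is contrastive divergence (CD-1). The sketch below continues the previous RBM sketch (it reuses W, b, c, rng, n_h, sigmoid, and sample_v_given_h defined there) and is illustrative only, not the training procedure prescribed in the slides.

```python
# One CD-1 parameter update from a single training vector v_data.
def cd1_update(v_data, lr=0.1):
    global W, b, c
    # positive phase: hidden probabilities given the data
    p_h_data = sigmoid(v_data @ W + c)
    h_data = (rng.random(n_h) < p_h_data).astype(float)
    # negative phase: one step of block Gibbs sampling from the data
    v_model, _ = sample_v_given_h(h_data)
    p_h_model = sigmoid(v_model @ W + c)
    # approximate gradient of log p(v): data statistics minus model statistics
    W += lr * (np.outer(v_data, p_h_data) - np.outer(v_model, p_h_model))
    b += lr * (v_data - v_model)
    c += lr * (p_h_data - p_h_model)

cd1_update(np.array([1.0, 0.0, 1.0]))
```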
Deep Boltzmann Machines (DBM)
Introducing layers of hidden units to the RBM.
[Figure: layered graph with visible units v_1, v_2, v_3 and three hidden layers h^(1), h^(2), h^(3)]
E(v, h^(1), h^(2), h^(3)) = -v^T W^(1) h^(1) - (h^(1))^T W^(2) h^(2) - (h^(2))^T W^(3) h^(3)
From the graph, the posterior distribution is no longer factorial:
p(h^(1), h^(2), h^(3) | v) ≠ p(h^(1) | v) p(h^(2) | v) p(h^(3) | v)
Approximate inference (based on variational inference) is needed:
p(h^(1), h^(2), h^(3) | v) ≈ q(h^(1) | v) q(h^(2) | v) q(h^(3) | v)
Layer-wise unsupervised pre-training is also common.
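A hedged sketch of what the variational approximation can look like in practice (not from the slides): mean-field inference for a two-layer DBM without bias terms, where a factorial q is fitted by iterating the usual fixed-point updates. The shapes and initial values are arbitrary assumptions.

```python
# Mean-field inference for a two-layer DBM: q(h1, h2 | v) = prod_j q(h1_j) prod_k q(h2_k).
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h1, n_h2 = 3, 4, 2
W1 = 0.1 * rng.standard_normal((n_v, n_h1))
W2 = 0.1 * rng.standard_normal((n_h1, n_h2))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def mean_field(v, n_iters=20):
    """Return mu1, mu2 with mu1[j] = q(h1_j = 1 | v) and mu2[k] = q(h2_k = 1 | v)."""
    mu1 = np.full(n_h1, 0.5)
    mu2 = np.full(n_h2, 0.5)
    for _ in range(n_iters):
        # each layer's update uses the layers directly below and above it
        mu1 = sigmoid(v @ W1 + W2 @ mu2)
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

mu1, mu2 = mean_field(np.array([1.0, 0.0, 1.0]))
print(mu1, mu2)
```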
More Examples: Label Noise Model
Another deep learning approach to graphical models is to approximate their conditional distributions with deep neural networks.
Objective: to infer the ground-truth labels of images.
Visible variables (noisy data):
x: image
ỹ: noisy label (one-hot vector)
Latent variables:
y: true label (one-hot vector)
z: label noise type
Graphical model:
[Figure: directed graph in which x generates y (through a network with parameters θ_1) and z (through a network with parameters θ_2), and y and z together generate ỹ]
p(ỹ, y, z | x) = p(ỹ | y, z) p(y | x; θ_1) p(z | x; θ_2)
where p(ỹ | y, z) is hand-designed, while p(y | x; θ_1) and p(z | x; θ_2) are deep neural networks.
Label noise types and the conditional distribution p(ỹ | y, z):
Noise-free (z = 1): ỹ = y
p(ỹ | y, z = 1) = ỹ^T I y
Random noise (z = 2): ỹ takes any value other than the true y
p(ỹ | y, z = 2) = (1 / (L - 1)) ỹ^T (U - I) y
where U is a matrix of all 1's and L is the number of possible labels.
Confusing noise (z = 3): ỹ takes a value easily confused with the true y
p(ỹ | y, z = 3) = ỹ^T C y
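An illustrative sketch with made-up numbers: the three conditionals written as L × L matrices whose (i, j) entry is p(ỹ = i | y = j, z). The confusion matrix C is hand-designed in the original model; the one below is only an arbitrary example.

```python
# The three label-noise conditionals as matrices, checked to be valid distributions.
import numpy as np

L = 4
I = np.eye(L)
U = np.ones((L, L))

P_z1 = I                                   # noise-free: ytilde = y
P_z2 = (U - I) / (L - 1)                   # random noise: uniform over the wrong labels
C = np.array([[0.7, 0.3, 0.0, 0.0],        # confusing noise: mass on similar labels
              [0.3, 0.7, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.4, 0.6]])
P_z3 = C

for P in (P_z1, P_z2, P_z3):
    assert np.allclose(P.sum(axis=0), 1.0)  # a valid distribution over ytilde for each y

# p(ytilde | y, z) as the bilinear form ytilde^T M y with one-hot vectors
y = np.eye(L)[2]                            # true label 2
ytilde = np.eye(L)[3]                       # observed label 3
print(ytilde @ P_z2 @ y, ytilde @ P_z3 @ y) # 1/3 and 0.4
```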
Training of θ_1, θ_2 is based on the EM algorithm (study the paper).
Testing is carried out with the neural network p(y | x; θ_1) alone.
Note that, unlike in the RBM/DBM, the hidden variables here are interpretable, as is the case with most conventional graphical models.
Review
Directed vs. undirected graphical models
Probability distributions and their graph representations
Training, sampling, and inference for graphical models
Extracting conditional independence: d-separation and separation
Deep learning with graphical models