1 BOLTZMANN MACHINES
2 Generative Models
3 Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs). In a probabilistic graphical model, each node represents a random variable, and links represent probabilistic dependencies between random variables. Two types of graphical models: 1) Bayesian networks, also known as Directed Graphical Models (the links have a particular directionality indicated by arrows); 2) Markov Random Fields, also known as Undirected Graphical Models (the links do not carry arrows and have no directional significance). There are also hybrid graphical models that combine directed and undirected components, such as Deep Belief Networks.
4 Bayesian Networks (Directed Graphical Models) Directed graphs are useful for expressing causal (parent-child) relationships between random variables. Let us consider an arbitrary joint distribution over three random variables a, b and c (which can be either discrete or continuous). By application of the product rule of probability, we can decompose it as shown below (other decompositions are of course possible), and then represent the joint distribution in terms of a simple graphical model:
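One such decomposition, written here in LaTeX notation as a reconstruction (the slide's equation image is not reproduced, so the particular ordering of factors is an assumption):

    p(a, b, c) = p(c \mid a, b)\, p(b \mid a)\, p(a)

The corresponding graph has one node per variable, with arrows from a and b into c, and from a into b.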
5 Bayesian Networks (Directed Graphical Models) If there is a link going from node a to node b, then we say that node a is a parent of node b, and node b is a child of node a. If each node has incoming links from all lower-numbered nodes, then the graph is fully connected: there is a link between every pair of nodes. The absence of links conveys information about the class of distributions that the graph represents. (Factorization Property): The joint distribution defined by the directed graph is given by the product of the conditional distributions for each node conditioned on its parents:
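A reconstruction of the factorization property in LaTeX notation (the symbols x_k and pa_k for node k and its set of parents are assumed, since the slide's equation is not reproduced):

    p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)

where pa_k denotes the parents of x_k in the graph.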
6 Bayesian Networks (Directed Graphical Models) Example: Important restriction: there must be no directed cycles! Such graphs are also called directed acyclic graphs (DAGs). Reason: if the graph contains a directed cycle, the product of the conditional distributions is not guaranteed to define a valid (properly normalized) joint distribution.
7 Bayesian Networks (Directed Graphical Models) In probabilistic graphical models, random variables are denoted by open circles and deterministic parameters by smaller solid circles. When we apply a graphical model to a problem in machine learning, we set some of the variables to specific observed values (i.e., we condition on the data).
8 Bayesian Networks (Directed Graphical Models) Ancestral Sampling Our goal is to draw a sample from this distribution. Start at the top (nodes with no parents) and sample each node in order, conditioning on the already-sampled values of its parents; a minimal sketch is given below.
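A minimal sketch of ancestral sampling for a hypothetical three-node chain a -> b -> c with Bernoulli conditionals (the probabilities are made up for illustration; the slide's actual example is not reproduced here):

import numpy as np

# Hypothetical CPDs for a toy chain a -> b -> c (illustrative values only).
p_a = 0.6                        # p(a = 1)
p_b_given_a = {0: 0.2, 1: 0.7}   # p(b = 1 | a)
p_c_given_b = {0: 0.1, 1: 0.8}   # p(c = 1 | b)

def ancestral_sample(rng):
    """Draw (a, b, c) by sampling each node after its parents."""
    a = int(rng.random() < p_a)
    b = int(rng.random() < p_b_given_a[a])
    c = int(rng.random() < p_c_given_b[b])
    return a, b, c

rng = np.random.default_rng(0)
print([ancestral_sample(rng) for _ in range(5)])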
9 Bayesian Networks (Directed Graphical Models) Conditional Independence An important feature of graphical models is that conditional independence properties of the joint distribution can be read directly from the graph without performing any analytical manipulations. Example: IID data. (a) Given mu, all the x's are conditionally independent; (b) if we integrate out mu, the x's are no longer independent.
10 Bayesian Networks (Directed Graphical Models) Markov Blanket in Directed Models The Markov blanket of a node is the minimal set of nodes that must be observed to make this node independent of all other nodes. In a directed model, the Markov blanket includes the parents, children, and co-parents (i.e., all the other parents of the node's children), due to explaining away. The Markov blanket property has the advantage that in graphs with many RVs we can do inference for each RV in parallel given its Markov blanket. This property forms the basis of Markov chain (Gibbs) sampling, belief networks, etc.
11 Markov Random Fields (Undirected Graphical Models) Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are useful for expressing soft constraints between random variables. The joint distribution defined by the graph is given by the product of non-negative potential functions over the maximal cliques (fully connected subsets of nodes), where the normalizing constant Z is called the partition function. Example:
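A reconstruction of this factorization in LaTeX notation (the clique potentials are written psi_C; the symbols are assumed since the slide's equation is not reproduced):

    p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C),
    \qquad
    Z = \sum_{\mathbf{x}} \prod_{C} \psi_C(\mathbf{x}_C)

where the product runs over the maximal cliques C of the graph.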
12 Markov Random Fields (Undirected Graphical Models) Clique: a subset of nodes such that there exists a link between every pair of nodes in the subset. Maximal Clique: a clique such that it is not possible to include any other node in the set without it ceasing to be a clique. This graph has 7 cliques, of which two are maximal cliques:
13 Markov Random Fields (Undirected Graphical Models) Each potential function is a mapping from the joint configurations of random variables in a clique to non-negative real numbers. In contrast to directed graphs, the potential functions do not have a specific probabilistic interpretation. This gives us greater flexibility in choosing the potential functions. Potential functions are often represented as exponentials: where E(x) is called an energy function.
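Writing the potentials as exponentials gives the Boltzmann (Gibbs) distribution form; a reconstruction in LaTeX notation, with the total energy written as a sum of clique energies (an assumption, since the slide's equation is not reproduced):

    \psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}
    \quad\Rightarrow\quad
    p(\mathbf{x}) = \frac{1}{Z} \exp\Big\{-\sum_C E(\mathbf{x}_C)\Big\}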
14 Markov Random Fields (Undirected Graphical Models) For many interesting real-world problems, we need to introduce hidden or latent variables. Our random variables will then contain both visible and hidden variables, x = (v, h). In general, computing both the partition function and the summation over hidden variables is intractable, except in special cases. Thus, parameter learning becomes a very challenging task.
15 Markov Random Fields (Undirected Graphical Models) Conditional Independence in Undirected graphs: - It is easier compared to directed models. - Two sets of nodes are conditionally independent if the observations block all paths between them.
16 Markov Random Fields (Undirected Graphical Models) Markov Blanket in Undirected graphs: This is simpler than in directed models, since there is no explaining away: the Markov blanket of a node is simply its set of neighbours. The conditional distribution of a node conditioned on all the other variables in the graph depends only on the variables in its Markov blanket.
17 Markov Random Fields (Undirected Graphical Models) Since conditional independence of random variables and the factorization properties of the joint probability distribution are closely related, one can ask whether there exists a general factorization form for the distributions of MRFs. Hammersley-Clifford Theorem: the following two sets of distributions are the same: 1) the set of distributions consistent with the conditional independence relationships defined by the undirected graph; 2) the set of distributions consistent with the factorization defined by potential functions on the maximal cliques of the graph.
18 Restricted Boltzmann Machines RBMs are undirected bipartite graphical models with a maximal clique size of 2. They have stochastic binary visible variables v and stochastic binary hidden variables h.
19 Restricted Boltzmann Machines The energy of a joint configuration is given below. Note that the graph of an RBM has connections only between the layer of hidden and the layer of visible variables, not between two variables of the same layer. In terms of probability this means that the hidden variables are independent given the state of the visible variables, and vice versa. We can also show the following (derivation in Fischer's paper):
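A reconstruction of the standard RBM energy and factorized conditionals in LaTeX notation (the weight and bias symbols w_{ij}, b_i, c_j are assumed, since the slide's equations are not reproduced):

    E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} w_{ij}\, v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j

    p(\mathbf{h} \mid \mathbf{v}) = \prod_j p(h_j \mid \mathbf{v}),
    \qquad
    p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(\sum_i w_{ij} v_i + c_j\Big)

and analogously p(v_i = 1 | h) = \sigma(\sum_j w_{ij} h_j + b_i), where \sigma is the logistic sigmoid.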
20 Restricted Boltzmann Machines RBM learning algorithms are based on gradient ascent on the log-likelihood. Log-Likelihood Gradient of MRFs with Latent Variables: for an RBM model with parameters theta, given a single training example v, the log-likelihood is
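A reconstruction in LaTeX notation, following the general form for MRFs with latent variables (the exact symbols are assumed, since the slide's equations are not reproduced):

    \ln \mathcal{L}(\theta \mid \mathbf{v})
    = \ln p(\mathbf{v} \mid \theta)
    = \ln \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}
      - \ln \sum_{\mathbf{v}', \mathbf{h}} e^{-E(\mathbf{v}', \mathbf{h})}

Differentiating gives the gradient:

    \frac{\partial \ln \mathcal{L}(\theta \mid \mathbf{v})}{\partial \theta}
    = -\sum_{\mathbf{h}} p(\mathbf{h} \mid \mathbf{v})\,
        \frac{\partial E(\mathbf{v}, \mathbf{h})}{\partial \theta}
      + \sum_{\mathbf{v}', \mathbf{h}} p(\mathbf{v}', \mathbf{h})\,
        \frac{\partial E(\mathbf{v}', \mathbf{h})}{\partial \theta}

i.e., a data-dependent expectation minus a model expectation.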
21 Restricted Boltzmann Machines In the last step we used that the conditional probability can be written in the following way:
22 Restricted Boltzmann Machines The first term can be computed efficiently because it factorizes nicely. For example, w.r.t. a weight parameter we get:
23 Restricted Boltzmann Machines Similarly, the second term can also be written as a sum over visible configurations. Therefore, w.r.t. the same weight parameter we get (using an outer summation over v):
24 Restricted Boltzmann Machines Thus, the derivative of the log-likelihood of a single training pattern v w.r.t. the weight becomes
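A reconstruction of this result in LaTeX notation, with w_{ij} connecting visible unit v_i and hidden unit h_j (an assumed index convention, since the slide's equation is not reproduced):

    \frac{\partial \ln \mathcal{L}(\theta \mid \mathbf{v})}{\partial w_{ij}}
    = p(H_j = 1 \mid \mathbf{v})\, v_i
      - \sum_{\mathbf{v}'} p(\mathbf{v}')\, p(H_j = 1 \mid \mathbf{v}')\, v_i'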
25 Restricted Boltzmann Machines For the mean of this derivative over a training set, the following notation is often used, with q denoting the empirical distribution. This gives the often-stated rule:
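A reconstruction of the often-stated rule in angle-bracket notation (the notation is assumed, since the slide's equation is not reproduced):

    \Big\langle \frac{\partial \ln \mathcal{L}(\theta \mid \mathbf{v})}{\partial w_{ij}} \Big\rangle_{q}
    = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}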
26 Restricted Boltzmann Machines Analogously, we can obtain the derivatives w.r.t. the bias parameters. In all these equations, the first term can be computed analytically because of the conditional independence of the variables. However, the second term, which runs over all possible configurations of the visible variables, requires an exponential number of terms and is thus intractable. To avoid this computational burden, the second expectation can be approximated by samples drawn from the model distribution using MCMC techniques.
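For completeness, a reconstruction of the analogous bias derivatives in LaTeX notation (again with assumed symbols b_i and c_j for the visible and hidden biases):

    \frac{\partial \ln \mathcal{L}}{\partial b_i}
    = v_i - \sum_{\mathbf{v}'} p(\mathbf{v}')\, v_i',
    \qquad
    \frac{\partial \ln \mathcal{L}}{\partial c_j}
    = p(H_j = 1 \mid \mathbf{v}) - \sum_{\mathbf{v}'} p(\mathbf{v}')\, p(H_j = 1 \mid \mathbf{v}')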
27 Restricted Boltzmann Machines Contrastive Divergence: start the sampling chain at a training example v(0); obtain the point v(k) by k steps of Gibbs sampling (usually k = 1); replace the model expectation by a point estimate at v(k).
28 Restricted Boltzmann Machines The independence between the variables in one layer makes Gibbs sampling especially easy: instead of sampling new values for all variables one after another, the states of all variables in one layer can be sampled jointly. This is also referred to as block Gibbs sampling. Each step t consists of sampling h(t) from p(h | v(t)) and subsequently sampling v(t+1) from p(v | h(t)). The gradient of the log-likelihood w.r.t. theta for one training pattern is then approximated using this point estimate. Note that since the energy function is a linear function of the parameters, these derivatives are easy to compute. A minimal code sketch of CD-1 with block Gibbs sampling is given below.
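A minimal sketch of a single CD-1 parameter update for a Bernoulli-Bernoulli RBM, assuming the w_{ij}, b_i, c_j parameterization used above (function and variable names are illustrative, not from the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 update for a Bernoulli-Bernoulli RBM.

    v0: (n_visible,) binary training vector
    W:  (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    """
    # Positive phase: p(h = 1 | v0) and a sampled hidden state.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One block Gibbs step: v1 ~ p(v | h0), then p(h = 1 | v1).
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c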
29 Restricted Boltzmann Machines These equations say that for positive samples (training data) we want to decrease the energy, and for negative samples (configurations away from the true data) we want to increase the energy. This can be plotted as below.
30 Restricted Boltzmann Machines
31 Restricted Boltzmann Machines Persistent Contrastive Divergence (PCD) The algorithm corresponds to standard CD learning without reinitializing the visible units of the Markov chain with a training sample each time we want to draw a sample v(k) approximately from the RBM distribution. Instead, one keeps persistent chains that are run for k Gibbs steps after each parameter update (i.e., the initial state of the current Gibbs chain is equal to v(k) from the previous update step). The fundamental idea underlying PCD is that the chains can be assumed to stay close to the stationary distribution if the learning rate is sufficiently small, so that the model changes only slightly between parameter updates.
32 Restricted Boltzmann Machines Example
33 Restricted Boltzmann Machines Gaussian-Bernoulli RBM: GBRBMs are a variant of RBMs that can be used for modeling real-valued vectors such as pixel intensities and filter responses. To do this, we only need to modify the energy function, so that each visible unit now corresponds to a Gaussian-distributed RV instead of a Bernoulli-distributed one as in standard RBMs. To obtain Gaussian-distributed units, one adds quadratic terms to the energy. Adding a term of the form (v_i - b_i)^2 / (2 sigma_i^2) gives rise to a diagonal covariance matrix between units of the same layer, where v_i is the continuous value of a Gaussian unit and sigma_i^2 is the variance of the RV. It is recommended to normalize the training set by subtracting the mean of each input and dividing each input by the training-set standard deviation.
34 Restricted Boltzmann Machines Gaussian-Bernoulli RBM Consider modeling real-valued visible units v with binary stochastic hidden units h. The energy of the state {v, h} of the Gaussian-Bernoulli RBM is defined as below; the density that the model assigns to a visible vector v is then given by marginalizing over h.
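A reconstruction in LaTeX notation of a common Gaussian-Bernoulli RBM energy and the corresponding density (the parameterization with per-unit variances sigma_i^2 is assumed; the slide's exact equations are not reproduced):

    E(\mathbf{v}, \mathbf{h})
    = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2}
      - \sum_{i,j} \frac{v_i}{\sigma_i}\, w_{ij}\, h_j
      - \sum_j c_j h_j

    p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\{-E(\mathbf{v}, \mathbf{h})\},
    \qquad
    Z = \int \sum_{\mathbf{h}} \exp\{-E(\mathbf{v}, \mathbf{h})\}\, d\mathbf{v}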
35 Restricted Boltzmann Machines Similar to standard RBMs, the conditional distributions factorize (see below). Observe that, conditioned on the states of the hidden units, each visible unit is modeled by a Gaussian distribution whose mean is shifted by the weighted combination of the hidden unit activations. Given a set of observations, the derivative of the log-likelihood with respect to the model parameters takes a form very similar to that of binary RBMs.
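A reconstruction of these factorized conditionals in LaTeX notation, consistent with the energy above (assumed parameterization):

    p(v_i \mid \mathbf{h}) = \mathcal{N}\Big(b_i + \sigma_i \sum_j w_{ij} h_j,\; \sigma_i^2\Big),
    \qquad
    p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(c_j + \sum_i \frac{v_i}{\sigma_i}\, w_{ij}\Big)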
36 Restricted Boltzmann Machines The Replicated Softmax Model is useful for modeling sparse count data, such as word-count vectors in a document. Again, we only need to modify the energy function, so that each visible unit now corresponds to a multinomial-distributed RV instead of a Bernoulli-distributed one as in standard RBMs. Specifically, let K be the dictionary size, M be the number of words appearing in a document, and h be the binary stochastic hidden topic features. Let V be an M x K observed binary matrix with v_i^k = 1 iff the multinomial visible unit i takes on value k (meaning the i-th word in the document is the k-th dictionary word).
37 Restricted Boltzmann Machines The energy of the state {V, h} can be defined as below, where W_{ijk} is a symmetric interaction term between visible unit i taking on value k and hidden feature j, b_i^k is the bias of unit i taking on value k, and a_j is the bias of hidden feature j.
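A reconstruction of this energy in LaTeX notation (the number of hidden features is written F; the symbols are assumed since the slide's equation is not reproduced):

    E(\mathbf{V}, \mathbf{h})
    = -\sum_{i=1}^{M} \sum_{j=1}^{F} \sum_{k=1}^{K} W_{ijk}\, h_j\, v_i^k
      - \sum_{i=1}^{M} \sum_{k=1}^{K} v_i^k\, b_i^k
      - \sum_{j=1}^{F} h_j\, a_j

In the Replicated Softmax model, the weights and biases are additionally shared across all M visible multinomial units (word positions).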
38 Restricted Boltzmann Machines
39 Restricted Boltzmann Machines Collaborative Filtering
40 Restricted Boltzmann Machines Local vs. Distributed Representations
41 Deep Belief Networks DBNs are generative models that mix undirected and directed connections between variables. In the given network, the distribution over the top two layers is an RBM; the other layers form a Bayesian (sigmoid belief) network with conditional distributions. This is not a feed-forward neural network.
42 Deep Belief Networks The joint distribution of a DBN is as follows:
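For a DBN with three hidden layers, a reconstruction of the joint distribution in LaTeX notation (the layer names h^{(1)}, h^{(2)}, h^{(3)} are assumed, since the slide's equation is not reproduced):

    p(\mathbf{v}, \mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \mathbf{h}^{(3)})
    = p(\mathbf{h}^{(2)}, \mathbf{h}^{(3)})\;
      p(\mathbf{h}^{(1)} \mid \mathbf{h}^{(2)})\;
      p(\mathbf{v} \mid \mathbf{h}^{(1)})

where p(h^{(2)}, h^{(3)}) is the top-level RBM and the directed conditionals are factorized logistic (sigmoid) distributions, e.g. p(v_i = 1 | h^{(1)}) = \sigma(b_i + \sum_j W^{(1)}_{ij} h^{(1)}_j).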
43 Deep Belief Networks Layer-wise Pretraining: improve the prior on the last layer by adding another hidden layer; keep all lower layers constant while training the upper layers.
44 Deep Belief Networks Variational Bound The reason stacking layers can increase the likelihood is that stacking increases the evidence lower bound (ELBO).
45 Deep Belief Networks The above equation is called a variational bound. If the approximating distribution q(h | v) is equal to the true conditional p(h | v), then we have equality and the bound is tight. In fact, the difference between the left- and right-hand sides is the KL divergence between q(h | v) and p(h | v). The ELBO equation can be rewritten so that if we increase the term involving the prior p(h), we improve the bound. This is the basis for DBNs: we take p(h) and model it with another network, so that this term can be maximized. Thus, layer-wise pretraining improves the variational lower bound.
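A reconstruction of the bound in LaTeX notation for one hidden layer (symbols assumed; this is the standard form used to justify greedy DBN training):

    \ln p(\mathbf{v})
    = \underbrace{\sum_{\mathbf{h}} q(\mathbf{h} \mid \mathbf{v})
      \big[ \ln p(\mathbf{v} \mid \mathbf{h}) + \ln p(\mathbf{h}) \big]
      + \mathcal{H}\big(q(\mathbf{h} \mid \mathbf{v})\big)}_{\text{ELBO}}
      + \mathrm{KL}\big(q(\mathbf{h} \mid \mathbf{v}) \,\|\, p(\mathbf{h} \mid \mathbf{v})\big)

Greedy stacking keeps q(h | v) fixed (taken from the lower RBM) and replaces p(h) with a better model (the next RBM), which can increase the \sum_h q(h | v) ln p(h) term and hence the bound.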
46 Deep Belief Networks
47 Deep Belief Networks Sampling from DBNs To sample from the DBN model: sample from the top-level RBM using alternating Gibbs sampling, then sample the lower layers (down to the visible units) using the directed sigmoid belief network, i.e., by ancestral sampling.
48 Deep Belief Networks Note that if we replace the top RBM in a DBN with a Gaussian prior, i.e., if p(h2) is Gaussian, then the whole system becomes a VAE; if the RBM is replaced with i.i.d. samples, i.e., independent Bernoullis, then the system becomes a sigmoid belief network or Helmholtz machine.
49 Deep Belief Networks
50 Deep Belief Networks DBNs for Classification After layer-by-layer unsupervised pretraining, discriminative fine-tuning by backpropagation achieves an error rate of 1.2% on MNIST. SVMs get 1.4% and randomly initialized backprop gets 1.6%. Clearly, unsupervised learning helps generalization. This is because without unsupervised pretraining we supply the model with only the label (very little information per example), whereas with pretraining we provide much more information about the input in addition to the label.
51 Deep Boltzmann Machines DBMs are undirected graphical models with multiple layers of hidden variables.
52 Deep Boltzmann Machines
53 Deep Boltzmann Machines DBMs have both bottom-up and top-down inference; this allows them to model the input better than DBNs (or conventional neural nets), which have only bottom-up inference.
54 Deep Boltzmann Machines
55 Deep Boltzmann Machines The conditional distributions can be given as below. In RBMs, all the hidden units in a layer were independent of each other, which allowed us to calculate the data-dependent expectation analytically. In DBMs, the hidden units are no longer independent of each other, so we now need some technique to approximate the data-dependent expectation as well.
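For a two-hidden-layer DBM, a reconstruction of these conditionals in LaTeX notation (layer and weight symbols are assumed, biases omitted for brevity):

    p(h^{(1)}_j = 1 \mid \mathbf{v}, \mathbf{h}^{(2)})
    = \sigma\Big(\sum_i W^{(1)}_{ij} v_i + \sum_k W^{(2)}_{jk} h^{(2)}_k\Big)

    p(h^{(2)}_k = 1 \mid \mathbf{h}^{(1)})
    = \sigma\Big(\sum_j W^{(2)}_{jk} h^{(1)}_j\Big),
    \qquad
    p(v_i = 1 \mid \mathbf{h}^{(1)})
    = \sigma\Big(\sum_j W^{(1)}_{ij} h^{(1)}_j\Big)

Note that h^{(1)} receives both bottom-up input from v and top-down input from h^{(2)}.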
56 Deep Boltzmann Machines Hidden units in a layer are no longer independent of each other, hence both expectations are intractable.
57 Deep Boltzmann Machines Minimize the KL divergence between the approximating and true distributions with respect to the variational parameters.
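A reconstruction of the resulting mean-field fixed-point updates for a two-hidden-layer DBM, using a fully factorized q(h | v) with variational parameters mu (notation assumed, consistent with the conditionals above):

    q(\mathbf{h} \mid \mathbf{v}) = \prod_j q(h^{(1)}_j) \prod_k q(h^{(2)}_k),
    \qquad
    q(h^{(1)}_j = 1) = \mu^{(1)}_j,\; q(h^{(2)}_k = 1) = \mu^{(2)}_k

    \mu^{(1)}_j \leftarrow \sigma\Big(\sum_i W^{(1)}_{ij} v_i + \sum_k W^{(2)}_{jk} \mu^{(2)}_k\Big),
    \qquad
    \mu^{(2)}_k \leftarrow \sigma\Big(\sum_j W^{(2)}_{jk} \mu^{(1)}_j\Big)

These updates are iterated to convergence, and the resulting mu values are used in place of the intractable data-dependent expectations.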
58 Deep Boltzmann Machines
59 Deep Boltzmann Machines Sampling from a two-hidden-layer DBM by running a Markov chain.
60 Deep Boltzmann Machines In practice we simulate several Markov chains in parallel to generate M samples.
61 Deep Boltzmann Machines
62 References
- Fischer, Asja, and Christian Igel. "An Introduction to Restricted Boltzmann Machines." Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg.
- Russ Salakhutdinov: Buy7_UEVQkyfhHapa
- Hugo Larochelle:
63 THANK YOU!!