Unsupervised Learning


1 CS 3750 Advanced Machine Learning
Unsupervised Learning
Data: just data x, no labels
Goal: learn some underlying hidden structure of the data
p(x, z), p(z | x): Principal Component Analysis (dimensionality reduction), Autoencoders (feature learning)
p(x): Generative Models

2 Generative Models
Given training data, generate new samples from the same distribution.
CIFAR-10 dataset (Krizhevsky and Hinton, 2009): training data ~ p_data(x), generated samples ~ p_model(x).
We want to learn p_model(x) similar to p_data(x).
Why generative models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features.

3 Taxonomy of Generative Models
Generative Models
- Explicit density
  - Tractable density: fully visible belief nets (NADE, MADE, PixelRNN/CNN); change of variables models (nonlinear ICA)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine, Restricted Boltzmann Machines (RBM)
- Implicit density
  - Markov Chain: GSN
  - Direct: GAN

4 Restricted Boltzmann Machines
Many interesting theoretical results about undirected models depend on the assumption that p~(x) > 0 for all x. A convenient way to enforce this condition is to use an energy-based model
    p~(x) = exp(-E(x))
where E(x) is known as the energy function. Remember the normalized probability distribution: p(x) = (1/Z) p~(x). Any distribution of this form is an example of a Boltzmann distribution. For this reason, many energy-based models are called Boltzmann machines.
Example graph over variables a, b, c, d, e, f: the energy E(a, b, c, d, e, f) can be written as E_{a,b}(a, b) + E_{b,c}(b, c) + E_{a,d}(a, d) + E_{b,e}(b, e) + E_{e,f}(e, f), with factors such as φ_{a,b}(a, b) = exp(-E_{a,b}(a, b)).

Restricted Boltzmann Machines
Boltzmann machines were originally introduced as a general connectionist approach to learning arbitrary probability distributions over binary vectors. While Boltzmann machines were defined to encompass both models with and without latent variables, the term Boltzmann machine is today most often used to designate models with latent variables.
Joint probability distribution: p(x, h) = (1/Z) exp(-E(x, h))
Energy function (general Boltzmann machine): E(x, h) = -x^T R x - x^T W h - h^T S h - c^T x - b^T h

5 Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are undirected probabilistic graphical models containing a layer of observable variables and a single layer of latent variables. The RBM is a bipartite graph: no connections are permitted between any variables in the observed layer or between any units in the latent layer.
[Figure: graph structures of a restricted Boltzmann machine, a deep belief network, and a deep Boltzmann machine.]
Hidden layer h (binary units) with biases b_j, visible layer x (binary units) with biases c_k, and connection weights W.
p(x, h) = exp(-E(x, h)) / Z
        = exp(h^T W x + c^T x + b^T h) / Z
        = exp(h^T W x) exp(c^T x) exp(b^T h) / Z          (product of factors)
Z = sum over x, h of exp(-E(x, h))                        (partition function, intractable)
The notation based on an energy function is simply an alternative to the representation as the product of factors.
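To make this factorization concrete, here is a minimal NumPy sketch (my own, not from the lecture) that evaluates the energy E(x, h) and the unnormalized probability exp(-E(x, h)) for binary vectors; the shapes and variable names are illustrative:

```python
import numpy as np

def rbm_energy(x, h, W, b, c):
    """RBM energy E(x, h) = -h^T W x - c^T x - b^T h."""
    return -(h @ W @ x + c @ x + b @ h)

def unnormalized_joint(x, h, W, b, c):
    """p~(x, h) = exp(-E(x, h)); dividing by Z (intractable) would give p(x, h)."""
    return np.exp(-rbm_energy(x, h, W, b, c))

# Toy example: 4 visible units, 3 hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))     # hidden x visible weight matrix
b = np.zeros(3)                            # hidden biases
c = np.zeros(4)                            # visible biases
x = rng.integers(0, 2, size=4).astype(float)
h = rng.integers(0, 2, size=3).astype(float)
print(rbm_energy(x, h, W, b, c), unnormalized_joint(x, h, W, b, c))
```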

6 Restricted Boltzmann Machines
Bipartite structure: hidden variables h_1, h_2, ..., h_H; visible variables x_1, ..., x_D.
p(x, h) = (1/Z) [prod_j prod_k exp(W_{j,k} h_j x_k)] [prod_k exp(c_k x_k)] [prod_j exp(b_j h_j)]
The first product collects the pair-wise factors, the last two the unary factors. The scalar visualization is more informative of the structure within the vectors.

RBM: Inference
Restricted: no interactions between hidden variables (and none between visible variables), so the conditionals factorize:
p(h | x) = prod_j p(h_j | x)
and similarly p(x | h) = prod_k p(x_k | h).
Factorizes: easy to compute (compare Markov random fields, Boltzmann machines, log-linear models). Inferring the distribution over the hidden variables is easy.

7 RBM: Inference
Conditional distributions:
p(h | x) = prod_j p(h_j | x),  p(h_j = 1 | x) = 1 / (1 + exp(-(b_j + W_{j.} x))) = sigm(b_j + W_{j.} x),  where W_{j.} is the j-th row of W
p(x | h) = prod_k p(x_k | h),  p(x_k = 1 | h) = 1 / (1 + exp(-(c_k + h^T W_{.k}))) = sigm(c_k + h^T W_{.k}),  where W_{.k} is the k-th column of W

RBM: Free Energy
What about computing the marginal p(x)?
p(x) = sum over h in {0,1}^H of p(x, h) = sum_h exp(-E(x, h)) / Z
     = exp(c^T x + sum_{j=1}^H log(1 + exp(b_j + W_{j.} x))) / Z
     = exp(-F(x)) / Z,  where F(x) is called the free energy.
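The two conditionals above reduce to elementwise sigmoids; a small NumPy sketch under the same W (hidden x visible), b, c convention (illustrative, not the course's code):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_h_given_x(x, W, b):
    """Vector of p(h_j = 1 | x) = sigm(b_j + W_{j.} x)."""
    return sigmoid(b + W @ x)

def p_x_given_h(h, W, c):
    """Vector of p(x_k = 1 | h) = sigm(c_k + h^T W_{.k})."""
    return sigmoid(c + W.T @ h)
```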

8 RBM: Free Energy
What about computing the marginal p(x)?
p(x) = sum over h in {0,1}^H of exp(h^T W x + c^T x + b^T h) / Z
     = exp(c^T x) sum_{h_1 in {0,1}} ... sum_{h_H in {0,1}} exp(sum_j (h_j W_{j.} x + b_j h_j)) / Z
     = exp(c^T x) [sum_{h_1 in {0,1}} exp(h_1 W_{1.} x + b_1 h_1)] ... [sum_{h_H in {0,1}} exp(h_H W_{H.} x + b_H h_H)] / Z
     = exp(c^T x) (1 + exp(b_1 + W_{1.} x)) ... (1 + exp(b_H + W_{H.} x)) / Z
     = exp(c^T x) exp(log(1 + exp(b_1 + W_{1.} x))) ... exp(log(1 + exp(b_H + W_{H.} x))) / Z
     = exp(c^T x + sum_{j=1}^H log(1 + exp(b_j + W_{j.} x))) / Z
Also known as a Product of Experts model.

RBM: Free Energy
p(x) = exp(c^T x + sum_{j=1}^H log(1 + exp(b_j + W_{j.} x))) / Z
     = exp(c^T x + sum_{j=1}^H softplus(b_j + W_{j.} x)) / Z
Interpretation: c^T x is the bias on the probability of each x_k, while each softplus term measures how much feature j (with bias b_j) is expected in x.
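The softplus form suggests a numerically stable way to evaluate the free energy; a sketch under the same notation (np.logaddexp(0, a) computes softplus(a) = log(1 + exp(a))):

```python
import numpy as np

def free_energy(x, W, b, c):
    """F(x) = -c^T x - sum_j softplus(b_j + W_{j.} x), so that p(x) = exp(-F(x)) / Z."""
    return -(c @ x) - np.sum(np.logaddexp(0.0, b + W @ x))
```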

9 RBM: Model Learning
Hidden variables h_1, ..., h_H; visible variables x_1, ..., x_D.
Given a set of i.i.d. training examples x^(1), ..., x^(T) we want to minimize the average negative log-likelihood:
(1/T) sum_t l(f(x^(t))) = (1/T) sum_t -log p(x^(t))
Derivative of the negative log-likelihood objective (for stochastic gradient descent):
d(-log p(x^(t)))/dθ = E_h[ dE(x^(t), h)/dθ | x^(t) ] - E_{x,h}[ dE(x, h)/dθ ]
Remember:
log p(x^(t)) = log sum_h p(x^(t), h) = log sum_h exp(-E(x^(t), h))/Z = log sum_h exp(-E(x^(t), h)) - log Z
The first expectation is the positive phase; the second (over the model distribution) is the negative phase, which is hard to compute.
(Restricted Boltzmann machines Tutorial - Chris Maddison)

RBM: Contrastive Divergence
Key idea behind Contrastive Divergence: replace the expectation in the negative phase by a point estimate at x~, obtained by Gibbs sampling. Start the sampling chain at x^(t) and alternate
h ~ p(h | x), x ~ p(x | h)
for k steps; the final x~ is the negative sample.

10 RBM: Contrastive Divergence
Intuition:
d(-log p(x^(t)))/dθ = E_h[ dE(x^(t), h)/dθ | x^(t) ] - E_{x,h}[ dE(x, h)/dθ ]
                    ≈ E_h[ dE(x^(t), h)/dθ | x^(t) ] - E_h[ dE(x~, h)/dθ | x~ ]
The positive phase lowers the energy of the training example x^(t); the approximate negative phase raises the energy of the negative sample x~ produced by the Gibbs chain, instead of averaging over the full model distribution.

11 RBM: Deriving Learning Rule
Let us look at the derivative of E(x, h) for θ = W_{jk}. Remember E(x, h) = -h^T W x - c^T x - b^T h, so
dE(x, h)/dW_{jk} = d/dW_{jk} ( -sum_{j,k} W_{j,k} h_j x_k - sum_k c_k x_k - sum_j b_j h_j )
                 = -d/dW_{jk} sum_{j,k} W_{j,k} h_j x_k
                 = -h_j x_k
Hence: grad_W E(x, h) = -h x^T

RBM: Deriving Learning Rule
Let us now derive E_h[ dE(x, h)/dW_{jk} | x ]:
E_h[ dE(x, h)/dW_{jk} | x ] = E_h[ -h_j x_k | x ] = -sum_{h_j in {0,1}} h_j x_k p(h_j | x) = -x_k p(h_j = 1 | x)
Hence: E_h[ grad_W E(x, h) | x ] = -h(x) x^T,  where h(x) = [p(h_1 = 1 | x), ..., p(h_H = 1 | x)]^T = sigm(b + W x)

12 RBM: Deriving Learning Rule
For θ = W, the stochastic gradient step on example x^(t) is
W <- W - α grad_W ( -log p(x^(t)) )
  = W - α ( E_h[ grad_W E(x^(t), h) | x^(t) ] - E_{x,h}[ grad_W E(x, h) ] )
  ≈ W - α ( E_h[ grad_W E(x^(t), h) | x^(t) ] - E_h[ grad_W E(x~, h) | x~ ] )
  = W + α ( h(x^(t)) (x^(t))^T - h(x~) (x~)^T )
where α is the learning rate.

RBM: CD-k Algorithm
For each training example x^(t):
- Generate a negative sample x~ using k steps of Gibbs sampling, starting at the data point x^(t).
- Update the model parameters (see the sketch after this list):
  W <- W + α ( h(x^(t)) (x^(t))^T - h(x~) (x~)^T )
  b <- b + α ( h(x^(t)) - h(x~) )
  c <- c + α ( x^(t) - x~ )
Go back to the first step until a stopping criterion is met.
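A minimal CD-k update for a single binary training example, assuming the W (hidden x visible), b, c convention used above; this is a paraphrase of the algorithm, not the lecture's reference implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd_k_update(x_t, W, b, c, lr=0.1, k=1, rng=None):
    """One CD-k parameter update on a single binary example x_t; returns new (W, b, c)."""
    rng = rng or np.random.default_rng()
    # Positive phase statistics: h(x_t) = p(h = 1 | x_t)
    h_data = sigmoid(b + W @ x_t)
    # k steps of Gibbs sampling starting at the data point -> negative sample x_tilde
    x_tilde = x_t
    for _ in range(k):
        h_sample = (rng.random(b.shape) < sigmoid(b + W @ x_tilde)).astype(float)
        x_tilde = (rng.random(c.shape) < sigmoid(c + W.T @ h_sample)).astype(float)
    h_model = sigmoid(b + W @ x_tilde)
    # Updates: positive statistics minus negative statistics
    W = W + lr * (np.outer(h_data, x_t) - np.outer(h_model, x_tilde))
    b = b + lr * (h_data - h_model)
    c = c + lr * (x_t - x_tilde)
    return W, b, c
```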

13 RBM: CD-k Algorithm
CD-k: contrastive divergence with k iterations of Gibbs sampling. In general, the bigger k is, the less biased the estimate of the gradient will be. In practice, k = 1 works pretty well for learning good features and for pre-training.

RBM: Persistent CD (Stochastic ML Estimator)
Idea: instead of initializing the Gibbs chain at x^(t), initialize the chain at the negative sample of the last iteration. The chain h ~ p(h | x), x ~ p(x | h) is run for k steps as before, but its starting point comes from the previous iteration; the final state x~ is the new negative sample.

14 Variational Autoencoders (VAE)
Autoencoders (Recap)
Unsupervised approach for learning a lower-dimensional feature representation z from unlabeled training data x: an encoder maps the input data x to features z, and a decoder maps z back to reconstructed input data.
Originally: linear layers + nonlinearity (sigmoid). Later: deep, fully-connected networks. Later: ReLU CNN encoders with upconvolutional decoders.

15 Autoencoders (Recap)
Train the model so that the features z can be used to reconstruct the original data. L2 loss function: || x - x̂ ||^2. Doesn't use labels!

Autoencoders (Recap)
The encoder can also be used to initialize a supervised model: attach a classifier on top of the features z, predict a label ŷ (e.g. bird, plane, panther, truck, dog), train with a classification loss (softmax, etc.), and fine-tune the encoder jointly with the classifier.

16 Autoencoders (Recap)
Autoencoders can reconstruct data and can learn features to initialize a supervised model. The features capture factors of variation in the training data. Can we generate new data from an autoencoder?

Variational Autoencoders
A probabilistic spin on autoencoders: it will let us sample from the model to generate data!
Assume the training data {x^(i)}_{i=1}^N is generated from an underlying unobserved (latent) representation z: sample z from the true prior p_θ(z), then sample x from the true conditional p_θ(x | z^(i)).
Intuition: x is an image, z is the latent factors used to generate x: attributes, orientation, etc.

17 We want to estimate the true parameters θ of this generative model.
[Diagram: sample z from the true prior p_θ(z), then sample x^(i) from the true conditional p_θ(x | z^(i)); the conditional is the decoder network.]
How should we represent this model? Choose the prior p(z) to be simple, e.g. Gaussian. The conditional p(x | z) is complex (it generates an image) => represent it with a neural network.
How do we train this model? Learn the model parameters to maximize the likelihood of the training data:
p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz

18 Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz
Intractable to compute p(x | z) for every z! The posterior density is also intractable:
p_θ(z | x) = p_θ(x | z) p_θ(z) / p_θ(x)
Solution: in addition to the decoder network modeling p_θ(x | z), define an additional encoder network q_φ(z | x) that approximates p_θ(z | x). We will see that this allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
Since we are modeling probabilistic generation of data, the encoder and decoder networks are probabilistic:
- Encoder network q_φ(z | x) (parameters φ) outputs the mean μ_{z|x} and (diagonal) covariance Σ_{z|x} of z given x.
- Decoder network p_θ(x | z) (parameters θ) outputs the mean μ_{x|z} and (diagonal) covariance Σ_{x|z} of x given z.

19 Since we are modeling probabilistic generation of data, the encoder and decoder networks are probabilistic: sample z | x ~ N(μ_{z|x}, Σ_{z|x}) from the encoder, and x | z ~ N(μ_{x|z}, Σ_{x|z}) from the decoder.
Now equipped with our encoder and decoder networks, let's work out the (log) data likelihood:
log p_θ(x^(i)) = E_{z ~ q_φ(z | x^(i))}[ log p_θ(x^(i)) ]                                                    (p_θ(x^(i)) does not depend on z)
              = E_z[ log ( p_θ(x^(i) | z) p_θ(z) / p_θ(z | x^(i)) ) ]                                         (Bayes' rule)
              = E_z[ log ( p_θ(x^(i) | z) p_θ(z) q_φ(z | x^(i)) / ( p_θ(z | x^(i)) q_φ(z | x^(i)) ) ) ]       (multiply by a constant)
              = E_z[ log p_θ(x^(i) | z) ] - E_z[ log ( q_φ(z | x^(i)) / p_θ(z) ) ] + E_z[ log ( q_φ(z | x^(i)) / p_θ(z | x^(i)) ) ]   (logarithms)
              = E_z[ log p_θ(x^(i) | z) ] - D_KL( q_φ(z | x^(i)) || p_θ(z) ) + D_KL( q_φ(z | x^(i)) || p_θ(z | x^(i)) )
The decoder network gives p_θ(x | z), so we can estimate the first term through sampling. The first KL term (between Gaussians for the encoder and the prior) has a nice closed-form solution. p_θ(z | x) is intractable (as we saw earlier), so we cannot compute the last KL term, but we know a KL divergence is always >= 0.

20 The first two terms form a tractable lower bound which we can take the gradient of and optimize (p_θ(x | z) is differentiable, and the closed-form KL term is differentiable):
L(x^(i), θ, φ) = E_z[ log p_θ(x^(i) | z) ] - D_KL( q_φ(z | x^(i)) || p_θ(z) )
Since the remaining KL term is >= 0,
log p_θ(x^(i)) >= L(x^(i), θ, φ)
L(x^(i), θ, φ) is the variational lower bound ("ELBO").
Training: maximize the lower bound over the dataset:
θ*, φ* = arg max_{θ,φ} sum_{i=1}^N L(x^(i), θ, φ)
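For a diagonal Gaussian encoder q_φ(z | x) = N(μ, diag(σ^2)) and a standard normal prior p(z) = N(0, I), the first KL term has the standard closed form sketched below (mu and log_var are assumed to be the encoder outputs; this is my illustration, not the lecture's code):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) )
    = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
```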

21 Putting it all together: maximizing the likelihood lower bound
L(x^(i), θ, φ) = E_z[ log p_θ(x^(i) | z) ] - D_KL( q_φ(z | x^(i)) || p_θ(z) )
Forward pass on input data x: the encoder network q_φ(z | x) maps x to μ_{z|x} and Σ_{z|x}, and we sample z ~ N(μ_{z|x}, Σ_{z|x}).

22 Putting it all together: maximizing the likelihood lower bound
The KL term D_KL( q_φ(z | x^(i)) || p_θ(z) ) makes the approximate posterior distribution close to the prior; it is computed in closed form from the encoder outputs μ_{z|x} and Σ_{z|x}, alongside the sampling of z ~ N(μ_{z|x}, Σ_{z|x}).

23 Putting it all together: maximizing the likelihood lower bound
The sampled z is passed to the decoder network p_θ(x | z), which outputs μ_{x|z} and Σ_{x|z}. The first term of L, E_z[ log p_θ(x^(i) | z) ], maximizes the likelihood of the original input being reconstructed; at training time we can also sample a reconstruction x | z ~ N(μ_{x|z}, Σ_{x|z}).
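One training-time forward pass can be sketched as follows; encoder and decoder_log_likelihood are placeholders standing in for the two networks, and the reparameterization z = μ + σ * ε with ε ~ N(0, I) is the standard device from Kingma and Welling (2014) for backpropagating through the sampling step, which the slides leave implicit:

```python
import numpy as np

def negative_elbo(x, encoder, decoder_log_likelihood, rng=None):
    """One-sample Monte-Carlo estimate of -L(x, theta, phi).

    encoder(x) -> (mu, log_var) of q_phi(z | x)            [placeholder]
    decoder_log_likelihood(x, z) -> log p_theta(x | z)     [placeholder]
    """
    rng = rng or np.random.default_rng()
    mu, log_var = encoder(x)
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so the
    # sample can be treated as a deterministic function of the encoder outputs.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    reconstruction = decoder_log_likelihood(x, z)          # E_z[log p_theta(x | z)], 1 sample
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return -(reconstruction - kl)                          # minimize this to maximize L
```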

24 Generating data: use the decoder network, and now sample z from the prior!
Sample z ~ N(0, I), pass it through the decoder network p_θ(x | z) to get μ_{x|z} and Σ_{x|z}, and sample x | z ~ N(μ_{x|z}, Σ_{x|z}).
The diagonal prior on z gives independent latent variables. Different dimensions of z encode interpretable factors of variation: in the face example, varying z_1 changes the degree of smile and varying z_2 changes the head pose.
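Generation then needs only the prior and the decoder; a sketch with a placeholder decoder_mean function standing in for the trained p_θ(x | z):

```python
import numpy as np

def generate_samples(decoder_mean, latent_dim, n_samples=16, rng=None):
    """Sample z ~ N(0, I) and decode; decoder_mean(z) -> mu_{x|z} is a placeholder."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal((n_samples, latent_dim))
    return decoder_mean(z)   # typically the mean image is shown rather than a noisy sample
```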

25 Diagonal prior on z => independent latent variables. Different dimensions of z encode interpretable factors of variation (e.g. degree of smile when varying z_1, head pose when varying z_2). z is also a good feature representation that can be computed using q_φ(z | x)!
Generating data: 32x32 CIFAR-10 samples and Labeled Faces in the Wild samples. Figures (left) from Kingma et al. 2016; (right) from Larsen et al.

26 Variational autoencoders put a probabilistic spin on traditional autoencoders => allow generating data.
Pros:
- Principled approach to generative models.
- Allows inference of q(z | x), which can be a useful feature representation for other tasks.
Cons:
- Maximizes a lower bound of the likelihood rather than the exact likelihood.
- Samples are blurrier and of lower quality compared to the state of the art (GANs).
Active areas of research:
- More flexible approximations, e.g. a richer approximate posterior instead of a diagonal Gaussian.
- Incorporating structure in the latent variables.

Tools
Python library for Bernoulli Restricted Boltzmann Machines: sklearn.neural_network.BernoulliRBM
Python Keras for variational auto-encoders.
Generative models (including RBM and VAE):
Variational Auto-encoder:
Tutorials:
Codes:
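As an example, a sketch of fitting the scikit-learn estimator named above on toy binary data (hyperparameter values are illustrative; scikit-learn trains BernoulliRBM with persistent contrastive divergence, i.e. stochastic maximum likelihood):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary data: 500 examples with 64 binary features (stand-in for real data)
X = (np.random.default_rng(0).random((500, 64)) > 0.5).astype(float)

rbm = BernoulliRBM(n_components=32, learning_rate=0.05, batch_size=32,
                   n_iter=20, random_state=0)
rbm.fit(X)               # trained with persistent contrastive divergence (SML)
H = rbm.transform(X)     # hidden-unit activation probabilities p(h_j = 1 | x)
print(H.shape)           # (500, 32)
```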

27 References and Resources
- Goodfellow, Ian, et al. Deep Learning. Cambridge, MA: MIT Press, 2016.
- Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. ICLR 2014.
- Chris Maddison. Restricted Boltzmann Machines Tutorial.
- Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation (2002).
- Generative Models lecture (Stanford University).
- Restricted Boltzmann Machines (CMU).
- Hugo Larochelle's class on Neural Networks.
- Image Generation (ICLR 2018).

THANK YOU!!!
Hung Kim Chau
University of Pittsburgh, School of Computing and Information
Department of Informatics and Networked Systems
135 N Bellefield Ave, Pittsburgh, PA
Phone: +1 (412)
