Supplementary Material of High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

Chunyuan Li¹, Changyou Chen¹, Kai Fan² and Lawrence Carin¹
¹Department of Electrical and Computer Engineering, Duke University
²Computational Biology and Bioinformatics, Duke University
chunyuan.li@duke.edu, cchangyou@gmail.com, kai.fan@duke.edu, lcarin@duke.edu

Copyright © 2016, Association for the Advancement of Artificial Intelligence. All rights reserved.

A The proof of Lemma 1

Proof. In mSGNHT, X_t = (θ_t, p_t, ξ_t). The update equations with a Euler integrator are:

θ_{t+1} = θ_t + p_t h,
p_{t+1} = p_t − ∇_θ Ũ_t(θ_{t+1}) h − diag(ξ_t) p_t h + √(2Dh) ζ_{t+1},
ξ_{t+1} = ξ_t + (p_{t+1} ⊙ p_{t+1} − 1) h.

Based on these update equations, the corresponding Kolmogorov operator P_t^h for mSGNHT is

P_t^h = e^{h L_1} e^{h L_2} e^{h L_3},    (1)

where L_1 ≜ (p ⊙ p − 1)^⊤ ∇_ξ, L_2 ≜ −(diag(ξ) p_t)^⊤ ∇_p − ∇_θ Ũ_t(θ)^⊤ ∇_p + 2D I : ∇_p ∇_p^⊤, and L_3 ≜ p^⊤ ∇_θ. Using the BCH formula, we have

P_t^h = e^{h (L_1 + L_2 + L_3)} + O(h²).    (2)

On the other hand, the local generator of mSGNHT at the t-th iteration can be seen to be

L̃_t = L_1 + L̃_2 + L_3,    (3)

where L_1 and L_3 are defined as above, and L̃_2 ≜ −(diag(ξ) p)^⊤ ∇_p − ∇_θ Ũ_t(θ)^⊤ ∇_p + 2D I : ∇_p ∇_p^⊤. According to Kolmogorov's backward equation, we have

E[f(X_{t+1})] = e^{h L̃_t} f(X_t).    (4)

Substituting (4) into (2) and using the fact that p = p_t + O(h) by a Taylor expansion, it is easily seen that P_t^h = e^{h L̃_t + O(h²)} + O(h²) = e^{h L̃_t} + O(h²). This completes the proof.

B The proof of Lemma 2

Proof. In the symmetric splitting scheme for mSGNHT, according to the splitting in (4) in the main text, the generator L̃_l is split into the following sub-generators, each of which can be solved analytically: L̃_l = L_A + L_B + L_{O_l}, where

A: L_A = p^⊤ ∇_θ + (p ⊙ p − 1)^⊤ ∇_ξ,
B: L_B = −(diag(ξ) p)^⊤ ∇_p,
O_l: L_{O_l} = −∇_θ Ũ_l(θ)^⊤ ∇_p + 2D I : ∇_p ∇_p^⊤.

The corresponding Kolmogorov operator P_l^h for the splitting integrator is

P_l^h ≜ e^{(h/2) L_A} e^{(h/2) L_B} e^{h L_{O_l}} e^{(h/2) L_B} e^{(h/2) L_A}.

In the following we use the Baker–Campbell–Hausdorff (BCH) formula (Rossmann 2002) to show that P_l^h is a 2nd-order integrator. Write A ≜ L_A, B ≜ L_B and O_l ≜ L_{O_l}. For the rightmost pair of factors,

e^{(h/2) B} e^{(h/2) A} = e^{(h/2) B + (h/2) A + (h²/8)[B,A] + (h³/96)([B,[B,A]] + [A,[A,B]]) + ···}    (5)
= e^{(h/2) B + (h/2) A + (h²/8)[B,A]} + O(h³),    (6)

where [X, Y] ≜ XY − YX is the commutator of X and Y, (5) follows from the BCH formula, and (6) follows by moving the high-order O(h³) terms out of the exponential map using a Taylor expansion. Composing with the next factor and applying the BCH formula again,

e^{h O_l} e^{(h/2) B} e^{(h/2) A}
= e^{h O_l} ( e^{(h/2) B + (h/2) A + (h²/8)[B,A]} + O(h³) )
= e^{h O_l + (h/2) B + (h/2) A + (h²/8)[B,A] + (1/2)[h O_l, (h/2) B + (h/2) A]} + O(h³)
= e^{h O_l + (h/2) B + (h/2) A + (h²/8)[B,A] + (h²/4)[O_l,B] + (h²/4)[O_l,A]} + O(h³),

and then

e^{(h/2) B} e^{h O_l} e^{(h/2) B} e^{(h/2) A}
= e^{(h/2) B} ( e^{h O_l + (h/2) B + (h/2) A + (h²/8)[B,A] + (h²/4)[O_l,B] + (h²/4)[O_l,A]} + O(h³) )
= e^{h O_l + h B + (h/2) A + (h²/4)[B,A] + (h²/4)[O_l,A]} + O(h³),

where the terms (h²/4)[B,O_l] and (h²/4)[O_l,B] cancel. As a result,

P_l^h = e^{(h/2) A} e^{(h/2) B} e^{h O_l} e^{(h/2) B} e^{(h/2) A}
= e^{(h/2) A} ( e^{h O_l + h B + (h/2) A + (h²/4)[B,A] + (h²/4)[O_l,A]} + O(h³) )
= e^{h O_l + h B + h A + (h²/4)([B,A] + [A,B]) + (h²/4)([O_l,A] + [A,O_l])} + O(h³)
= e^{h (L_A + L_B + L_{O_l})} + O(h³)
= e^{h (L + ΔV_l)} + O(h³) = e^{h L̃_l} + O(h³).

This completes the proof.
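For concreteness, the following is a minimal NumPy sketch of a single mSGNHT parameter update under the Euler integrator of Section A and under the symmetric A-B-O-B-A splitting analyzed in Section B. It is written directly from the equations above rather than taken from the authors' implementation; grad_u_tilde stands for the mini-batch stochastic gradient ∇_θ Ũ_t(θ), rng is a NumPy random generator, and all names are illustrative.

import numpy as np

def msgnht_euler_step(theta, p, xi, grad_u_tilde, h, D, rng):
    # One Euler update of (theta, p, xi), following the equations in Section A.
    theta = theta + p * h
    p = (p - grad_u_tilde(theta) * h - xi * p * h
         + np.sqrt(2.0 * D * h) * rng.standard_normal(p.shape))
    xi = xi + (p * p - 1.0) * h                    # element-wise thermostat update
    return theta, p, xi

def msgnht_splitting_step(theta, p, xi, grad_u_tilde, h, D, rng):
    # One symmetric splitting (A-B-O-B-A) update, following Section B.
    theta = theta + p * (h / 2.0)                  # A: half-step for theta and xi
    xi = xi + (p * p - 1.0) * (h / 2.0)
    p = p * np.exp(-xi * (h / 2.0))                # B: friction sub-step, solved exactly
    p = (p - grad_u_tilde(theta) * h               # O: stochastic gradient + injected noise
         + np.sqrt(2.0 * D * h) * rng.standard_normal(p.shape))
    p = p * np.exp(-xi * (h / 2.0))                # B again
    theta = theta + p * (h / 2.0)                  # A again
    xi = xi + (p * p - 1.0) * (h / 2.0)
    return theta, p, xi

The B sub-step is integrated exactly (p ← e^{−ξ h/2} ⊙ p), which is consistent with the robustness to larger stepsizes observed for the splitting integrator in the experiments below.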

C More details on Lemma 3

Our justification of the symmetric splitting integrator is based on Lemma 3, which is a simplification of the main theorems in (Chen, Ding, and Carin 2015). For completeness, we give details of their main theorems in this section. To recap notation, for an Itô diffusion with an invariant measure ρ(x), the posterior average is defined as φ̄ ≜ ∫_X φ(x) ρ(x) dx for some test function φ(x) of interest. Given samples (x_t)_{t=1}^T from an SG-MCMC algorithm, we use the sample average φ̂ ≜ (1/T) Σ_{t=1}^T φ(x_t) to approximate φ̄. In addition, we define an operator for the t-th iteration as

ΔV_t ≜ (∇_θ Ũ_t − ∇_θ U)^⊤ ∇_p.    (7)

Theorem 1 and Theorem 2 summarize the convergence of an SG-MCMC algorithm with a Kth-order integrator with respect to the Bias and MSE, under certain assumptions. Please refer to (Chen, Ding, and Carin 2015) for detailed proofs.

Theorem 1. Let ‖·‖ denote the operator norm. The bias of an SG-MCMC with a Kth-order integrator, run for T iterations with stepsize h, can be bounded as

|E φ̂ − φ̄| = O( (1/T) Σ_t ‖E ΔV_t‖ + 1/(Th) + h^K ).

Theorem 2. For a smooth test function φ, the MSE of an SG-MCMC with a Kth-order integrator, run for T iterations with stepsize h, is bounded, for some C > 0 independent of (T, h), as

E(φ̂ − φ̄)² ≤ C( (1/T²) Σ_t E‖ΔV_t‖² + 1/(Th) + h^{2K} ).

We simplify Theorem 1 and Theorem 2 into Lemma 3, where the functions B_bias ≜ (1/T) Σ_t ‖E ΔV_t‖ + 1/(Th) and B_mse ≜ C( (1/T²) Σ_t E‖ΔV_t‖² + 1/(Th) ) collect the terms that are independent of the order of the integrator. In addition, from Theorem 1 and Theorem 2 we can obtain the optimal convergence rate of an SG-MCMC algorithm with respect to the Bias and MSE by optimizing the bounds over the stepsize h (balancing the 1/(Th) term against h^K for the Bias and against h^{2K} for the MSE). Specifically, the optimal convergence rate of the Bias for the Euler integrator is T^{−1/2} with optimal stepsize h ∝ T^{−1/2}, while it is T^{−2/3} for the symmetric splitting integrator with optimal stepsize h ∝ T^{−1/3}. For the MSE, the rate for the Euler integrator is T^{−2/3} with optimal stepsize h ∝ T^{−1/3}, compared to T^{−4/5} with optimal stepsize h ∝ T^{−1/5} for the symmetric splitting integrator.

D Latent Dirichlet Allocation

Following (Ding et al. 2014), this section describes the semi-collapsed posterior of the LDA model and the Expanded-Natural representation of the probability simplexes used in (Patterson and Teh 2013). Let W = {w_jv} be the observed words and Z = {z_jv} be the topic indicator variables, where j indexes the documents and v indexes the words. Let π_k = (π_kw) be the topic-word distribution, n_jkw be the number of occurrences of word w in document j allocated to topic k, and let · denote a marginal sum, i.e., n_jk· = Σ_w n_jkw. The semi-collapsed posterior of the LDA model is

p(W, Z, π | α, τ) = p(π | τ) Π_{j=1}^J p(w_j, z_j | α, π),    (8)

where J is the number of documents, α is the parameter of the Dirichlet prior on the topic distribution for each document, τ is the parameter of the prior on π, and

p(w_j, z_j | α, π) = Π_{k=1}^K [ Γ(α + n_jk·) / Γ(α) ] Π_{w=1}^W π_kw^{n_jkw}.    (9)

The Expanded-Natural representation of the simplexes π_k in (Patterson and Teh 2013) is used, where

π_kw = e^{θ_kw} / Σ_{w'} e^{θ_kw'}.    (10)

Following (Ding et al. 2014), a Gaussian prior on θ_kw is adopted, p(θ_kw | τ = {β, σ}) = N(θ_kw; β, σ²). The stochastic gradient of the log-posterior with respect to θ_kw on a mini-batch S_t at the t-th iteration becomes

−∂Ũ(θ)/∂θ_kw = ∂ log p̃(θ | W, τ, α)/∂θ_kw = (β − θ_kw)/σ² + (J/|S_t|) Σ_{j∈S_t} E_{z_j | w_j, θ, α}[ n_jkw − π_kw n_jk· ],    (11)

where S_t ⊂ {1, 2, ..., J} and |·| denotes the cardinality of a set. When σ = 1, we obtain the same stochastic gradient as in (Patterson and Teh 2013) using the Riemannian manifold. To calculate the expectation term, we use the same Gibbs sampling method as in (Patterson and Teh 2013),

p(z_jv = k | w_j, θ, α) = (α + n_jk·^{\v}) e^{θ_{k w_jv}} / Σ_{k'} (α + n_jk'·^{\v}) e^{θ_{k' w_jv}},    (12)

where \v denotes the count excluding the v-th topic assignment variable. The expectation is then estimated from the resulting samples.
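As an illustration of how (10) and (11) could be evaluated in practice, here is a minimal NumPy sketch, assuming the expected counts E[n_jkw] and E[n_jk·] for the mini-batch documents have already been estimated from the Gibbs samples of (12); the array names and shapes are illustrative and not taken from the authors' code.

import numpy as np

def topic_word_probs(theta):
    # Equation (10): pi_kw is the softmax of theta_k over the vocabulary. theta: (K, W).
    theta_stable = theta - theta.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(theta_stable)
    return e / e.sum(axis=1, keepdims=True)

def grad_u_lda(theta, exp_n_jkw, exp_n_jk, J, beta=0.0, sigma2=1.0):
    # Negative of the log-posterior gradient in Equation (11), i.e. the potential
    # gradient fed to the sampler. exp_n_jkw: (|S|, K, W); exp_n_jk: (|S|, K).
    batch = exp_n_jkw.shape[0]
    pi = topic_word_probs(theta)                               # (K, W)
    data_term = exp_n_jkw.sum(axis=0) - pi * exp_n_jk.sum(axis=0)[:, None]
    prior_term = (theta - beta) / sigma2
    return prior_term - (J / batch) * data_term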
Running time. For the LDA results on the ICML dataset reported in the main text, the running times needed to reach the reported performance are 6.99 and 8.55 seconds for the two mSGNHT samplers and 56.2 seconds for Gibbs sampling.

E Logistic Regression

For each datum {x, y}, x ∈ R^P is the input and y ∈ {0, 1} is the label. Logistic regression models the probability

P(y = 1 | x) ≜ g_θ(x) = 1 / (1 + exp(−(W^⊤ x + c))).    (13)

A Gaussian prior is placed on the model parameters, θ = {W, c} ~ N(0, σ² I), with σ² fixed in our experiment. We study the effectiveness of mSGNHT for different h and D. Learning curves of test errors and training log-likelihoods are shown in Fig. 1. Generally, the performance of the proposed mSGNHT-S is consistently more stable than that of mSGNHT-E and SGHMC, across varying h and D.
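For reference, the following is a minimal NumPy sketch of the mini-batch potential gradient for the model in (13), with a Gaussian prior on θ = {W, c}; the names, shapes and the prior variance are illustrative assumptions. A gradient of this kind is the ∇_θ Ũ consumed by the samplers compared here.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_u_logreg(W, c, X_batch, y_batch, N, sigma2=1.0):
    # Stochastic gradient of U(theta) = -log p(theta | data), rescaled by N/|batch|.
    batch = X_batch.shape[0]
    p = sigmoid(X_batch @ W + c)              # predicted P(y = 1 | x), shape (batch,)
    resid = p - y_batch                       # gradient of the negative log-likelihood
    grad_W = (N / batch) * (X_batch.T @ resid) + W / sigma2
    grad_c = (N / batch) * resid.sum() + c / sigma2
    return grad_W, grad_c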

Figure 1: Learning curves for logistic regression on the a9a dataset for varying h and D. Column 1 shares the same h; columns 2-4 share the same D. Top row: test error (%); bottom row: training log-likelihood.

Figure 2: Learning curves for FNN (ReLU link) on the MNIST dataset for varying stepsize h. Stepsize decreases top-down.

Figure 3: Learning curves for FNN (sigmoid link) on the MNIST dataset for varying network depth. Depth increases top-down.

Furthermore, mSGNHT-S converges faster than mSGNHT-E, especially at the beginning of learning. Comparing column 1 (fixing h, varying D), mSGNHT-S is more robust to the choice of the diffusion factor D. Comparing columns 2-4 (fixing D, varying h), larger h potentially brings larger gradient-estimation errors and numerical errors; mSGNHT-S significantly outperforms the others when h is large.

F Deep Neural Networks

F.1 Sensitivity to Step-size

For feedforward neural networks (FNN) with a 2-layer ReLU model, we study the performance of mSGNHT for different h, with D = 5. We test a wide range of h, and show in Fig. 2 the learning curves of test accuracy and training negative log-likelihood for h = 2×10⁻⁴, 1×10⁻⁴ and 5×10⁻⁵. mSGNHT-S is less sensitive to the step size; it maintains fast convergence when h is large, while mSGNHT-E slows down significantly.
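To make the sampled model concrete, here is a minimal NumPy sketch of the stochastic gradient ∇_θ Ũ(θ) for a 2-layer ReLU FNN classifier with a Gaussian prior, i.e., the quantity the samplers consume in this experiment; the layer sizes, prior variance and all names are illustrative assumptions rather than the exact experimental settings.

import numpy as np

def grad_u_fnn(params, X, Y, N, sigma2=1.0):
    # params = (W1, b1, W2, b2); X: (batch, d); Y: one-hot labels (batch, C).
    W1, b1, W2, b2 = params
    batch = X.shape[0]

    # forward pass
    Z1 = X @ W1 + b1                          # (batch, H)
    A1 = np.maximum(Z1, 0.0)                  # ReLU
    logits = A1 @ W2 + b2                     # (batch, C)
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)         # softmax probabilities

    # backward pass for the summed negative log-likelihood on the mini-batch
    dlogits = P - Y                           # (batch, C)
    dW2 = A1.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dA1 = dlogits @ W2.T
    dZ1 = dA1 * (Z1 > 0)
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    # rescale to the full dataset of size N and add the Gaussian-prior gradient
    scale = N / batch
    return (scale * dW1 + W1 / sigma2, scale * db1 + b1 / sigma2,
            scale * dW2 + W2 / sigma2, scale * db2 + b2 / sigma2)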

Figure 4: Learning curves of CNN for different step sizes.

F.2 Sigmoid Activation Function

We compare the different methods in the case of the sigmoid link. Similar to the main paper, we test FNNs with varying depths, i.e., {1, 2, 3}. We fix the diffusion factor D and set h = 10⁻³. Fig. 3 displays the learning curves of test accuracy and training negative log-likelihood. The gap between mSGNHT-S and mSGNHT-E becomes larger in deeper models. When a 3-layer network is employed, mSGNHT-E fails in the first 5 epochs, whilst mSGNHT-S converges well. Moreover, mSGNHT-S converges more accurately, yielding an accuracy of 97.36%, while mSGNHT-E gives 96.86%. This manifests the importance of numerical accuracy in SG-MCMC for practical applications, and mSGNHT-S is desirable in this regard.

F.3 Comparison with Other Methods

Based on a network of size 400-400 with ReLU, we also compare with a recent state-of-the-art inference method for neural nets, Bayes by Backprop (BBB) (Blundell et al. 2015), using h = 2×10⁻⁴ and D = 6. The comparison is given in Table 1; the BBB and SGD results are taken from (Blundell et al. 2015).

Table 1: Classification accuracy on MNIST.
  Method   Accuracy (%)
  BBB      98.8
  SGD      98.7

F.4 Convolutional Neural Networks

We also perform a comparison on a standard convolutional neural network, LeNet (Jarrett et al. 2009), on the MNIST dataset. It consists of a 2-layer convolutional network followed by a 2-layer fully-connected FNN, each layer containing 200 hidden nodes with ReLU activations. Both convolutional layers use a 5×5 filter size, with 32 and 64 channels, respectively, and 2×2 max pooling is used after each convolutional layer. 4 epochs are used, and L is set to 2. In Fig. 4, we test the stepsizes h = 10⁻⁴ and h = 5×10⁻⁴ for mSGNHT-S and mSGNHT-E, with D = 5. Again, under the same network architecture, the CNN trained with mSGNHT-S converges faster than with mSGNHT-E. In particular, when a large stepsize is used, mSGNHT-S shows a significant improvement over mSGNHT-E.

G Deep Poisson Factor Analysis

G.1 Model Specification

We first provide model details of Deep Poisson Factor Analysis (DPFA) (Gan et al. 2015). Given a discrete matrix W ∈ Z_+^{V×J} containing counts from J documents and V words, Poisson factor analysis (PFA) (Zhou et al. 2012) assumes the entries of W are summations of K latent counts, each produced by a latent factor (in the case of topic modeling, a hidden topic). For W, the generative process of DPFA with an L-layer Sigmoid Belief Network (SBN) is as follows:

p(h_{k_L}^{(L)} = 1) = g(b_{k_L}^{(L)}),    (14)
p(h_n^{(l)} = 1 | h_n^{(l+1)}) = g(W^{(l)} h_n^{(l+1)} + c^{(l)}),    (15)
W ~ Pois(Φ(Ψ ⊙ H^{(1)})),    (16)

where g(·) is the sigmoid link. Equations (14) and (15) define the Deep Sigmoid Belief Network (DSBN). Φ is the factor loading matrix; each column of Φ, φ_k ∈ Δ_V, encodes the relative importance of each word in topic k, with Δ_V representing the V-dimensional simplex. Ψ ∈ R_+^{K×J} is the factor score matrix; each column ψ_j contains the relative topic intensities specific to document j. h_j^{(1)} ∈ {0, 1}^K is a latent binary feature vector, which defines whether certain topics are associated with document j. The factor scores for document j at the bottom layer are the element-wise product ψ_j ⊙ h_j^{(1)}. DPFA is constructed by placing Dirichlet priors on φ_k and gamma priors on ψ_j. This is summarized as

x_vj = Σ_{k=1}^K x_vjk,   x_vjk ~ Pois(φ_vk ψ_kj h_kj^{(1)}),    (17)

with priors specified as φ_k ~ Dir(a_φ, ..., a_φ), ψ_kj ~ Gamma(r_k, p_k/(1 − p_k)), r_k ~ Gamma(γ_0, 1/c_0), and γ_0 ~ Gamma(e_0, 1/f_0).
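To make the generative process (14)-(17) concrete, here is a minimal NumPy sketch that draws a synthetic corpus from a DPFA with a randomly initialized DSBN; all sizes, names and hyperparameter values are illustrative assumptions, not settings from the paper.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_dpfa(V=1000, J=50, K=64, layer_sizes=(64, 32), a_phi=0.05, r=1.0, p=0.5):
    # Ancestral sampling of a word-count matrix W (V x J) from DPFA (Eqs. 14-17).
    # layer_sizes[0] must equal K, the number of topics.
    b_top = rng.normal(size=layer_sizes[-1])
    weights = [rng.normal(scale=0.1, size=(layer_sizes[l], layer_sizes[l + 1]))
               for l in range(len(layer_sizes) - 1)]
    biases = [rng.normal(scale=0.1, size=layer_sizes[l])
              for l in range(len(layer_sizes) - 1)]

    # Eq. (14): top-layer binary units, then Eq. (15): sampling downward through the DSBN
    h = rng.binomial(1, sigmoid(b_top), size=(J, layer_sizes[-1]))
    for l in reversed(range(len(layer_sizes) - 1)):
        h = rng.binomial(1, sigmoid(h @ weights[l].T + biases[l]))
    h1 = h                                              # bottom-layer h^{(1)}, shape (J, K)

    # Topic-word simplexes and factor scores
    phi = rng.dirichlet(np.full(V, a_phi), size=K).T    # (V, K), columns on the simplex
    psi = rng.gamma(r, p / (1.0 - p), size=(K, J))      # (K, J) factor scores

    # Eqs. (16)-(17): Poisson counts with rate Phi (Psi ⊙ H^{(1)})
    rate = phi @ (psi * h1.T)
    return rng.poisson(rate)                            # (V, J) count matrix W

W = sample_dpfa()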
G.2 Model Inference

Following (Gan et al. 2015), we use stochastic gradient Riemannian Langevin dynamics (SGRLD) (Patterson and Teh 2013) to sample the topic-word distributions {φ_k}. mSGNHT is used to sample the parameters of the DSBN, i.e., θ = (W^{(l)}, c^{(l)}, b^{(L)}), where l = 1, ..., L. Specifically, the stochastic gradients of W^{(l)} and c^{(l)} evaluated on a mini-batch of data (denote S as the index set of the mini-batch) are calculated as

∂Ũ/∂W^{(l)} = (J/|S|) Σ_{i∈S} E_{h_i^{(l)}, h_i^{(l+1)}} [ (σ_i^{(l)} − h_i^{(l)}) (h_i^{(l+1)})^⊤ ],    (18)
∂Ũ/∂c^{(l)} = (J/|S|) Σ_{i∈S} E_{h_i^{(l)}, h_i^{(l+1)}} [ σ_i^{(l)} − h_i^{(l)} ],    (19)

where σ_i^{(l)} = σ(W^{(l)} h_i^{(l+1)} + c^{(l)}), and the expectation is taken over the posterior of the hidden units. Monte Carlo integration is used to approximate this quantity.
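A minimal NumPy sketch of how (18)-(19) could be estimated with Monte Carlo samples of the hidden units (not the authors' implementation; shapes and names are illustrative): given M posterior samples of (h^{(l)}, h^{(l+1)}) for each document in the mini-batch, the expectations are replaced by sample averages.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsbn_stochastic_grads(W_l, c_l, h_l, h_lp1, J):
    # h_l: (M, batch, K_l) and h_lp1: (M, batch, K_{l+1}) hold M Monte Carlo samples
    # of the hidden units; W_l: (K_l, K_{l+1}); c_l: (K_l,); J = total number of documents.
    M, batch, _ = h_l.shape
    sig = sigmoid(h_lp1 @ W_l.T + c_l)          # (M, batch, K_l), sigma_i^{(l)}
    diff = sig - h_l                            # (sigma_i^{(l)} - h_i^{(l)})
    # Eq. (18): outer products averaged over the M samples, summed over the mini-batch
    grad_W = np.einsum('mbk,mbj->kj', diff, h_lp1) / M
    # Eq. (19): the same residual without the outer product
    grad_c = diff.mean(axis=0).sum(axis=0)
    return (J / batch) * grad_W, (J / batch) * grad_c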

References

Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; and Wierstra, D. 2015. Weight uncertainty in neural networks. In ICML.

Chen, C.; Ding, N.; and Carin, L. 2015. On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In NIPS.

Ding, N.; Fang, Y.; Babbush, R.; Chen, C.; Skeel, R. D.; and Neven, H. 2014. Bayesian sampling using stochastic gradient thermostats. In NIPS.

Gan, Z.; Chen, C.; Henao, R.; Carlson, D.; and Carin, L. 2015. Scalable deep Poisson factor analysis for topic modeling. In ICML.

Jarrett, K.; Kavukcuoglu, K.; Ranzato, M.; and LeCun, Y. 2009. What is the best multi-stage architecture for object recognition? In ICCV.

Patterson, S., and Teh, Y. W. 2013. Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In NIPS.

Rossmann, W. 2002. Lie Groups: An Introduction Through Linear Groups. Oxford Graduate Texts in Mathematics, Oxford Science Publications.

Zhou, M.; Hannah, L.; Dunson, D. B.; and Carin, L. 2012. Beta-negative binomial process and Poisson factor analysis. In AISTATS.
