Scalable Deep Poisson Factor Analysis for Topic Modeling: Supplementary Material


Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson, Lawrence Carin
Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA

A Conditional Densities used in BCDF

Using dot notation to represent marginal sums, e.g., $x_{\cdot nk} \triangleq \sum_p x_{pnk}$, we can write the conditional densities for DPFA as (Zhou & Carin, 2015)
\[
(x_{pn1}, \dots, x_{pnK}) \sim \mathrm{Mult}\left(x_{pn\cdot};\ \zeta_{pn1}, \dots, \zeta_{pnK}\right), \quad (1)
\]
\[
\phi_k \sim \mathrm{Dir}\left(a_\phi + x_{1\cdot k}, \dots, a_\phi + x_{P\cdot k}\right), \qquad
\theta_{kn} \sim \mathrm{Gamma}\left(r_k h^{(1)}_{kn} + x_{\cdot nk},\ p_n\right),
\]
\[
r_k \sim \mathrm{Gamma}\left(\gamma_0 + \sum_{n=1}^N l_{kn},\ \frac{1}{c_0 - \sum_{n=1}^N h^{(1)}_{kn}\ln(1-p_n)}\right), \quad (2)
\]
\[
\gamma_0 \sim \mathrm{Gamma}\left(e_0 + \sum_{k=1}^K l_k,\ \frac{1}{f_0 - \sum_{k=1}^K \ln(1-p'_k)}\right), \quad (3)
\]
\[
h^{(1)}_{kn} \sim \delta(x_{\cdot nk}=0)\,\mathrm{Ber}\!\left(\frac{\pi_{kn}(1-p_n)^{r_k}}{\pi_{kn}(1-p_n)^{r_k} + 1 - \pi_{kn}}\right) + \delta(x_{\cdot nk}>0),
\]
where
\[
l_{kn} \sim \mathrm{CRT}\left(x_{\cdot nk},\ r_k h^{(1)}_{kn}\right), \qquad
l_k \sim \mathrm{CRT}\left(\sum_{n=1}^N l_{kn},\ \gamma_0\right),
\]
\[
\zeta_{pnk} = \frac{\phi_{pk}\theta_{kn}}{\sum_{k=1}^K \phi_{pk}\theta_{kn}}, \qquad
p'_k = \frac{-\sum_{n=1}^N h^{(1)}_{kn}\ln(1-p_n)}{c_0 - \sum_{n=1}^N h^{(1)}_{kn}\ln(1-p_n)}, \qquad
\pi_{kn} = \sigma\!\left(w^{(1)\top}_k h^{(2)}_n + c^{(1)}_k\right),
\]
and CRT denotes the Chinese Restaurant Table distribution. A CRT random variable $l \sim \mathrm{CRT}(m, r)$ can be generated as a sum of independent Bernoulli random variables (Zhou & Carin, 2015):
\[
l = \sum_{n=1}^{m} b_n, \qquad b_n \sim \mathrm{Ber}\!\left(\frac{r}{n-1+r}\right).
\]
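As an illustration, a minimal Python sketch of this CRT draw (our own illustration, not part of the released code; the function name and the use of NumPy are our choices):

    import numpy as np

    def sample_crt(m, r, rng=np.random.default_rng()):
        # Draw l ~ CRT(m, r) as a sum of independent Bernoulli variables,
        # where the n-th Bernoulli has success probability r / (n - 1 + r).
        n = np.arange(1, m + 1)
        return int(rng.binomial(1, r / (n - 1 + r)).sum())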

B Proof of Theorem 1

Consider a general stochastic differential equation of the form
\[
\mathrm{d}\Gamma = Q(\Gamma)\,\mathrm{d}t + D(\Gamma)\,\mathrm{d}W, \quad (4)
\]
where $\Gamma \in \mathbb{R}^N$, $Q: \mathbb{R}^N \to \mathbb{R}^N$ and $D: \mathbb{R}^N \to \mathbb{R}^{N\times P}$ are measurable functions, with $P$ not necessarily equal to $N$, and $W$ is standard $P$-dimensional Brownian motion. Taking $\Gamma = (\theta, v, \xi)$, $Q(\Gamma) = \left(v,\ \tilde f - \xi v,\ \tfrac{1}{M}v^\top v - 1\right)$, and $D(\Gamma)$ to be constant recovers the setting of the original stochastic gradient Nosé-Hoover thermostat algorithm (Ding et al., 2014) in the notation of this paper. Furthermore, write the joint distribution of $\Gamma$ as
\[
p(\Gamma) = \frac{1}{Z}\exp\{-H(\Gamma)\},
\]
where $H(\Gamma)$ is usually called the Hamiltonian function of the system. In the following we decompose $\Gamma$ as $\Gamma = (\theta, x)$ and $H(\Gamma)$ as $H(\Gamma) = U(\theta) + E(\theta, x)$, where $U(\theta)$ may be the negative log-posterior of a Bayesian model. To prove Theorem 1 in the main text, we make use of Lemma 1 below, which is essentially the main theorem of Ding et al. (2014), itself a consequence of the celebrated Fokker-Planck equation (Risken, 1989).

Lemma 1. The stochastic process of $\theta$ generated by the stochastic differential equation (4) has the target distribution $p_\theta(\theta) = \frac{1}{Z}\exp\{-U(\theta)\}$ as its stationary distribution, if $p(\Gamma)$ satisfies the following marginalization condition:
\[
\exp\{-U(\theta)\} \propto \int \exp\{-U(\theta) - E(\theta, x)\}\,\mathrm{d}x, \quad (5)
\]
and if the following condition is also satisfied:
\[
\nabla \cdot (p\,Q) = \nabla\nabla^\top : \left(\tfrac{1}{2}\,p\,DD^\top\right), \quad (6)
\]
where $\nabla \triangleq (\partial/\partial\theta,\ \partial/\partial x)$, "$\cdot$" represents the vector inner product and "$:$" represents the matrix double dot product, i.e., $X : Y \triangleq \mathrm{tr}(X^\top Y)$.

Proof of Theorem 1. For conciseness, we replace the model parameters $\Psi^g$ of the main text with the notation $\theta$ in the following. We first reformulate our proposed SGNHT as a special case of the general SDE in (4), i.e.,
\[
\Gamma = (\theta, v, \xi_1, \dots, \xi_M), \qquad
Q(\Gamma) = \left(v,\ \tilde f - \Xi v,\ v_1^2 - 1, \dots, v_M^2 - 1\right), \qquad
D(\Gamma) = \mathrm{diag}\big(\underbrace{0, \dots, 0}_{M},\ \underbrace{\sqrt{2D}, \dots, \sqrt{2D}}_{M},\ \underbrace{0, \dots, 0}_{M}\big),
\]
so that $\tfrac{1}{2}D(\Gamma)D(\Gamma)^\top = \mathrm{diag}(0, \dots, 0,\ D, \dots, D,\ 0, \dots, 0)$, where $\Xi = \mathrm{diag}(\xi_1, \dots, \xi_M)$. From Theorem 1 in the main text we know
\[
p(\Gamma) = \frac{1}{Z}\exp\left\{-\tfrac{1}{2}v^\top v - U(\theta) - \tfrac{1}{2}\mathrm{tr}\left[(\Xi - D)^\top(\Xi - D)\right]\right\}, \quad (7)
\]
with $H(\Gamma) = \tfrac{1}{2}v^\top v + U(\theta) + \tfrac{1}{2}\mathrm{tr}\left[(\Xi - D)^\top(\Xi - D)\right]$. The marginalization condition (5) is trivially satisfied, so we are left to verify condition (6). Substituting $p$ and $Q$ into (6), and writing $\tilde f = -\nabla_\theta U(\theta)$, the left-hand side is
\[
\mathrm{LHS} = \sum_i \frac{\partial}{\partial\Gamma_i}(p\,Q_i)
= \sum_i \left(p\,\frac{\partial Q_i}{\partial\Gamma_i} + Q_i\,\frac{\partial p}{\partial\Gamma_i}\right)
= \sum_i \left(\frac{\partial Q_i}{\partial\Gamma_i} - Q_i\,\frac{\partial H}{\partial\Gamma_i}\right) p .
\]
The $\theta$- and $v$-components contribute $-\sum_i \xi_i - v^\top\nabla_\theta U(\theta) - v^\top(\tilde f - \Xi v) = -\sum_i \xi_i + v^\top \Xi v$, and the $\xi$-components contribute $-\sum_i(\xi_i - D)(v_i^2 - 1)$, so that
\[
\mathrm{LHS} = \left(-\sum_i \xi_i + v^\top \Xi v - \sum_i (\xi_i - D)(v_i^2 - 1)\right) p = \sum_i D\,(v_i^2 - 1)\,p .
\]
Since $D(\Gamma)$ is independent of $\Gamma$, the right-hand side is
\[
\mathrm{RHS} = \sum_{i,j}\frac{\partial^2}{\partial\Gamma_i\partial\Gamma_j}\left(\tfrac{1}{2}\,p\,(DD^\top)_{ij}\right)
= \sum_i D\,\frac{\partial^2 p}{\partial v_i^2}
= \sum_i D\,\frac{\partial}{\partial v_i}(-v_i\,p)
= \sum_i D\,(v_i^2 - 1)\,p = \mathrm{LHS}.
\]
Having verified both conditions of Lemma 1, the distribution in (7) is the equilibrium distribution of our proposed SGNHT algorithm.
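For intuition, a minimal Euler-discretized sketch of the SGNHT updates analyzed above (our own illustration with step size h; stochgrad_U is a placeholder for a stochastic gradient of U, not a function from the released code):

    import numpy as np

    def sgnht_step(theta, v, xi, stochgrad_U, h, D, rng=np.random.default_rng()):
        # One Euler step of the SGNHT dynamics with state Gamma = (theta, v, xi).
        noise = np.sqrt(2.0 * D * h) * rng.standard_normal(v.shape)
        v = v - h * stochgrad_U(theta) - h * xi * v + noise   # momentum with thermostat friction
        theta = theta + h * v                                 # parameter update
        xi = xi + h * (v * v - 1.0)                           # per-dimension thermostat update
        return theta, v, xi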

C Sampling Topic-Word Distributions using the Stochastic Gradient Riemannian Langevin Dynamics

The Langevin diffusion is defined via a stochastic differential equation of the following form:
\[
\mathrm{d}\theta_t = \tfrac{1}{2}\nabla_{\theta_t}\log U(\theta_t)\,\mathrm{d}t + \mathrm{d}W_t, \quad (8)
\]
where $t$ is the time index, $\theta_t \in \mathbb{R}^M$ is the model parameter, $U(\theta_t) \propto \prod_{i=1}^N p(x_i \mid \theta_t)\,p(\theta_t)$ is the model posterior, and $W_t$ is standard $M$-dimensional Brownian motion. The law of the Langevin diffusion is described by the Fokker-Planck equation (Risken, 1989), and the equilibrium distribution $p(\theta)$ of $\theta_t$ can be shown to be the model posterior $U(\theta)$ (Øksendal, 2003).

In Bayesian learning with big data, the stochastic gradient Langevin dynamics (SGLD) (Welling & Teh, 2011) generalizes the Langevin dynamics (8) by replacing $U(\theta_t)$ with a stochastic version $\tilde{U}(\theta_t)$ evaluated on a subset of the data, e.g., $\tilde{U}(\theta_t) \propto \prod_{i\in\tilde{\mathcal{D}}} p(x_i \mid \theta_t)\,p(\theta_t)$, where $\tilde{\mathcal{D}}$ is a random subset of $\{1, \dots, N\}$. The corresponding SDE is then solved via the Euler-Maruyama scheme (Tuckerman, 2010) with a decreasing step-size sequence (Welling & Teh, 2011); that is, samples of SGLD are generated as
\[
\theta_{t+1} = \theta_t + \tfrac{\delta_t}{2}\,\nabla_\theta\log\tilde{U}(\theta_t) + \zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, \delta_t I),
\]
where $I$ is the identity matrix and $\{\delta_t\}$ is a decreasing sequence such that $\lim_{t\to\infty}\delta_t = 0$ and $\sum_{t=1}^\infty \delta_t = \infty$. Under certain assumptions, it is shown that SGLD is consistent with the Langevin dynamics of (8), i.e., it generates correct samples from the posterior $U(\theta)$ (Teh et al., 2014; Vollmer et al., 2015).

The stochastic gradient Riemannian Langevin dynamics (SGRLD) (Patterson & Teh, 2013) extends SGLD by defining it on a Riemannian manifold (Girolami & Calderhead, 2011; Byrne & Girolami, 2013). Specifically, given a Riemannian metric $G(\theta)$ (Jürgen, 2008), SGRLD generalizes the Langevin dynamics (8) to the manifold and generates samples using the proposal
\[
\theta_{t+1} = \theta_t + \tfrac{\delta_t}{2}\,\mu(\theta_t) + G(\theta_t)^{-1/2}\zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, \delta_t I),
\]
where the $j$-th component of $\mu(\theta_t)$ is given by
\[
\mu(\theta_t)_j = \left(G^{-1}(\theta_t)\nabla_{\theta_t}\log\tilde{U}(\theta_t)\right)_j
- 2\sum_{k=1}^M \left(G^{-1}(\theta_t)\frac{\partial G(\theta_t)}{\partial\theta_{tk}}G^{-1}(\theta_t)\right)_{jk}
+ \sum_{k=1}^M \left(G^{-1}(\theta_t)\right)_{jk}\mathrm{tr}\left(G^{-1}(\theta_t)\frac{\partial G(\theta_t)}{\partial\theta_{tk}}\right). \quad (9)
\]
SGRLD moves along a Riemannian manifold defined by the metric $G(\theta)$, and thus has the advantage of converging faster towards the optimal solution than the naive SGLD. In the Bayesian setting, the Fisher information matrix (Rao, 1945; Amari, 1990; 1997) is usually chosen as the metric $G(\theta)$, which has convenient forms for some models.

SGRLD for Deep Poisson Factor Analysis. We use the SGRLD algorithm to sample the topic-word distributions $\{\phi_k\}$. From Section A we see that, based on the augmentation of $x_{pnk}$, the posterior of $\phi_k$ is a Dirichlet distribution, i.e., the joint conditional likelihood of $(\{x_{pnk}\}, \phi_k)$ satisfies
\[
p(\{x_{pnk}\}, \phi_k \mid -) \propto \prod_p \phi_{pk}^{x_{p\cdot k}} .
\]
We reparameterize $\phi_k$ using the expanded-mean schema (Patterson & Teh, 2013) as $\phi_{pk} = \tilde\phi_{pk} / \sum_{p'} \tilde\phi_{p'k}$. Together with the prior on $\phi_k$, $\phi_k \sim \mathrm{Dir}(a_\phi)$, this gives a stochastic gradient on a subset of data $\tilde{\mathcal{D}}$ as
\[
\frac{\partial \log p(\tilde\phi_k \mid -)}{\partial \tilde\phi_{pk}}
= \frac{a_\phi - 1}{\tilde\phi_{pk}} - 1
+ \frac{N}{|\tilde{\mathcal{D}}|}\sum_{n\in\tilde{\mathcal{D}}} \mathbb{E}_{\{x_{pnk}\}}\!\left[\frac{x_{pnk}}{\tilde\phi_{pk}} - \frac{x_{\cdot nk}}{\tilde\phi_{\cdot k}}\right],
\]
where the expectation is taken over the posterior of $\{x_{pnk}\}$, which is infeasible to compute exactly. As a result, we use samples from the posterior to approximate the expectation; to this end, we use the conditional posterior of $\{x_{pnk}\}$ in (1) to collect samples for the current mini-batch $\tilde{\mathcal{D}}$ after a few burn-in steps.

To apply the SGRLD algorithm, we need to choose a metric $G(\theta)$ for the Riemannian manifold. Because the posterior of $\phi_k$ is Dirichlet, we use the same Riemannian metric as for LDA (Patterson & Teh, 2013), i.e., $G(\tilde\phi) = \mathrm{diag}(\tilde\phi_{11}, \dots, \tilde\phi_{P1}, \dots, \tilde\phi_{1K}, \dots, \tilde\phi_{PK})^{-1}$, and use the same mirroring idea (Patterson & Teh, 2013) to keep the parameters inside the feasible region. After plugging $G(\theta)$ into (9) and simplifying, we obtain the update for $\tilde\phi_{pk}$:
\[
\tilde\phi_{pk} \leftarrow \tilde\phi_{pk} + \frac{\delta}{2}\left(a_\phi - \tilde\phi_{pk} + \frac{N}{|\tilde{\mathcal{D}}|}\sum_{n\in\tilde{\mathcal{D}}}\mathbb{E}_{\{x_{pnk}\}}\!\left[x_{pnk} - x_{\cdot nk}\,\phi_{pk}\right]\right) + (\tilde\phi_{pk})^{1/2}\,\zeta, \qquad \zeta \sim \mathcal{N}(0, \delta),
\]
where we have omitted the time index $t$ in the parameters for simplicity.

For the other global model parameters, such as $r_k$ and $\gamma_0$, we make use of their posterior distributions in (2) and (3), respectively. We only briefly describe the sampling of $r_k$; sampling $\gamma_0$ is similar. To simplify, rewrite the posterior of $r_k$ as $p(r_k \mid -) = \mathrm{Gamma}\left(a_k + \gamma_0,\ 1/(c_0 + b_k)\right)$, where $a_k$ and $b_k$ are parameters of the Gamma distribution containing local or augmented parameters from the current mini-batch, e.g., $a_k$ contains $\{l_{kn}\}$ for the current mini-batch. Denote these local parameters as $L_x$. We reparameterize $r_k$ as $r_k = e^{\tilde r_k}$, with $\tilde r_k \sim \mathrm{Log\text{-}Gamma}(\gamma_0, 1/c_0)$; this is equivalent to putting a $\mathrm{Gamma}(\gamma_0, 1/c_0)$ prior on $r_k$. The stochastic gradient for $\tilde r_k$ is then
\[
\frac{\partial \log p(\tilde r_k \mid -)}{\partial \tilde r_k}
= \frac{\partial}{\partial \tilde r_k}\,\mathbb{E}_{L_x}\!\left[(a_k + \gamma_0 - 1)\tilde r_k - (c_0 + b_k)e^{\tilde r_k}\right]
= \mathbb{E}_{L_x}\!\left[a_k + \gamma_0 - 1 - (c_0 + b_k)\,r_k\right].
\]
Again, the expectation can be approximated by Monte Carlo integration using samples from the posterior, after which stochastic gradient Langevin dynamics can be applied straightforwardly.
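A minimal sketch of the expanded-mean SGRLD update for the topic-word parameters above, in Python with NumPy (our own illustration; x_batch is an assumed name for Monte Carlo estimates of the mini-batch expectations, and the absolute value implements the mirroring step):

    import numpy as np

    def sgrld_phi_step(phi_tilde, x_batch, N, batch_size, a_phi, delta,
                       rng=np.random.default_rng()):
        # phi_tilde: (P, K) expanded-mean parameters; x_batch: (P, K) posterior
        # estimates of sum_{n in mini-batch} x_{pnk}.
        phi = phi_tilde / phi_tilde.sum(axis=0, keepdims=True)     # normalized topics phi_{pk}
        x_dot = x_batch.sum(axis=0, keepdims=True)                 # x_{.nk} summed over words p
        grad = a_phi - phi_tilde + (N / batch_size) * (x_batch - x_dot * phi)
        noise = np.sqrt(phi_tilde) * rng.normal(0.0, np.sqrt(delta), size=phi_tilde.shape)
        return np.abs(phi_tilde + 0.5 * delta * grad + noise)      # mirroring keeps values positive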
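A corresponding one-step SGLD sketch for the reparameterized $r_k$ (our own illustration, assuming a_k and b_k are the Monte Carlo estimates of the mini-batch statistics described above):

    import numpy as np

    def sgld_logr_step(r_tilde, a_k, b_k, gamma0, c0, delta, rng=np.random.default_rng()):
        # SGLD step for r_tilde = log(r_k), using the stochastic gradient derived above.
        grad = a_k + gamma0 - 1.0 - (c0 + b_k) * np.exp(r_tilde)
        return r_tilde + 0.5 * delta * grad + rng.normal(0.0, np.sqrt(delta))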
D Evaluation Details on Perplexities

For each test document, we randomly partition the words into an 80/20% split. We learn document-specific local parameters using the 80% portion, and then calculate the predictive perplexity on the remaining 20% subset, denoted $Y$. For the PFA-based models, the test perplexity is calculated as (Zhou et al., 2012)
\[
\exp\left(-\frac{1}{y_{\cdot\cdot}}\sum_{p=1}^P\sum_{n=1}^N y_{pn}\log\frac{\sum_{s=1}^S\sum_{k=1}^K \phi^s_{pk}\theta^s_{kn}}{\sum_{s=1}^S\sum_{p=1}^P\sum_{k=1}^K \phi^s_{pk}\theta^s_{kn}}\right),
\]
where $S$ is the total number of collected samples, $y_{\cdot\cdot} = \sum_{p=1}^P\sum_{n=1}^N y_{pn}$, and $y_{pn}$ is an element of matrix $Y$.

The conditional distribution of $y_n$ given $h_n$ in the Replicated Softmax model (RSM) is specified as
\[
y_n \sim \mathrm{Mult}(D_n; \beta_n), \qquad
\beta_{pn} = \frac{\exp(w_p^\top h_n + c_p)}{\sum_{p'=1}^P \exp(w_{p'}^\top h_n + c_{p'})},
\]
where $y_n$ is the $n$-th column of $Y$, $D_n = \sum_{p=1}^P y_{pn}$, $W = [w_1, \dots, w_P] \in \mathbb{R}^{P\times K}$ is the mapping from $h_n$ to $y_n$, and $c = [c_1, \dots, c_P] \in \mathbb{R}^{P\times 1}$ is the bias term. Based on this, the predictive test perplexity for RSM is calculated as
\[
\exp\left(-\frac{1}{y_{\cdot\cdot}}\sum_{p=1}^P\sum_{n=1}^N y_{pn}\log\beta_{pn}\right).
\]
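For concreteness, a small NumPy sketch of the PFA perplexity computation above (our own illustration; the array names are assumptions, with phi_samples and theta_samples holding the S collected posterior samples):

    import numpy as np

    def pfa_test_perplexity(Y, phi_samples, theta_samples):
        # Y: (P, N) held-out word counts; phi_samples: (S, P, K); theta_samples: (S, K, N).
        rates = np.einsum('spk,skn->pn', phi_samples, theta_samples)  # sum over samples s and topics k
        probs = rates / rates.sum(axis=0, keepdims=True)              # per-document word probabilities
        return float(np.exp(-(Y * np.log(probs)).sum() / Y.sum()))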

E Sensitivity Analysis

We examined the sensitivity of the model performance with respect to the mini-batch size used in SGNHT on the three corpora considered. The results are shown in Figure 1. We found that overall performance, both convergence speed and test perplexity, suffers considerably when the batch size is smaller than 10 documents. However, for sufficiently large batch sizes (including for RCV1-v2) we obtain performance comparable to that shown in Tables 1 and 3.

Figure 1. Test perplexities w.r.t. mini-batch size on the three corpora. The number of hidden units in each layer is 128, 64 and 32, respectively. Left: 20 Newsgroups. Middle: RCV1-v2. Right: Wikipedia.

We run the SGNHT algorithm on the RCV1-v2 and Wikipedia datasets long enough that the whole corpora can be traversed. The results are shown in Figure 2. As can be seen, performance improves smoothly as the amount of data processed increases.

Figure 2. Test perplexities as a function of the number of training documents seen. The number of hidden units in each layer is 128, 64 and 32, respectively. Left: RCV1-v2. Right: Wikipedia.

F Additional Results

We compare our DPFA model with stronger shallow baselines, such as the Marked-Gamma-NB and Marked-Beta-NB models described in Zhou & Carin (2015). Direct Gibbs sampling is only suitable for relatively small datasets, so only results on 20 Newsgroups are reported, shown in Table 1. As can be seen, these advanced models can achieve perplexity results comparable to those of a two-layer DPFA model.

Table 1. Additional results on 20 Newsgroups.
MODEL              METHOD   DIM   PERP
MARKED-BETA-NB     GIBBS
MARKED-GAMMA-NB    GIBBS

G Source Code

The source code, along with the topics we inferred from the model, is available at zhegan7/dpfa_icml2015. This package is made publicly available for reproducibility purposes; it is not optimized for speed and is only minimally documented, but it is fully functional.

Figure 3. Full graph induced by the correlation structure learned by DPFA-SBN for the 20 Newsgroups corpus.

Figure 4. Full graph induced by the correlation structure learned by DPFA-SBN for the RCV1-v2 corpus.

References

Amari, S. Differential geometrical methods in statistics. Lecture Notes in Statistics 28, Springer-Verlag, 1990.
Amari, S. Natural gradient works efficiently in learning. Neural Computation, 1997.
Byrne, S. and Girolami, M. Geodesic Monte Carlo on embedded manifolds. Scandinavian J. Statist., 2013.
Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., and Neven, H. Bayesian sampling using stochastic gradient thermostats. NIPS, 2014.
Girolami, M. and Calderhead, B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Statist. Soc. B, 2011.
Jürgen, J. Riemannian Geometry and Geometric Analysis. Springer-Verlag, 2008.
Øksendal, B. Stochastic Differential Equations. Springer, 2003.
Patterson, S. and Teh, Y. W. Stochastic gradient Riemannian Langevin dynamics on the probability simplex. NIPS, 2013.
Rao, C. R. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 1945.
Risken, H. The Fokker-Planck Equation. Springer-Verlag, New York, 1989.
Teh, Y. W., Thiery, A. H., and Vollmer, S. J. Consistency and fluctuations for stochastic gradient Langevin dynamics. arXiv preprint, 2014.
Tuckerman, M. E. Statistical Mechanics: Theory and Molecular Simulation. Oxford University Press, 2010.
Vollmer, S. J., Zygalakis, K. C., and Teh, Y. W. (Non-)asymptotic properties of stochastic gradient Langevin dynamics. arXiv preprint, 2015.
Welling, M. and Teh, Y. W. Bayesian learning via stochastic gradient Langevin dynamics. ICML, 2011.
Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. PAMI, 2015.
Zhou, M., Hannah, L., Dunson, D., and Carin, L. Beta-negative binomial process and Poisson factor analysis. AISTATS, 2012.
