Online Bayesian Passive-Aggressive Learning


1 Online Bayesian Passive-Aggressive Learning. Tianlin Shi, Jun Zhu

2 The BIG DATA Challenge
- Large amounts of data. Big Science: 25 PB of data annually.
- Streaming data.
- Complex data: text, images, genomics, etc.
(Image courtesy: http://robotic-rodents.com/)

3 Online Learning: the Batch Setting
(Diagram: Data feeds a learning algorithm, which produces a model with loss L.)
Why batch learning falls short:
1. Data may come in as a stream.
2. We don't have the memory/time to process it all at once.
3. There is redundancy in the data.

4 Online Learning: the Online Setting
(Diagram: data arrives one instance at a time; the online learning algorithm makes a prediction (supervised case), suffers an instantaneous loss L, and updates the model.)

5 Online Passive-Aggressive Learning (Crammer et al. [1])
- Online update of a large-margin classifier: the SVM weight w is learned from sequential data {x_t}_{t >= 1} and labels {y_t}_{t >= 1}.
- Closed-form update rule.
- Drawbacks: 1. limited model complexity; 2. only a single point estimate of the model.

6 Bayesian Models
- Flexibility: can be non-parametric, e.g. an infinite number of components in a topic model (Teh et al. HDP. JASA 2006).
- Posterior inference is challenging: both VB and MCMC can be expensive on big data.
- Attempts to speed up inference: Online LDA (Hoffman et al. NIPS 2010); Online Sparse Stochastic Inference (Mimno et al. ICML 2012); Stochastic Gradient Fisher Scoring (Ahn et al. ICML 2012).
- These methods typically lack discriminative ability.

7 Max-Margin Bayesian Models
- MED: Maximum entropy discrimination (Jaakkola et al. 1999).
- MED with latent variables: MedLDA (Zhu et al. JMLR 2012).
- MED with nonparametric Bayesian inference: M3F, Max-Margin Matrix Factorization (Xu et al. NIPS 2012).
- Posterior inference remains a big challenge!

8 Online Bayesian Passive-Aggressive Learning (BayesPA)

9 Outline
- General formulation
- Online max-margin topic models
- Experiments
- Future work

10 Outline
- General formulation (this part)
- Online max-margin topic models
- Experiments
- Future work

11 Online PA Algorithms vs. Online BayesPA Learning
Online PA updates the weight:
w_{t+1} = \arg\min_w \tfrac{1}{2}\|w - w_t\|^2 \quad \text{s.t. } \ell_\epsilon(w; x_t, y_t) = 0.
Online BayesPA updates the distribution over weights:
q_{t+1}(w) = \arg\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w) \,\|\, q_t(w)] - \mathbb{E}_q[\log p(x_t \mid w)] \quad \text{s.t. } \ell_\epsilon(q(w); x_t, y_t) = 0.
Case I (passive update), \ell_\epsilon = 0: the current solution already lies in the feasible zone, so w_{t+1} = w_t (resp. q_{t+1}(w) = q_t(w)).
Case II (aggressive update), \ell_\epsilon > 0: the solution is projected onto the feasible zone.

12 Online PA vs. Online BayesPA: Soft-Margin Constraints
Online PA:
w_{t+1} = \arg\min_w \tfrac{1}{2}\|w - w_t\|^2 + 2c\, \ell_\epsilon(w; x_t, y_t).
Loss \ell_\epsilon: hinge loss for classification, \max(0, \epsilon - y_t w^\top x_t); \epsilon-insensitive loss for regression, \max(0, |w^\top x_t - y_t| - \epsilon).
Closed-form update rule:
w_{t+1} = w_t + y_t \tau_t x_t, \quad \tau_t = \min\!\left(C, \frac{\ell_\epsilon}{\|x_t\|^2}\right).
Online BayesPA:
q_{t+1}(w) = \arg\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w) \,\|\, q_t(w)\, p(x_t \mid w)] + 2c\, \ell_\epsilon(q(w); x_t, y_t).
We focus on classifiers for now: averaging classifiers predict with \mathbb{E}_q[w]; Gibbs classifiers draw a sample w \sim q(w) and predict with it. Notation: (x)_+ = \max(0, x).
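To make the closed-form rule concrete, here is a minimal Python sketch of one PA-I step; the variable names and the margin parameter eps are ours, not from the slides:

```python
import numpy as np

def pa_update(w, x, y, C, eps=1.0):
    """One PA-I step for binary classification (sketch of the rule above)."""
    loss = max(0.0, eps - y * np.dot(w, x))   # hinge loss l_eps
    if loss == 0.0:
        return w                              # passive: constraint already satisfied
    tau = min(C, loss / np.dot(x, x))         # aggressive: clipped dual step size
    return w + tau * y * x                    # w_{t+1} = w_t + y_t * tau_t * x_t
```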

13 Lemma 1. The expected hinge loss \ell_\epsilon^{\text{Gibbs}} is an upper bound of the hinge loss \ell_\epsilon^{\text{Bayes}}:
\ell_\epsilon^{\text{Bayes}} \le \ell_\epsilon^{\text{Gibbs}}.
Proof. Straightforward from the convexity of (x)_+ (Jensen's inequality).

14 Lemma 2. If q_0(w) = \mathcal{N}(0, I), \mathcal{F}_t = \mathcal{P} (the space of all distributions), and we use the averaging classifier, then non-likelihood BayesPA subsumes online PA.
Non-likelihood BayesPA drops the likelihood term:
\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w) \,\|\, q_t(w)] \quad \text{s.t. } \ell_\epsilon(q(w); x_t, y_t) = 0.

15 Proof Sketch. The problem is
\min_{q(w) \in \mathcal{P}} \mathrm{KL}[q(w) \,\|\, q_t(w)] + 2c \max(0, \epsilon - y_t \mathbb{E}_q[w^\top x_t]).
Conjugacy (Zhu et al., RegBayes): for a feature function \psi and a convex function g,
\min_{q(M) \in \mathcal{P}} \mathrm{KL}[q(M) \,\|\, p(M, D)] + g(\mathbb{E}_q[\psi(M)]) = \max_\phi -\log \int p(M, D) \exp(\langle \phi, \psi(M) \rangle)\, dM - g^*(-\phi),
where the optimal solution is
q(M) \propto p(M, D) \exp(\langle \phi^*, \psi(M) \rangle).

16 Applying the conjugacy result,
\min_{q(w) \in \mathcal{P}} \mathrm{KL}[q(w) \,\|\, q_t(w)] + 2c \max(0, \epsilon - y_t \mathbb{E}_q[w^\top x_t]) = \max_\tau -\log \Gamma(\tau) \quad \text{s.t. } 0 \le \tau \le 2c,
where
q^*(w) = \frac{1}{\Gamma(\tau)}\, q_t(w) \exp(\tau (y_t w^\top x_t - \epsilon)).
Use induction: assume q_t(w) = \mathcal{N}(w; \mu_t, \sigma^2 I), with initial case q_0(w) = \mathcal{N}(w; 0, \sigma^2 I). Then
q_{t+1}(w) \propto \exp\!\left(-\frac{1}{2\sigma^2}\|w - \mu_t\|^2 + \tau (y_t w^\top x_t - \epsilon)\right).
Dual form:
\min_{0 \le \tau \le 2c}\; \tau\, y_t \mu_t^\top x_t + \frac{\sigma^2 \tau^2}{2}\, x_t^\top x_t - \epsilon \tau.
Primal form:
\min_\mu \frac{\|\mu - \mu_t\|^2}{2\sigma^2} + 2c \max(0, \epsilon - y_t \mu^\top x_t).
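A minimal sketch of this Gaussian BayesPA step for the averaging classifier, assuming q_t(w) = N(mu_t, sigma^2 I) as in the induction above (names are ours): it solves the clipped dual and shifts only the posterior mean, leaving the covariance unchanged.

```python
import numpy as np

def bayespa_gaussian_update(mu, x, y, c, eps=1.0, sigma2=1.0):
    """One BayesPA step when q_t(w) = N(mu, sigma2 * I); covariance is unchanged."""
    loss = max(0.0, eps - y * np.dot(mu, x))            # expected hinge loss
    if loss == 0.0:
        return mu                                       # passive step
    tau = min(2 * c, loss / (sigma2 * np.dot(x, x)))    # clipped solution of the dual
    return mu + sigma2 * tau * y * x                    # new posterior mean mu_{t+1}
```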

17 Lemma 3. If \mathcal{F}_t = \mathcal{P} and we use the Gibbs classifier, the update rule of BayesPA is
q_{t+1}(w) \propto \underbrace{q_t(w)}_{\text{prior}} \; \underbrace{p(x_t \mid w)}_{\text{likelihood}} \; \underbrace{e^{-2c(\epsilon - y_t w^\top x_t)_+}}_{\text{pseudo-likelihood}}.

18 Extension: Learning with Mini-Batches
At time t, we receive an incoming batch B_t and solve
\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w) \,\|\, q_t(w)\, p(X_t \mid w)] + 2c\, \ell_\epsilon(q(w); X_t, Y_t),
where X_t = \{x_d\}_{d \in B_t}, Y_t = \{y_d\}_{d \in B_t}, and
\ell_\epsilon(q(w); X_t, Y_t) = \sum_{d \in B_t} \ell_\epsilon(q(w); x_d, y_d).

19 Extension: Learning with Latent Structures
(Diagram: data x_1, ..., x_5 with latent variables h_1, ..., h_5 forming a latent structure H, generated by a model M; the classifier weight is w.)

20 Extension: Learning with Latent Structures
To handle the uncertainty in H_t, infer H_t together with M and w via the BayesPA rule:
\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w, M, H_t) \,\|\, q_t(w, M)\, p(x_t \mid w, M, H_t)] + 2c\, \ell_\epsilon(q(w, M, H_t); x_t, y_t).
But how can we obtain q_{t+1}(w, M)? Marginalizing q(w, M, H_t) is intractable, so we make a mean-field assumption,
q(w, M, H_t) = q(w)\, q(M)\, q(H_t),
solve the objective, and set q_{t+1}(w, M) = q^*(w)\, q^*(M).

21 Outline
- General formulation
- Online max-margin topic models (this part)
- Experiments
- Future work

22 Batch MedLDA: Graphical Interpretation
(Graphical model: topics \phi_k with prior \beta, k = 1, ..., K; per-document \theta_d with prior \alpha; topic assignments z_{di}; words x_{di}, i = 1, ..., n_d; labels y_d for d = 1, ..., D; classifier weight w with prior variance v^2.)

23 Batch MedLDA: Generative Process
For each topic k = 1, 2, ..., K:
\phi_k \sim \mathrm{Dir}(\beta), \quad w_k \sim \mathcal{N}(w_k; 0, v^2).
For each document d = 1, 2, ..., D:
\theta_d \sim \mathrm{Dir}(\alpha).
For the i-th word in document d:
z_{di} \sim \mathrm{Multi}(\theta_d), \quad x_{di} \sim \mathrm{Multi}(\phi_{z_{di}}).
Predict with f(w, \bar z_d) = w^\top \bar z_d, where \bar z_{dk} = \frac{1}{n_d} \sum_i \mathbb{I}[z_{di} = k].
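A small Python sketch of this generative process; all sizes and hyperparameter values below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, n_d = 5, 100, 10, 50          # topics, vocabulary, documents, words/doc
alpha, beta, v2 = 0.1, 0.01, 1.0       # hyperparameters (illustrative values)

phi = rng.dirichlet(beta * np.ones(V), size=K)   # phi_k ~ Dir(beta)
w = rng.normal(0.0, np.sqrt(v2), size=K)         # w_k ~ N(0, v2)

for d in range(D):
    theta = rng.dirichlet(alpha * np.ones(K))    # theta_d ~ Dir(alpha)
    z = rng.choice(K, size=n_d, p=theta)         # z_di ~ Multi(theta_d)
    x = np.array([rng.choice(V, p=phi[k]) for k in z])  # x_di ~ Multi(phi_{z_di})
    z_bar = np.bincount(z, minlength=K) / n_d    # empirical topic proportions
    y_hat = np.sign(w @ z_bar)                   # prediction f(w, z_bar) = w^T z_bar
```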

24 Batch MedLDA: Inference of LDA
Let \Phi = \{\phi_k\}_{k=1}^K, \Theta = \{\theta_d\}_{d=1}^D, Z = \{z_d\}_{d=1}^D, X = \{x_d\}_{d=1}^D. LDA infers the posterior
p(\Phi, \Theta, Z \mid X) \propto p_0(\Phi, \Theta, Z)\, p(X \mid Z, \Phi),
or equivalently solves
\min_{q \in \mathcal{P}} \mathrm{KL}[q(\Phi, \Theta, Z) \,\|\, p(\Phi, \Theta, Z \mid X)].

25 Batch MedLDA: Inference of MedLDA
Inference problem:
\min_{q \in \mathcal{P}} \mathrm{KL}[q(\Phi, \Theta, Z) \,\|\, p(\Phi, \Theta, Z \mid X)] + 2c \sum_{d=1}^D \ell_\epsilon(q(w, z_d); x_d, y_d).
Prediction model: f(w, \bar z_d) = w^\top \bar z_d, where \bar z_{dk} = \frac{1}{n_d} \sum_i \mathbb{I}[z_{di} = k].
Loss functions:
Averaging loss: \ell_\epsilon^{\text{Avg}}(q(w, z_d); x_d, y_d) = (\epsilon - y_d \mathbb{E}_q[f(w, \bar z_d)])_+.
Gibbs loss: \ell_\epsilon^{\text{Gibbs}}(q(w, z_d); x_d, y_d) = \mathbb{E}_q[(\epsilon - y_d f(w, \bar z_d))_+].
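The two losses differ only in where the expectation sits. A small Monte Carlo sketch (our code) makes the relationship, and Lemma 1's ordering, easy to check:

```python
import numpy as np

def med_losses(w_samples, zbar_samples, y, eps=1.0):
    """Monte Carlo estimates of the averaging and Gibbs losses from
    joint samples (w, z_bar) ~ q; avg_loss <= gibbs_loss by Lemma 1."""
    f = np.einsum('ij,ij->i', w_samples, zbar_samples)  # f(w, z_bar) per sample
    avg_loss = max(0.0, eps - y * f.mean())             # (eps - y * E_q[f])_+
    gibbs_loss = np.maximum(0.0, eps - y * f).mean()    # E_q[(eps - y * f)_+]
    return avg_loss, gibbs_loss
```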

26 Online MedLDA
Recall BayesPA with latent structures:
\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w, M, H_t) \,\|\, q_t(w, M)\, p(x_t \mid w, M, H_t)] + 2c\, \ell_\epsilon(q(w, M, H_t); x_t, y_t).
In MedLDA, M = \Phi and H_t = (\Theta_t, Z_t). To reduce the parameter space, we collapse out \Theta_t:
p(Z_d \mid \alpha) = \int_{\theta_d} p(Z_d \mid \theta_d)\, p(\theta_d \mid \alpha)\, d\theta_d = \frac{D(\alpha + C_d)}{D(\alpha)}, \quad d \in B_t,
where C_d are the topic counts of document d and D(\cdot) is the Dirichlet normalization constant. So M = \Phi and H_t = Z_t. Exact inference is hard, so we make a mean-field assumption:
q(w, \Phi, Z_t) = q(w)\, q(\Phi)\, q(Z_t).

27 Online MedLDA with Gibbs Classifiers
By Lemma 3, the optimal solution has the form
q_{t+1}(w, \Phi, Z_t) \propto q_t(w, \Phi)\, p_0(Z_t \mid \alpha)\, p(X_t \mid Z_t, \Phi)\, \psi(Y_t \mid w, Z_t),
where
\psi(Y_t \mid w, Z_t) = \prod_{d \in B_t} \psi(y_d \mid w, z_d), \quad \psi(y_d \mid w, z_d) = e^{-2c(\epsilon - y_d w^\top \bar z_d)_+}.
This does not look friendly!

28 Lemma: Scale Mixture (Zhu et al. 2013). The pseudo-likelihood can be expressed as
\psi(y_d \mid w, z_d) = \int_0^\infty \frac{1}{\sqrt{2\pi\lambda_d}} \exp\!\left(-\frac{(\lambda_d + c\,\xi_d)^2}{2\lambda_d}\right) d\lambda_d,
where \xi_d = \epsilon - y_d w^\top \bar z_d. Let
\psi(Y_t, \lambda_t \mid w, Z_t) = \prod_{d \in B_t} \psi(y_d, \lambda_d \mid w, z_d), \quad \psi(y_d, \lambda_d \mid w, z_d) = \frac{1}{\sqrt{2\pi\lambda_d}} \exp\!\left(-\frac{(\lambda_d + c\,\xi_d)^2}{2\lambda_d}\right).
So our posterior at round t can be expressed as
q_{t+1}(w, \Phi, Z_t, \lambda_t) \propto q_t(w, \Phi)\, p_0(Z_t \mid \alpha)\, p(X_t \mid Z_t, \Phi)\, \psi(Y_t, \lambda_t \mid w, Z_t).
Again, a mean-field assumption:
q(w, \Phi, Z_t, \lambda_t) = q(w)\, q(\Phi)\, q(Z_t, \lambda_t).
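As a sanity check, the identity can be verified numerically: for any xi the integral should equal exp(-2c (xi)_+). A short sketch using SciPy quadrature (our code, not the paper's):

```python
import numpy as np
from scipy.integrate import quad

def mixture_integral(xi, c=1.0):
    """Numerically evaluate the scale-mixture integral for a scalar xi."""
    integrand = lambda lam: np.exp(-(lam + c * xi) ** 2 / (2 * lam)) / np.sqrt(2 * np.pi * lam)
    value, _ = quad(integrand, 0.0, np.inf)
    return value

for xi in (-2.0, -0.5, 0.0, 0.7, 3.0):
    lhs = np.exp(-2.0 * max(0.0, xi))      # pseudo-likelihood e^{-2c (xi)_+}, c = 1
    print(f"xi={xi:+.1f}  exact={lhs:.6f}  integral={mixture_integral(xi):.6f}")
```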

29 Online Gibbs MedLDA: Global Update
Fix q(Z_t, \lambda_t). Under the mean-field assumption, the optimal q(w) and q(\Phi) follow directly; if they are initially a Gaussian and a Dirichlet, they remain so by induction, which yields closed-form update rules. (Update equations shown on the slide.)

30 Online Gibbs MedLDA: Local Update
Fix q(w, \Phi); we have
q(Z_t, \lambda_t) \propto p_0(Z_t) \prod_{d \in B_t} \frac{1}{\sqrt{2\pi\lambda_d}} \exp\!\left(\mathbb{E}_{q(\Phi, w)}\!\left[\sum_{i \in [n_d]} \log \phi_{z_{di}, x_{di}} - \frac{(\lambda_d + c\,\xi_d)^2}{2\lambda_d}\right]\right),
where
\mathbb{E}_{q(\Phi)}[\log \phi_{z_{di}, x_{di}}] = \Psi(\Delta_{z_{di}, x_{di}}) - \Psi\!\left(\sum_x \Delta_{z_{di}, x}\right)
(\Psi is the digamma function and \Delta are the Dirichlet parameters of q(\Phi)). But it is hard to evaluate the expectation using the above formula: there is no closed form, and Z_t has a huge number of configurations. Solution: Gibbs sampling!

31 Online Gibbs MedLDA: Gibbs Sampling
For Z_t: sample each z_{di} from its conditional (shown on the slide).
For \lambda_t: \lambda_d^{-1} follows an inverse Gaussian distribution,
\lambda_d^{-1} \sim \mathcal{IG}\!\left(\lambda_d^{-1};\; \frac{1}{c\sqrt{\xi_d^2 + \bar z_d^\top \Sigma^* \bar z_d}},\; 1\right).
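Since NumPy's Wald distribution is exactly the inverse Gaussian, the lambda step fits in one line. A hedged sketch (our variable names; the unit shape parameter follows the data-augmentation construction above):

```python
import numpy as np

def sample_lambda_inv(xi_d, zbar_d, Sigma, c=1.0, rng=None):
    """Draw lambda_d^{-1} from the inverse-Gaussian conditional above."""
    rng = rng or np.random.default_rng()
    mean = 1.0 / (c * np.sqrt(xi_d ** 2 + zbar_d @ Sigma @ zbar_d))
    return rng.wald(mean, 1.0)   # returns lambda_d^{-1}; invert to recover lambda_d
```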

32 (Figure.)

33 Nonparametric Extension: MED Hierarchical Dirichlet Process (MedHDP)
(Graphical model: topics \phi_k with prior \beta and stick-breaking weights \pi_k, k = 1, 2, ...; per-document \theta_d, assignments z_{di}, words x_{di} (i = 1, ..., n_d), labels y_d for d = 1, ..., D; classifier weight w with a Gaussian process prior.)

34 Nonparametric Extension: Stick-Breaking Process
The stick-breaking construction produces weights \pi_1, \pi_2, \pi_3, \ldots; the topic proportions are then generated as
\theta_d \sim \mathrm{Dir}(\alpha \pi).
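A minimal sketch of the construction, truncated at K sticks (gamma, K, and alpha below are illustrative; the slide's pi is the infinite limit):

```python
import numpy as np

def stick_breaking(gamma, K, rng=None):
    """Truncated stick-breaking: pi_k = b_k * prod_{j<k} (1 - b_j), b_k ~ Beta(1, gamma)."""
    rng = rng or np.random.default_rng()
    b = rng.beta(1.0, gamma, size=K)
    return b * np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))

pi = stick_breaking(gamma=2.0, K=50)
theta_d = np.random.default_rng().dirichlet(0.5 * pi + 1e-12)  # theta_d ~ Dir(alpha * pi), alpha = 0.5
```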

35 MedHDP
Draw the topic proportions from \pi; the rest is the same as in LDA.
Inference:
\min_{q \in \mathcal{P}} \mathrm{KL}[q(w, \pi, \Phi, \Theta, Z) \,\|\, p(w, \pi, \Phi, \Theta, Z \mid X)] + 2c \sum_{d=1}^D \ell_\epsilon(q(w, z_d); x_d, y_d).
The loss function is almost the same as in MedLDA, except for the prediction rule:
f(w, \bar z_d) = \sum_{k=1}^\infty w_k \bar z_{dk}.
The sum is essentially finite (only finitely many topics occur in any document).

36 Online Nonparametric MedLDA (Online MedHDP)
Recall BayesPA with latent structures:
\min_{q \in \mathcal{F}_t} \mathrm{KL}[q(w, M, H_t) \,\|\, q_t(w, M)\, p(x_t \mid w, M, H_t)] + 2c\, \ell_\epsilon(q(w, M, H_t); x_t, y_t).
In MedHDP, we collapse out \Theta and introduce auxiliary latent variables s_d such that
p(z_d \mid \pi) = \sum_{s_d} p(s_d, z_d \mid \pi).
We can show
p(s_d, z_d \mid \pi) \propto \prod_{k=1}^\infty S(n_d \bar z_{dk}, s_{dk})\, (\alpha \pi_k)^{s_{dk}},
where S(\cdot, \cdot) denotes Stirling numbers. Exact inference is hard, so we again make a mean-field assumption.

37 Online Nonparametric MedLDA: Global Update
For \Phi and w: the same update rules as in online MedLDA.
For \pi, by the mean-field assumption: if initially q(\pi_k) = \mathrm{Beta}(u_k^0, v_k^0), then by induction q(\pi_k) stays Beta, with update rule
u_k^* = u_k^t + \sum_{d \in B_t} \mathbb{E}_q[s_{dk}], \qquad v_k^* = v_k^t + \sum_{d \in B_t} \sum_{j > k} \mathbb{E}_q[s_{dj}].
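A small sketch of these Beta updates (our code; E_s[d, k] stands for E_q[s_dk] over documents d in the mini-batch B_t, truncated to K sticks):

```python
import numpy as np

def update_sticks(u, v, E_s):
    """Beta updates for the stick weights q(pi_k) = Beta(u_k, v_k)."""
    s = E_s.sum(axis=0)                            # sum over documents in B_t
    u_new = u + s                                  # u_k* = u_k^t + sum_d E[s_dk]
    tail = s[::-1].cumsum()[::-1]                  # tail[k] = sum_{j >= k} s_j
    v_new = v + np.concatenate((tail[1:], [0.0]))  # v_k* = v_k^t + sum_d sum_{j>k} E[s_dj]
    return u_new, v_new
```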

38 Online Nonparametric MedLDA: Local Update
Fix the global distributions; then
q(Z_t, s_t, \lambda_t) \propto \exp\!\left(\mathbb{E}_{q(\Phi) q(\pi)}[\log p(X_t \mid \Phi, Z_t) + \log p(Z_t, s_t \mid \pi)] + \mathbb{E}_q[\log \psi(Y_t, \lambda_t \mid w, Z_t)]\right).
But \pi has an infinite number of components! Solution: borrow ideas from Wang & Blei (NIPS 2012) and approximate by sampling Z_t, s_t, \lambda_t together, using the direct sampling scheme for the HDP (Teh et al. HDP. JASA 2006).

39 Online Nonparametric MedLDA: Gibbs Sampling
For Z_t and \lambda_t: the same as in online MedLDA.
For \pi_k: draw from a Beta distribution with parameters
a_k = u_k^* + \sum_{d \in B_t} s_{dk}, \qquad b_k = v_k^* + \sum_{d \in B_t} \sum_{j > k} s_{dj}.

40 Outline
- General formulation
- Online max-margin topic models
- Experiments (this part)
- Future work

41 Classification on 20NG
- 20Newsgroups: 20 categories of documents; training/testing split: 11269/7505.
- We test online MedLDA (paMedLDA) and online MedHDP (paMedHDP).
- Compared with: their batch counterparts; Gibbs MedLDA (Zhu et al. ICML 2013); topic model + SVM, i.e. Sparse Stochastic LDA (Mimno et al. ICML 2012) and truncation-free HDP (Wang & Blei NIPS 2012).

42 (Results figure.)

43 (Results figure.)

44 Sensitivity to Batch Size (figure)

45 Sensitivity to the Number of Iterations and Samples (figure)

46 Multi-Task Classification
Extend our algorithm to multi-task learning: each document x_d carries T labels y_d^1, y_d^2, ..., y_d^T, and we simply solve the BayesPA problem with the per-task losses summed. (Objective shown on the slide; see the sketch below.)
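A minimal sketch of the summed multi-task loss (our naming; W stacks one weight vector per task, labels are in {-1, +1}):

```python
import numpy as np

def multitask_hinge(W, zbar, y, eps=1.0):
    """Sum of per-task hinge losses for one document with T labels."""
    margins = y * (W @ zbar)                     # per-task margins y_d^t * w_t^T z_bar_d
    return np.maximum(0.0, eps - margins).sum()  # sum over the T tasks
```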

47 Multi-Task Classification
- 1.1M-document Wikipedia dataset.
- 20 kinds of labels, not necessarily mutually exclusive.
- Training/testing split: 1.1M / 5K.
- Evaluation: F1 score,
F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.
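For completeness, the per-label F1 computation matching the formula above (a straightforward sketch):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for one binary label."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    precision = tp / max(1, np.sum(y_pred == 1))
    recall = tp / max(1, np.sum(y_true == 1))
    return 2 * precision * recall / max(1e-12, precision + recall)
```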

48 Future Work
- Theoretical analysis of BayesPA.
- Parallel, asynchronous BayesPA learning.
- BayesPA learning for regression problems.

49 References (shown on slide)

50 (References, continued.)

51 Thank you."
