Online Bayesian Passive-Aggressive Learning
1 Online Bayesian Passive-Aggressive Learning. Tianlin Shi, Jun Zhu.
2 The BIG DATA challenge
- Large amounts of data. Big Science: 25 PB of data annually.
- Streaming data.
- Complex data: text, images, genomics, etc.
- Image courtesy: http://robotic-rodents.com/
3 Online Learning
- Batch learning: Data -> Learning Algorithm (loss L) -> Model.
- Problems: 1. Data may come in as a stream. 2. We don't have the memory/time to compute on all of it. 3. There is redundancy in the data.
4 Online Learning
- Online learning: Data -> instantaneous loss -> Online Learning Algorithm (loss L) -> Prediction (supervised case) -> Model update.
5 Online Passive-Aggressive Learning (Crammer et al. [1])
- Online update of a large-margin classifier: SVM weight w, learned from sequential data {x_t} and labels {y_t}.
- Closed-form update rule.
- Drawbacks: 1. Limited model complexity. 2. A single point estimate of the model.
6 Bayesian models
- Flexibility: can be non-parametric, e.g. an infinite number of components in a topic model (Teh et al. HDP. JASA 2006).
- Posterior inference is challenging: both VB and MCMC can be expensive on big data.
- Attempts to speed up the inference: Online LDA (Hoffman et al. NIPS 2010); Online Sparse Stochastic Inference (Mimno et al. ICML 2012); Stochastic Gradient Fisher Scoring (Ahn et al. ICML 2012).
- These typically lack discriminative ability.
7 Max-Margin Bayesian Models
- MED: maximum entropy discrimination (Jaakkola et al. 1999).
- MED with latent variables: MedLDA (Zhu et al. JMLR 2012).
- MED with nonparametric Bayesian inference: M3F, Max-Margin Matrix Factorization (Xu et al. NIPS 2012).
- Posterior inference remains a big challenge!
8 Online Bayesian Passive-Aggressive Learning (BayesPA)
9 Outline: General formulation; Online max-margin topic models; Experiments; Future work.
10 Outline: General formulation (this section); Online max-margin topic models; Experiments; Future work.
11 Online PA Algorithms vs. Online BayesPA Learning
- Online PA updates the weight:
  w_{t+1} = argmin_w (1/2) ||w - w_t||^2  s.t.  l_eps(w; x_t, y_t) = 0.
- Online BayesPA updates a distribution over the weight:
  q_{t+1}(w) = argmin_{q in F_t} KL[q(w) || q_t(w)] - E_q[log p(x_t | w)]  s.t.  l_eps(q(w); x_t, y_t) = 0.
- Case I (passive update): l_eps = 0, so w_{t+1} = w_t (resp. q_{t+1}(w) = q_t(w)) already lies in the feasible zone.
- Case II (aggressive update): l_eps > 0, so the solution is projected onto the feasible zone.
12 Online PA Algorithms vs. Online BayesPA Learning (soft-margin)
- Soft-margin PA: w_{t+1} = argmin_w (1/2) ||w - w_t||^2 + 2c l_eps(w; x_t, y_t).
- Soft-margin BayesPA: q_{t+1}(w) = argmin_{q in F_t} KL[q(w) || q_t(w) p(x_t | w)] + 2c l_eps(q(w); x_t, y_t).
- Loss l_eps: hinge loss for classification, max(0, eps - y_t w.x_t); epsilon-insensitive loss for regression, max(0, |w.x_t - y_t| - eps).
- Closed-form PA update rule: w_{t+1} = w_t + tau_t y_t x_t, where tau_t = min(C, l_eps / ||x_t||^2).
- We focus on classifiers for now. Averaging classifiers predict with E_q[w]; Gibbs classifiers draw a sample w ~ q(w) and predict with it.
- Notation: (x)_+ = max(0, x).
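The closed-form PA update above can be sketched as a minimal online loop. The toy data, parameter names, and defaults (C = 1, eps = 1) below are our own illustrative choices:

```python
import numpy as np

def pa_update(w, x, y, C=1.0, eps=1.0):
    """One PA-I step: w_{t+1} = w_t + tau * y * x, tau = min(C, loss / ||x||^2)."""
    loss = max(0.0, eps - y * np.dot(w, x))   # hinge loss l_eps at the current weight
    tau = min(C, loss / np.dot(x, x))         # aggressiveness, capped at C
    return w + tau * y * x

# Toy stream: the label is the sign of the first coordinate.
X = np.array([[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
for x_t, y_t in zip(X, y):                    # a single online pass
    w = pa_update(w, x_t, y_t)

print(np.sign(X @ w))  # all four points end up on the correct side
```

Note the passive/aggressive split is automatic: when the hinge loss is zero, tau is zero and the weight is left unchanged.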
13 Lemma 1. The expected (Gibbs) hinge loss is an upper bound of the averaging hinge loss: l_eps^Bayes <= l_eps^Gibbs.
Proof. Straightforward from the convexity of (x)_+ and Jensen's inequality.
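Lemma 1 is just Jensen's inequality applied to the convex function (x)_+, which a quick Monte Carlo check illustrates (the Gaussian q(w) and the example (x, y) are toy choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw classifier weights w ~ q(w) and score a single example (x, y).
x, y, eps = np.array([0.5, -1.0]), 1.0, 1.0
w_samples = rng.normal(loc=[0.2, -0.3], scale=0.5, size=(100_000, 2))
margins = y * (w_samples @ x)

gibbs_loss = np.mean(np.maximum(0.0, eps - margins))   # E_q[(eps - y w.x)_+]
bayes_loss = max(0.0, eps - np.mean(margins))          # (eps - y E_q[w].x)_+

assert gibbs_loss >= bayes_loss   # Jensen: (.)_+ is convex
```

The gap between the two losses grows with the spread of q(w); they coincide only when q(w) is a point mass.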
14 Lemma 2. If q_0(w) = N(0, I), F_t = P (the space of all distributions), and we use the averaging classifier, then non-likelihood BayesPA subsumes online PA.
Non-likelihood BayesPA: q_{t+1}(w) = argmin_{q in F_t} KL[q(w) || q_t(w)]  s.t.  l_eps(q(w); x_t, y_t) = 0.
15 Proof sketch. Consider min_{q(w) in P} KL[q(w) || q_t(w)] + 2c max(0, eps - y_t E_q[w.x_t]).
- Conjugacy (Zhu et al., RegBayes): for a feature function psi and a convex function g,
  min_{q(M) in P} KL[q(M) || p(M, D)] + g(E_q[psi(M)]) = max_phi -log integral p(M, D) exp(<phi, psi(M)>) dM - g*(-phi),
  where the optimal solution is q*(M) ∝ p(M, D) exp(<phi*, psi(M)>).
16 Applying the lemma,
  min_{q(w) in P} KL[q(w) || q_t(w)] + 2c max(0, eps - y_t E_q[w.x_t]) = max_tau -log Gamma(tau)  s.t.  0 <= tau <= 2c,
where q*(w) = (1/Gamma(tau)) q_t(w) exp(tau (y_t w.x_t - eps)) and Gamma(tau) is its normalizer.
Use induction: assume q_t(w) = N(w; mu_t, sigma^2 I), with the initial case q_0(w) = N(w; 0, sigma^2 I). Then
  q_{t+1}(w) ∝ exp(-||w - mu_t||^2 / (2 sigma^2) + tau (y_t w.x_t - eps)),
with dual form
  min_{0 <= tau <= 2c} -tau (y_t mu_t.x_t - eps) + (sigma^2 tau^2 / 2) x_t.x_t,
and primal form
  min_mu ||mu - mu_t||^2 / (2 sigma^2) + 2c max(0, eps - y_t mu.x_t),
which is exactly the soft-margin online PA update applied to the posterior mean.
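The primal/dual equivalence in the proof sketch can be sanity-checked numerically: minimizing the primal objective over the posterior mean mu recovers the PA closed-form step. Here sigma = 1 and the constant c, the example (x, y), and the margin eps are illustrative choices of ours:

```python
import numpy as np
from scipy.optimize import minimize

# Setup: q_t(w) = N(mu_t, I), one example (x, y), sigma = 1.
mu_t = np.zeros(2)
x, y, eps, c = np.array([1.0, 0.0]), 1.0, 1.0, 0.3

def primal(mu):
    # ||mu - mu_t||^2 / 2 + c * max(0, eps - y * mu.x)   (sigma = 1)
    return 0.5 * np.sum((mu - mu_t) ** 2) + c * max(0.0, eps - y * np.dot(mu, x))

mu_star = minimize(primal, np.array([0.1, 0.1]), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12}).x

# PA closed form: mu_t + tau * y * x with tau = min(c, loss_t / ||x_t||^2).
tau = min(c, max(0.0, eps - y * np.dot(mu_t, x)) / np.dot(x, x))
assert np.allclose(mu_star, mu_t + tau * y * x, atol=1e-3)
```

With c = 0.3 the cap binds (tau = c), so the optimum sits in a smooth region of the hinge and a derivative-free minimizer finds it reliably.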
17 Lemma 3. If F_t = P and we use the Gibbs classifier, the update rule of BayesPA is
  q_{t+1}(w) ∝ q_t(w) p(x_t | w) e^{-2c (eps - y_t w.x_t)_+},
i.e. prior x likelihood x pseudo-likelihood.
18 Extension: Learning with Mini-Batches. At time t, we receive an incoming batch B_t and solve
  min_{q in F_t} KL[q(w) || q_t(w) p(X_t | w)] + 2c l_eps(q(w); X_t, Y_t),
where X_t = {x_d}_{d in B_t}, Y_t = {y_d}_{d in B_t}, and l_eps(q(w); X_t, Y_t) = sum_{d in B_t} l_eps(q(w); x_d, y_d).
19 Extension: Learning with Latent Structures. [Figure: data x_1, ..., x_5 with latent variables h_1, ..., h_5 forming a latent structure H, a model M, and the classifier weight w.]
20 Extension: Learning with Latent Structures
- Uncertainty in H_t: infer H_t together with M and w via the BayesPA rule:
  min_{q in F_t} KL[q(w, M, H_t) || q_t(w, M) p(x_t | w, M, H_t)] + 2c l_eps(q(w, M, H_t); x_t, y_t).
- But how can we obtain q_{t+1}(w, M)? Marginalizing q(w, M, H_t) over H_t is intractable.
- Mean-field assumption: q(w, M, H_t) = q(w) q(M) q(H_t). Solve the objective and set q_{t+1}(w, M) = q*(w) q*(M).
21 Outline: General formulation; Online max-margin topic models (this section); Experiments; Future work.
22 Batch MedLDA: graphical interpretation. [Plate diagram: topics phi_k ~ Dir(beta), k = 1..K; for each document d = 1..D: theta_d ~ Dir(alpha), topic assignments z_di and words x_di, i = 1..n_d; label y_d predicted from classifier weights w with variance v^2.]
23 Batch MedLDA: generative process
- For each topic k = 1, 2, ..., K: phi_k ~ Dir(beta), w_k ~ N(0, v^2).
- For each document d = 1, 2, ..., D: theta_d ~ Dir(alpha).
- For the i-th word in document d: z_di ~ Multi(theta_d), x_di ~ Multi(phi_{z_di}).
- Predict with f(w, z_d) = w.zbar_d, where zbar_dk = (1/n_d) sum_i I[z_di = k].
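The generative process above (without the max-margin part) can be sketched directly; the dimensions K, V, D, n_d and the hyperparameters below are toy choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
K, V, D, n_d, v2 = 3, 20, 5, 10, 1.0       # topics, vocab size, docs, words/doc, weight variance
alpha, beta = 0.5, 0.1

Phi = rng.dirichlet(np.full(V, beta), size=K)   # phi_k ~ Dir(beta), one row per topic
w = rng.normal(0.0, np.sqrt(v2), size=K)        # w_k ~ N(0, v^2)

docs, zbars = [], []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))    # theta_d ~ Dir(alpha)
    z = rng.choice(K, size=n_d, p=theta)        # z_di ~ Multi(theta_d)
    x = np.array([rng.choice(V, p=Phi[k]) for k in z])  # x_di ~ Multi(phi_{z_di})
    zbar = np.bincount(z, minlength=K) / n_d    # empirical topic proportions zbar_d
    docs.append(x)
    zbars.append(zbar)

scores = np.array(zbars) @ w                    # prediction f(w, z_d) = w . zbar_d
```

The classifier sees only the averaged assignment vector zbar_d, which is why the prediction rule is linear in the topic proportions.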
24 Batch MedLDA: inference of LDA
- Let Phi = {phi_k}_{k=1..K}, Theta = {theta_d}_{d=1..D}, Z = {z_d}_{d=1..D}, X = {x_d}_{d=1..D}.
- LDA infers the posterior p(Phi, Theta, Z | X) ∝ p_0(Phi, Theta, Z) p(X | Z, Phi),
- or equivalently solves min_{q in P} KL[q(Phi, Theta, Z) || p(Phi, Theta, Z | X)].
25 Batch MedLDA: inference of MedLDA
- Inference problem: min_{q in P} KL[q(Phi, Theta, Z) || p(Phi, Theta, Z | X)] + 2c sum_{d=1}^D l_eps(q(w, z_d); x_d, y_d).
- Prediction model: f(w, z_d) = w.zbar_d, where zbar_dk = (1/n_d) sum_i I[z_di = k].
- Averaging loss: l_eps^Avg(q(w, z_d); x_d, y_d) = (eps - y_d E_q[f(w, z_d)])_+.
- Gibbs loss: l_eps^Gibbs(q(w, z_d); x_d, y_d) = E_q[(eps - y_d f(w, z_d))_+].
26 Online MedLDA
- Recall BayesPA with latent structures: min_{q in F_t} KL[q(w, M, H_t) || q_t(w, M) p(x_t | w, M, H_t)] + 2c l_eps(q(w, M, H_t); x_t, y_t).
- In MedLDA, M = Phi and H_t = (Theta_t, Z_t). To reduce the parameter space, we collapse out Theta_t:
  Pr[Z_d | alpha] = integral Pr[Z_d | theta_d] Pr[theta_d | alpha] d theta_d = D(alpha + C_d) / D(alpha), for d in B_t,
  where C_d are the topic counts of document d and D(.) is the Dirichlet normalizer. So M = Phi, H_t = Z_t.
- Exact inference is hard, so we make the mean-field assumption q(w, Phi, Z_t) = q(w) q(Phi) q(Z_t).
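The collapsing identity Pr[Z_d | alpha] = D(alpha + C_d) / D(alpha) can be verified numerically in the K = 2 case, where the Dirichlet reduces to a Beta and the integral over theta_d is one-dimensional (the prior and counts below are toy values):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

alpha = np.array([0.7, 1.3])     # K = 2 topic prior
counts = np.array([3, 2])        # C_d: topic counts of one document (fixed word order)

# Closed form: D(alpha + counts) / D(alpha), with log D(a) = sum(gammaln(a)) - gammaln(sum(a)).
def log_dirichlet_norm(a):
    return np.sum(gammaln(a)) - gammaln(np.sum(a))

closed = np.exp(log_dirichlet_norm(alpha + counts) - log_dirichlet_norm(alpha))

# Direct integral over theta_d ~ Beta(alpha_1, alpha_2) of the assignment likelihood.
def integrand(t):
    prior = t ** (alpha[0] - 1) * (1 - t) ** (alpha[1] - 1) / np.exp(log_dirichlet_norm(alpha))
    return prior * t ** counts[0] * (1 - t) ** counts[1]

numeric, _ = quad(integrand, 0.0, 1.0)
assert abs(numeric - closed) < 1e-6
```

This is the standard Dirichlet-multinomial marginal; collapsing removes theta_d from the sampler at the cost of coupling the z_di within a document.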
27 Online MedLDA with Gibbs classifiers
- By Lemma 3, the optimal solution has the form
  q_{t+1}(w, Phi, Z_t) ∝ q_t(w, Phi) p_0(Z_t | alpha) p(X_t | Z_t, Phi) psi(Y_t | w, Z_t),
  where psi(Y_t | w, Z_t) = prod_{d in B_t} psi(y_d | w, z_d) and psi(y_d | w, z_d) = e^{-2c (eps - y_d w.zbar_d)_+}.
- Not friendly to work with directly!
28 Lemma: Scale of Mixture (Zhu et al. 2013). The pseudo-likelihood can be expressed as
  psi(y_d | w, z_d) = integral_0^inf (1 / sqrt(2 pi lambda_d)) exp(-(lambda_d + c zeta_d)^2 / (2 lambda_d)) d lambda_d,
where zeta_d = eps - y_d w.zbar_d.
- Let psi(Y_t, lambda_t | w, Z_t) = prod_{d in B_t} psi(y_d, lambda_d | w, z_d), with psi(y_d, lambda_d | w, z_d) = (1 / sqrt(2 pi lambda_d)) exp(-(lambda_d + c zeta_d)^2 / (2 lambda_d)).
- So the posterior at round t can be expressed as
  q_{t+1}(w, Phi, Z_t, lambda_t) ∝ q_t(w, Phi) p_0(Z_t | alpha) p(X_t | Z_t, Phi) psi(Y_t, lambda_t | w, Z_t).
- Again, mean-field assumption: q(w, Phi, Z_t, lambda_t) = q(w) q(Phi) q(Z_t, lambda_t).
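The scale-mixture identity e^{-2c (zeta)_+} = integral_0^inf (2 pi lambda)^{-1/2} exp(-(lambda + c zeta)^2 / (2 lambda)) d lambda can be checked numerically with a one-dimensional quadrature (c and the test values of zeta below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def pseudo_likelihood(zeta, c=1.0):
    """Left side: exp(-2c * max(0, zeta))."""
    return np.exp(-2.0 * c * max(0.0, zeta))

def mixture_integral(zeta, c=1.0):
    """Right side: integral over the augmentation variable lambda."""
    a = c * zeta
    def integrand(lam):
        if lam <= 0.0:
            return 0.0   # integrand vanishes at the lower endpoint
        return np.exp(-(lam + a) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)
    val, _ = quad(integrand, 0.0, np.inf)
    return val

for zeta in (-1.5, -0.2, 0.3, 2.0):
    assert abs(pseudo_likelihood(zeta) - mixture_integral(zeta)) < 1e-6
```

Expanding the exponent as lambda/2 + c zeta + (c zeta)^2 / (2 lambda) and using the Gaussian-tail integral gives e^{-c zeta - |c zeta|}, which equals the left side; the augmented form is what makes the conditionals conjugate.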
29 Online Gibbs MedLDA: global update. Fix q(Z_t, lambda_t); with the mean-field assumption, if the initial distributions are in conjugate families (Gaussian q(w), Dirichlet q(Phi)), they stay in those families and admit closed-form update rules. [The update equations on this slide were rendered as images.]
30 Online Gibbs MedLDA: local update. Fix q(w, Phi); then
  q(Z_t, lambda_t) ∝ p_0(Z_t) prod_{d in B_t} (1 / sqrt(2 pi lambda_d)) exp(-E_{q(Phi, w)}[(lambda_d + c zeta_d)^2] / (2 lambda_d)) prod_{i in [n_d]} Lambda_{z_di, x_di},
where log Lambda_{z_di, x_di} = E_{q(Phi)}[log phi_{z_di, x_di}] = Psi(Delta*_{z_di, x_di}) - Psi(sum_x Delta*_{z_di, x}), with Psi the digamma function and Delta* the Dirichlet parameters.
- But this distribution is hard to evaluate directly: there is no closed form, and Z_t has a combinatorial number of configurations. Use Gibbs sampling!
31 Online Gibbs MedLDA: Gibbs sampling. Alternate between Z_t and lambda_t.
- For Z_t: sample each z_di from its discrete full conditional.
- For lambda_t: lambda_d^{-1} follows an inverse Gaussian distribution,
  lambda_d^{-1} ~ IG(1 / (c sqrt(zeta_d^2 + zbar_d^T Sigma* zbar_d)), 1),
with expectations taken under q(w) = N(mu*, Sigma*).
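The inverse Gaussian draw for lambda_d^{-1} can be done with NumPy's Wald sampler. The mean parameter below uses a point estimate of zeta_d only (the slide's full conditional also involves the covariance of q(w)), so treat the exact parameterization as an assumption following the Gibbs-MedLDA augmentation scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed conditional: lambda_d^{-1} ~ InverseGaussian(mean = 1 / (c * |zeta_d|), shape = 1),
# drawn with NumPy's Wald sampler (Wald == inverse Gaussian, mean/scale parameterization).
def sample_lambda_inv(zeta, c=1.0, size=100_000):
    mean = 1.0 / (c * abs(zeta))
    return rng.wald(mean, 1.0, size=size)

draws = sample_lambda_inv(zeta=2.0, c=1.0)
# The Wald "mean" parameter is the distribution mean, so the sample mean
# should sit near 1 / (c * |zeta|) = 0.5 here.
assert abs(draws.mean() - 0.5) < 0.01
```

Inverting the draws gives the lambda_d used in the augmented Gaussian update for q(w).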
33 Nonparametric Extension: MED Hierarchical Dirichlet Process (MedHDP). [Plate diagram: global stick-breaking weights pi_k with topics phi_k ~ Dir(beta), k = 1, 2, ...; for each document d: theta_d, assignments z_di and words x_di (i = 1..n_d), label y_d; classifier weights w with a Gaussian process prior.]
34 Nonparametric Extension: Stick-Breaking Process. Draw stick weights pi_1, pi_2, pi_3, ...; then generate the topic proportions theta_d ~ Dir(alpha pi).
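A truncated stick-breaking construction can be sketched in a few lines; the concentration gamma, the truncation level K, and the scaling alpha are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_breaking(gamma, K):
    """Truncated stick-breaking: pi_k = b_k * prod_{j<k} (1 - b_j), with b_k ~ Beta(1, gamma)."""
    b = rng.beta(1.0, gamma, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))  # stick left before break k
    return b * remaining

pi = stick_breaking(gamma=2.0, K=50)   # global topic weights (truncated at K sticks)
theta = rng.dirichlet(5.0 * pi)        # document proportions theta_d ~ Dir(alpha * pi), alpha = 5
```

The weights decay geometrically in expectation, which is what lets the "infinite" sum in the prediction rule behave as essentially finite.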
35 MedHDP. Draw the topic proportions from pi; the rest is the same as in LDA. Inference:
  min_{q in P} KL[q(w, pi, Phi, Theta, Z) || p(w, pi, Phi, Theta, Z | X)] + 2c sum_{d=1}^D l_eps(q(w, z_d); x_d, y_d).
The loss function is almost the same as for LDA, except for the prediction rule f(w, z_d) = sum_{k=1}^inf w_k zbar_dk; the sum is essentially finite, since only finitely many topics are instantiated.
36 Online Nonparametric MedLDA
- Recall BayesPA with latent structures: min_{q in F_t} KL[q(w, M, H_t) || q_t(w, M) p(x_t | w, M, H_t)] + 2c l_eps(q(w, M, H_t); x_t, y_t).
- In MedHDP, we collapse out Theta and introduce an auxiliary latent variable s_d, with
  p(z_d | pi) = sum_{s_d} p(s_d, z_d | pi)  and  p(s_d, z_d | pi) ∝ prod_k S(n_dk, s_dk) (alpha pi_k)^{s_dk},
  where S(., .) denotes Stirling numbers.
- Exact inference is hard -> mean-field assumption.
37 Online Nonparametric MedLDA: global update
- For Phi and w: same update rules as in online MedLDA.
- For pi, by the mean-field assumption: if initially q(pi_k) = Beta(u_k^0, v_k^0), then by induction q(pi_k) stays Beta, with the update rule
  u_k* = u_k^t + sum_{d in B_t} E_q[s_dk],  v_k* = v_k^t + sum_{d in B_t} sum_{j > k} E_q[s_dj].
38 Online Nonparametric MedLDA: local update
- Fix the global distributions; then
  q(Z_t, S_t) ∝ exp(E_{q(Phi) q(pi)}[log p(X_t | Phi, Z_t) + log p(Z_t, S_t | pi)]),
  q(Z_t, lambda_t) ∝ exp(E_q[log psi(Y_t, lambda_t | w, Z_t)]).
- But pi has an infinite number of components! Solution: borrow ideas from Wang & Blei (NIPS 2012) to approximate pi; sample Z_t, S_t, lambda_t together using the direct sampling scheme for the HDP (Teh et al. HDP. JASA 2006).
39 Online Nonparametric MedLDA: Gibbs sampling
- For Z_t and lambda_t: the same as in online MedLDA.
- For pi_k: draw its stick weight from Beta(a_k, b_k), with
  a_k = u_k* + sum_{d in B_t} s_dk,  b_k = v_k* + sum_{d in B_t} sum_{j > k} s_dj.
40 Outline: General formulation; Online max-margin topic models; Experiments (this section); Future work.
41 Classification on 20NG
- 20Newsgroups: 20 categories of documents; training/testing split: 11269/7505.
- Test online MedLDA (paMedLDA) and online MedHDP (paMedHDP).
- Compare with: batch counterparts; Gibbs MedLDA (Zhu et al. ICML 2013); topic model + SVM, using Sparse Stochastic LDA (Mimno et al. ICML 2012) and truncation-free HDP (Wang & Blei NIPS 2012).
44 Sensitivity to Batch Size. [Results shown as figures.]
45 Sensitivity to the Number of Iterations and Samples. [Results shown as figures.]
46 Multi-Task Classification. Extend our algorithm to multi-task learning: a document x_d carries labels y_d^1, y_d^2, ..., y_d^T. Simply solve the objective with the per-label losses summed over tasks. [Objective shown on slide.]
47 Multi-Task Classification
- 1.1M-document Wikipedia dataset; 20 kinds of labels, not necessarily mutually exclusive.
- Training/testing split: 1.1M / 5K.
- Metric: F1 score = 2 * precision * recall / (precision + recall).
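The F1 score above is computed per label from true/false positives and false negatives; a minimal sketch (counts are a made-up example):

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 8 true positives, 2 false positives, 4 false negatives.
print(f1_score(8, 2, 4))   # precision 0.8, recall 2/3 -> F1 = 8/11, about 0.727
```

As the harmonic mean of precision and recall, F1 is a natural choice when labels are not mutually exclusive and per-label positives are rare.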
48 Future Work
- Theoretical analysis of BayesPA.
- Parallel, asynchronous BayesPA learning.
- BayesPA learning for regression problems.
49 References. [Reference list shown as images on the slides.]
51 Thank you."
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts
ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationCMPS 242: Project Report
CMPS 242: Project Report RadhaKrishna Vuppala Univ. of California, Santa Cruz vrk@soe.ucsc.edu Abstract The classification procedures impose certain models on the data and when the assumption match the
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationLatent Dirichlet Allocation Based Multi-Document Summarization
Latent Dirichlet Allocation Based Multi-Document Summarization Rachit Arora Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India. rachitar@cse.iitm.ernet.in
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering
More information39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017
Permuted and IROM Department, McCombs School of Business The University of Texas at Austin 39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 1 / 36 Joint work
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationSparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference
Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More information