Online Bayesian Passive-Aggressive Learning

Tianlin Shi, Jun Zhu (Tsinghua University)
ICML 2014
Full journal version: http://qr.net/b1rd

Outline
- Introduction: Motivation, Framework
- Theory: Connection to PA, Regret Analysis
- Practice: Extensions, Latent Space, Topic Modeling
- Experiments

The Big Data Challenge
- Huge amounts of data push the limits of existing methods and systems.
- Real applications bring streaming, potentially infinite data.
- Data are complex, with latent variables and hierarchical structure.

Story I. Online Learning [Shalev-Shwartz 07, Online Learning: Theory, Algorithms and Applications]
Setting: an online learner A receives an incoming data stream (x_0, y_0), (x_1, y_1), (x_2, y_2), ...
At each round t, the learner A:
1. predicts with the current model w_t;
2. incurs loss \ell(w_t; x_t, y_t);
3. learns how to predict better.
[Figure: the loop between the online learner A, its model w_t, the loss \ell(w_t; x_t, y_t), and the data stream {(x_t, y_t)}.]

Online Passive-Aggressive (PA) Algorithms [Crammer et al. 06, Online Passive-Aggressive Algorithms]
Weight update: obtain

  w_{t+1} = \operatorname*{argmin}_w \; \frac{1}{2}\|w - w_t\|^2 + 2c\,\ell_\epsilon(w; x_t, y_t)   (1)

where the first term is the regularizer (stay close to w_t: passive) and the second is the large-margin loss (correct margin violations: aggressive).
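
A minimal runnable sketch of this update for binary classification (not from the paper): the PA-I variant, assuming the hinge loss \ell_\epsilon(w; x, y) = (\epsilon - y\,w^\top x)_+ and a dual step size capped at c.

```python
import numpy as np

def pa_update(w, x, y, c=1.0, eps=1.0):
    """One passive-aggressive round (PA-I style).

    Passive: if the margin y * w.x already exceeds eps, keep w unchanged.
    Aggressive: otherwise move w just enough toward satisfying the margin,
    with the dual step size capped at c.
    """
    loss = max(0.0, eps - y * np.dot(w, x))    # epsilon-insensitive hinge
    if loss == 0.0:
        return w                               # passive round
    tau = min(c, loss / np.dot(x, x))          # closed-form capped step
    return w + tau * y * x                     # aggressive round

# toy stream: learn to predict the sign of the first feature
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(200):
    x = rng.normal(size=2)
    y = 1.0 if x[0] > 0 else -1.0
    w = pa_update(w, x, y)
print(w)   # the first coordinate dominates
```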

Why online learning works: the notion of regret
Cumulative loss against the best fixed model w^* chosen with hindsight of all the data:

  Regret = \sum_{t=0}^{T-1} \ell(w_t; x_t, y_t) - \sum_{t=0}^{T-1} \ell(w^*; x_t, y_t)   (2)

Regret bound. Example: in PA with \ell = \ell_\epsilon^2,

  \sum_{t=0}^{T-1} \ell(w_t) \le cR \sum_{t=0}^{T-1} \ell(w^*) + \text{const}   (3)

so the online learner implicitly minimizes the empirical loss.
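
A synthetic numeric look at eq. (2) (an illustration, not an experiment from the paper): run the PA-I step from the sketch above against a strong fixed comparator w^* and watch the average regret shrink.

```python
import numpy as np

rng = np.random.default_rng(1)

def eps_hinge(w, x, y, eps=1.0):
    return max(0.0, eps - y * np.dot(w, x))

w = np.zeros(2)                        # online learner's model
w_star = np.array([3.0, 0.0])          # fixed model chosen with hindsight
regret, T = 0.0, 2000
for t in range(T):
    x = rng.normal(size=2)
    y = 1.0 if x[0] > 0 else -1.0
    loss = eps_hinge(w, x, y)
    regret += loss - eps_hinge(w_star, x, y)
    if loss > 0:                       # PA-I update, as sketched earlier
        w += min(1.0, loss / np.dot(x, x)) * y * x
print(regret / T)                      # average regret; decays as T grows
```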

Story II. Sequential Bayesian Inference
Maintain a distribution over w. At each round t:
1. Let the current distribution q_t(w) be the prior.
2. For each incoming data point (x_t, y_t), apply Bayes' theorem:

   q_{t+1}(w) \propto q_t(w)\, p(x_t \mid w)   (4)

3. If inference is intractable, use an approximation of q_{t+1}(w). [Broderick et al. 13, Streaming Variational Bayes]
Advantages:
- captures uncertainty;
- flexible enough to model the underlying structure;
- can be nonparametric.
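
A minimal conjugate instance of update (4), assuming a Beta prior on a coin's bias w and Bernoulli observations, so q_{t+1}(w) stays in closed form and the approximation of step 3 is never needed:

```python
# q_0(w) = Beta(1, 1), i.e. uniform over the coin bias w in [0, 1]
a, b = 1.0, 1.0
stream = [1, 0, 1, 1, 1, 0, 1, 1]   # observed x_t in {0, 1}
for x in stream:
    # q_{t+1}(w) ∝ q_t(w) p(x_t | w): Beta(a, b) -> Beta(a + x, b + 1 - x)
    a, b = a + x, b + (1 - x)
print(a / (a + b))                  # posterior mean of w: 0.7
```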

The Gap
Story I. Online Learning (PA) copes with huge, streaming data, but for complex data it
1. keeps only a single point estimate of the model;
2. ignores the latent structure underlying complex data.
Story II. Sequential Bayesian Inference
- lacks discriminative power;
- does not directly minimize loss (or regret).

A New Perspective
Intuition: for each incoming data point (x_t, y_t),
- infer the latent variable z_t from x_t;
- make a prediction ŷ_t;
- the learner incurs an instantaneous loss;
- learn both how to infer and how to predict.

Bayesian Passive-Aggressive Learning (BayesPA)
1. At round t, the learner makes a prediction by giving q_t(w).
2. It incurs an instantaneous loss that combines both
   - the Bayesian log-loss -\mathbb{E}_{q(w)}[\log p(x_t \mid w)], and
   - a decision-theoretic loss \ell(q(w); x_t, y_t).
3. Learning rule:

   q_{t+1}(w) = \operatorname*{argmin}_{q(w) \in \mathcal{F}_t} \; \mathrm{KL}[q(w) \,\|\, q_t(w)] - \mathbb{E}_{q(w)}[\log p(x_t \mid w)] + 2c\,\ell_\epsilon(q(w); x_t, y_t)   (5)

   where the first two terms act as the regularizer and the last is the instantaneous loss.

Interpretation
Because \mathrm{KL}[q(w) \,\|\, q_t(w)] - \mathbb{E}_{q(w)}[\log p(x_t \mid w)] = \mathrm{KL}[q(w) \,\|\, q_t(w) p(x_t \mid w)] + \text{const}, the learning rule can be rewritten as

  q_{t+1}(w) = \operatorname*{argmin}_{q(w) \in \mathcal{F}_t} \; \mathrm{KL}[q(w) \,\|\, q_t(w) p(x_t \mid w)] + 2c\,\ell(q(w); x_t, y_t)   (6)

where q_t(w) p(x_t \mid w) is the unnormalized Bayesian posterior: BayesPA projects the posterior onto \mathcal{F}_t while also paying the margin-based loss.

BayesPA classifiers: two types
1. Averaging classifier
   Prediction rule: ŷ_t = sign \mathbb{E}_{q(w)}[w^\top x_t].
   Averaged hinge loss: \ell_\epsilon^{ave}(q(w); x_t, y_t) = \big(\epsilon - y_t\, \mathbb{E}_{q(w)}[w^\top x_t]\big)_+.
2. Gibbs classifier
   First draw w \sim q_t(w), then predict ŷ_t = sign(w^\top x_t).
   Expected hinge loss: \ell_\epsilon^{gibbs}(q(w); x_t, y_t) = \mathbb{E}_{q(w)}\big[(\epsilon - y_t\, w^\top x_t)_+\big].
Lemma

  \ell_\epsilon^{gibbs}(q(w); x_t, y_t) \ge \ell_\epsilon^{ave}(q(w); x_t, y_t)   (7)

(by Jensen's inequality, since (\cdot)_+ is convex).
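
A quick Monte Carlo check of the lemma (a synthetic illustration, assuming a Gaussian posterior q(w)): the expected hinge dominates the averaged hinge.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = np.array([0.3, -0.2]), 0.5     # q(w) = N(mu, sigma^2 I)
x, y, eps = np.array([1.0, 2.0]), 1.0, 1.0

ws = mu + sigma * rng.normal(size=(100_000, 2))    # draws w ~ q(w)
gibbs = np.maximum(0.0, eps - y * ws @ x).mean()   # E_q[(eps - y w.x)_+]
ave = max(0.0, eps - y * np.dot(mu, x))            # (eps - y E_q[w].x)_+
print(gibbs >= ave, round(gibbs, 3), round(ave, 3))  # True, gibbs >= ave
```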

Update Rule
Averaging classifier:

  q_{t+1}(w) \propto \underbrace{q_t(w)}_{\text{prior}}\, \underbrace{p(x_t \mid w)}_{\text{likelihood}}\, \underbrace{\exp(\tau_t\, y_t\, w^\top x_t)}_{\text{pseudo-likelihood}}   (8)

where the dual slack variable \tau_t enforces the max-margin constraint.

Gibbs classifier:

  q_{t+1}(w) \propto q_t(w)\, p(x_t \mid w)\, \exp\big(-2c\,(\epsilon - y_t\, w^\top x_t)_+\big)   (9)
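
To see why update (8) reproduces PA (the theorem on the next slide), here is a sketch of the conjugate Gaussian case, assuming q_t(w) = N(\mu_t, I), the likelihood term p(x_t \mid w) omitted, and \tau_t set as in PA-I: multiplying N(\mu_t, I) by \exp(\tau_t y_t w^\top x_t) and completing the square gives N(\mu_t + \tau_t y_t x_t, I), so only the mean moves, and by exactly the PA step.

```python
import numpy as np

def bayespa_avg_mean_update(mu, x, y, c=1.0, eps=1.0):
    """Mean update implied by eq. (8) for q_t(w) = N(mu, I) with the
    likelihood term omitted; the covariance stays the identity."""
    loss = max(0.0, eps - y * np.dot(mu, x))       # averaged hinge at mu
    tau = min(c, loss / np.dot(x, x)) if loss > 0 else 0.0
    return mu + tau * y * x                        # exactly the PA step
```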

Theory

Connection to PA
Theorem. BayesPA subsumes online PA.
Proof idea: show that the update of the posterior mean, \mu_t \to \mu_{t+1}, follows the PA update.

Regret Analysis
In BayesPA the decision is a distribution q(w); regret compares the cumulative loss against a fixed distribution p^*(w) chosen with hindsight of all the data:

  Regret = \sum_{t=0}^{T-1} \ell(q_t(w); x_t, y_t) - \sum_{t=0}^{T-1} \ell(p^*(w); x_t, y_t)   (10)

where the per-round loss is

  \ell(q(w); x_t, y_t) = -\mathbb{E}_{q(w)}[\log p(x_t \mid w)] + 2c\,\ell_\epsilon(q(w); x_t, y_t)   (11)

A regret bound for BayesPA
Define the tilted distribution

  q_t^{\tau,u}(w) \propto q_t(w)\, \exp\big(u\,U(w, x_t, y_t) + \tau\,T(w, x_t, y_t)\big),

where the sufficient statistics are U(w, x_t, y_t) = \log p(x_t \mid w) and T(w, x_t, y_t) = y_t\, w^\top x_t.

Theorem. Suppose the Fisher information of q_t^{\tau,u} satisfies J_t \succeq \lambda_t I. Then

  Regret \le \mathrm{KL}[p^*(w) \,\|\, p_0(w)] + \frac{1}{2}(1 + 4c^2) \sum_{t=0}^{T-1} \frac{1}{\lambda_t}   (12)

Remark. If the Bayesian CLT holds, BayesPA achieves the optimal O(\log T) regret.

Practice

Some Extensions
Mini-batches (a code sketch follows this slide): with X_t = \{x_d\}_{d \in B_t} and Y_t = \{y_d\}_{d \in B_t}, replace the per-example terms by their mini-batch sums, \log p(X_t \mid w) = \sum_{d \in B_t} \log p(x_d \mid w) and \ell_\epsilon(q(w); X_t, Y_t) = \sum_{d \in B_t} \ell_\epsilon(q(w); x_d, y_d):

  \min_{q \in \mathcal{F}_t} \; \mathrm{KL}[q(w) \,\|\, q_t(w)] - \mathbb{E}_{q(w)}[\log p(X_t \mid w)] + 2c\,\ell_\epsilon(q(w); X_t, Y_t)

Multi-task learning: with multiple labels (x_t, y_t^1, \dots, y_t^\tau),

  \min_{q \in \mathcal{F}_t} \; \mathrm{KL}\big[q(w, M, H_t) \,\|\, q_t(w, M)\, p(X_t \mid w, M, H_t)\big] + 2c \sum_{\tau=1}^{T} \ell_\epsilon\big(q(w, M, H_t); X_t, Y_t^\tau\big)
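
A hypothetical mini-batch step under the same Gaussian assumptions as the earlier sketch: apply one averaged correction per batch, using the mean of the per-example PA steps (a simplification of solving the batch objective exactly).

```python
import numpy as np

def bayespa_minibatch_mean(mu, X, Y, c=1.0, eps=1.0):
    """One update of the posterior mean for a mini-batch B_t."""
    steps = []
    for x, y in zip(X, Y):
        loss = max(0.0, eps - y * np.dot(mu, x))
        tau = min(c, loss / np.dot(x, x)) if loss > 0 else 0.0
        steps.append(tau * y * x)                  # per-example PA step
    return mu + np.mean(steps, axis=0)             # averaged batch step
```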

BayesPA with Latent Space
[Figure: (a) draw a mini-batch (X_t, Y_t) and infer the hidden structure q^*(H_t); (b) update the distribution of the global variables (w, M).]
Formulation: maintain a distribution q(w, M) over the global variables and solve

  \min_{q \in \mathcal{F}_t} \; \mathrm{KL}\big[q(w, M, H_t) \,\|\, q_t(w, M)\, p(X_t \mid w, M, H_t)\big] + 2c\,\ell_\epsilon\big(q(w, M, H_t); X_t, Y_t\big)   (13)

Problem: how to efficiently infer the marginal q(w, M)?

Approximate Inference in BayesPA
Variational inference. Idea: restrict \mathcal{F}_t to a tractable family.
1. Structured mean field: q(w, M, H_t) = q(w)\, q(M)\, q(H_t).
2. Solve the BayesPA problem within this family.
3. Output q_{t+1}(w, M) = q^*(w)\, q^*(M).
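
A self-contained toy of the mean-field idea behind step 1 (not the MedLDA updates): fit a fully factorized q(w)q(h) to a correlated 2-D Gaussian p(w, h) by coordinate ascent, where each factor update conditions on the other factor's current mean.

```python
import numpy as np

Lam = np.array([[2.0, 1.2], [1.2, 2.0]])  # precision matrix of p(w, h)
m = np.array([1.0, -1.0])                 # mean of p(w, h)

Ew, Eh = 0.0, 0.0                         # initial means of q(w), q(h)
for _ in range(50):
    Ew = m[0] - (Lam[0, 1] / Lam[0, 0]) * (Eh - m[1])   # update q(w)
    Eh = m[1] - (Lam[1, 0] / Lam[1, 1]) * (Ew - m[0])   # update q(h)
print(Ew, Eh)  # converges to the true means (1, -1); the factor
               # variances 1/Lam[i, i] under-estimate the true spread
```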

Application to Topic Modeling
Review of Maximum Entropy Discriminant LDA (MedLDA).

Online Max-Margin Topic Models
Online MedLDA. For MedLDA, latent BayesPA concretizes to

  \min_{q, \xi} \; \mathrm{KL}\big[q(w, \Phi, Z_t) \,\|\, q_t(w, \Phi)\, p_0(Z_t)\, p(X_t \mid \Phi, Z_t)\big] + 2c \sum_{d \in B_t} \ell_\epsilon(w; z_d, y_d)   (14)
  s.t. q(w, \Phi, Z_t) \in \mathcal{P}.

In our paper:
- efficient algorithms for online MedLDA with both \ell_\epsilon^{ave} and \ell_\epsilon^{gibbs};
- a nonparametric extension: the max-margin Hierarchical Dirichlet Process (MedHDP);
- BayesPA for nonparametric models: efficient algorithms for online MedHDP.

Experiments

Classification on 20NG
20 Newsgroups:
1. 20 categories of documents;
2. training/testing split: 11269/7505;
3. one-vs-all for the multiclass problem.
Baselines: batch MedLDA (Zhu et al. 2009, 2013), batch MedHDP, sparse stochastic LDA (Mimno et al. 2012), and truncation-free HDP (Wang & Blei 2012).

Results of Online MedLDA
[Figure: error rate vs. number of passes through the dataset (log scale) for pamedlda_ave, pamedlda_gibbs, MedLDA_gibbs, gMedLDA, and spLDA+SVM.]

Results of Online MedLDA
[Figure: accuracy vs. number of topics (left) and training time in seconds, log scale, vs. number of topics (right) for the same methods.]

Sensitivity to Batch Size
[Figure: test error vs. CPU seconds (log scale), two panels, for batch sizes 1, 4, 16, 64, 256, 1024, and full batch.]

Online MedHDP on 20NG
[Figure: accuracy vs. number of topics (left) and training time in seconds, log scale (right), for pamedhdp_ave, pamedhdp_gibbs, bMedHDP, and tfHDP+SVM.]

Multi-Task Learning on the Wikipedia Dataset
- Training data: 1.1M Wikipedia articles, 20 categories.
- Test data: 5000 Wikipedia articles.
- Vocabulary: 917,683 unique items.
[Figure: F1 score vs. training time in seconds (log scale) for pamedldamt_ave, pamedhdpmt_ave, pamedldamt_gibbs, pamedhdpmt_gibbs, MedLDAmt, and MedHDPmt.]

Contributions
- A generic online learning framework.
- Extends Passive-Aggressive learning to the Bayesian setting.
- Generalizes sequential Bayesian inference to discriminative learning.
- Develops efficient online learning algorithms for max-margin topic models.