Online Bayesian Passive-Aggressive Learning
Slide 1: Online Bayesian Passive-Aggressive Learning. Full journal version available. Tianlin Shi, Jun Zhu. ICML 2014. [Footer on all slides: T. Shi, J. Zhu (Tsinghua), BayesPA, ICML 2014, 35 slides.]
Slides 2-3: Outline. Introduction: Motivation, Framework. Theory: Connection to PA, Regret Analysis. Practice: Extension, Latent Space, Topic Modeling. Experiment.
Slides 4-7: Introduction, Motivation: The Big Data Challenge (incremental build).
- Huge amounts of data push the limits of methods/systems.
- Streaming and potentially infinite data in real applications.
- Complex data with latent variables and hierarchical modeling.
Slides 8-14: Story I: Online Learning [Shalev-Shwartz 07, Online Learning: Theory, Algorithms, and Applications] (incremental build).
[Figure: an online learner A maintains a model w_t, predicts, incurs loss l(w_t; x_t, y_t), and learns from the data stream {x_t, y_t}.]
Setting:
- Online learner A.
- Incoming data stream (x_0, y_0), (x_1, y_1), (x_2, y_2), ...
- At each round t, learner A: 1. predicts w_t; 2. incurs loss $\ell(w_t; x_t, y_t)$; 3. learns how to predict.
Slides 15-18: Online Passive-Aggressive (PA) Algorithms [Crammer et al. 06, Online Passive-Aggressive Algorithms].
Update weight: obtain
$w_{t+1} = \operatorname*{argmin}_{w} \; \underbrace{\tfrac{1}{2}\|w - w_t\|^2}_{\text{regularizer}} + \underbrace{2c\,\ell_\epsilon(w; x_t, y_t)}_{\text{large margin}}$   (1)
[Figure: illustration of the passive vs. aggressive update.]
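A minimal sketch of the update (1) for binary classification, assuming the PA-I-style closed-form step size $\tau_t = \min(2c, \ell_\epsilon(w_t)/\|x_t\|^2)$ from Crammer et al.; the 2c multiplier and the $\epsilon$-insensitive hinge follow the notation on the slide, and the tiny stream at the end is an invented example:

```python
import numpy as np

def hinge(w, x, y, eps=1.0):
    """Epsilon-insensitive hinge loss l_eps(w; x, y)."""
    return max(0.0, eps - y * np.dot(w, x))

def pa_update(w, x, y, c=1.0, eps=1.0):
    """One PA-I style step: w_{t+1} = w_t + tau * y * x.

    tau = min(2c, loss / ||x||^2) is the closed-form minimizer of
    0.5 * ||w - w_t||^2 + 2c * l_eps(w; x, y), written with the 2c
    multiplier used in Eq. (1) on the slides.
    """
    loss = hinge(w, x, y, eps)
    if loss == 0.0:                              # passive: margin already met
        return w
    tau = min(2.0 * c, loss / np.dot(x, x))      # aggressive: jump to margin
    return w + tau * y * x

# tiny invented stream of two linearly separable points
w = np.zeros(2)
stream = [(np.array([1.0, 0.0]), 1), (np.array([0.0, 1.0]), -1)]
for x, y in stream:
    w = pa_update(w, x, y)
```

After both rounds the weight vector satisfies the margin on each point, so a third pass over the stream would leave it unchanged (the "passive" branch).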
Slides 19-23: Why online learning works (incremental build).
Notion of regret: the accumulative loss measured against a fixed model $w^\star$ chosen with hindsight of all data:
$\mathrm{Regret} = \sum_{t=0}^{T-1} \ell(w_t; x_t, y_t) - \sum_{t=0}^{T-1} \ell(w^\star; x_t, y_t)$   (2)
Regret bound. Example: in PA with $\ell = \ell_\epsilon^2$,
$\sum_{t=0}^{T-1} \ell(w_t) \le c_R \sum_{t=0}^{T-1} \ell(w^\star) + \text{const}$   (3)
so the learner implicitly minimizes the empirical loss.
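To make the regret in (2) concrete, a small sketch (my illustration, not from the slides): run the PA-style update on a synthetic separable stream and compare its cumulative hinge loss against a fixed comparator $w^\star$ that separates every point with margin at least $\epsilon = 1$, so $\ell(w^\star) = 0$ and the regret equals the learner's own cumulative loss:

```python
import numpy as np

def hinge(w, x, y, eps=1.0):
    return max(0.0, eps - y * np.dot(w, x))

def pa_update(w, x, y, c=1.0, eps=1.0):
    loss = hinge(w, x, y, eps)
    if loss == 0.0:
        return w
    tau = min(2.0 * c, loss / np.dot(x, x))
    return w + tau * y * x

rng = np.random.default_rng(0)
w_star = np.array([1.0, -1.0])           # comparator chosen with hindsight
X, Y = [], []
while len(X) < 200:                      # keep only points with margin >= 1
    x = rng.normal(size=2)
    m = np.dot(w_star, x)
    if abs(m) >= 1.0:
        X.append(x)
        Y.append(1 if m > 0 else -1)

w = np.zeros(2)
learner_loss, star_loss = 0.0, 0.0
for x, y in zip(X, Y):
    learner_loss += hinge(w, x, y)       # loss incurred BEFORE the update
    star_loss += hinge(w_star, x, y)     # zero by construction of the stream
    w = pa_update(w, x, y)

regret = learner_loss - star_loss        # Eq. (2) with this comparator
```

Because the comparator incurs zero loss, the regret here is exactly the learner's cumulative loss, which stays bounded on a separable stream.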
Slides 24-31: Story II: Sequential Bayesian Inference (incremental build).
Maintain a distribution over w. For round t:
1. Let the current distribution q_t(w) be the prior.
2. For each incoming data point (x_t, y_t), apply Bayes' theorem:
   $q_{t+1}(w) \propto q_t(w)\, p(x_t \mid w)$.   (4)
3. If inference is intractable, use approximations of q_{t+1}(w). [Broderick et al. 13, Streaming Variational Bayes.]
Advantages:
- Captures uncertainty.
- Flexible enough to model the underlying structure.
- Can be nonparametric.
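Under conjugacy the sequential update (4) is exact and cheap; a minimal sketch (my illustration, not from the slides) with a Beta prior on a Bernoulli parameter, where each observation simply increments a pseudo-count:

```python
class BetaBernoulliStream:
    """Sequential Bayesian inference for a Bernoulli parameter w.

    q_t(w) = Beta(a, b); observing x in {0, 1} gives, via Eq. (4),
    q_{t+1}(w) ∝ q_t(w) p(x | w) = Beta(a + x, b + 1 - x).
    """
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b            # prior pseudo-counts (uniform prior)

    def update(self, x):
        self.a += x                      # count of ones
        self.b += 1 - x                  # count of zeros

    def mean(self):
        return self.a / (self.a + self.b)

q = BetaBernoulliStream()
for x in [1, 1, 0, 1]:                   # a small data stream
    q.update(x)
```

The posterior after the stream is Beta(4, 2), i.e. prior pseudo-counts plus the observed counts, regardless of the order in which the points arrive.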
Slides 32-38: The Gap (incremental build).
Story I, Online Learning (PA), facing huge, streaming, complex data:
1. Keeps only a single point estimate of the model.
2. Ignores the latent structure underlying complex data.
Story II, Sequential Bayesian Inference:
- Lacks discriminative power.
- Does not directly minimize loss (or regret).
Slides 39-45: A New Perspective (incremental build).
[Figure: illustration combining online learning with latent-variable inference.]
Intuition:
- Incoming data (x_t, y_t).
- Infer the latent variable z_t from x_t.
- Make a prediction ŷ_t.
- The learner incurs an instantaneous loss.
- Learn both how to infer and how to predict.
Slides 46-51: Bayesian Passive-Aggressive Learning (BayesPA) (incremental build).
1. At round t, the learner makes a prediction by giving q_t(w).
2. It incurs an instantaneous loss with two parts:
   - Bayesian log-loss: $-\mathbb{E}_{q_t(w)}[\log p(x_t \mid w)]$.
   - Decision-theoretic loss: $\ell(q(w); x_t, y_t)$.
3. Learning rule:
$q_{t+1}(w) = \operatorname*{argmin}_{q(w) \in \mathcal{F}_t} \; \underbrace{\mathrm{KL}\big[q(w) \,\|\, q_t(w)\big] - \mathbb{E}_{q(w)}\big[\log p(x_t \mid w)\big]}_{\text{regularizer}} + \underbrace{2c\, \ell_\epsilon\big(q(w); x_t, y_t\big)}_{\text{instantaneous loss}}$   (5)
Slides 52-54: Interpretation.
Rewrite the learning rule as
$q_{t+1}(w) = \operatorname*{argmin}_{q(w) \in \mathcal{F}} \; \mathrm{KL}\big[q(w) \,\|\, \underbrace{q_t(w)\, p(x_t \mid w)}_{\text{unnormalized posterior}}\big] + 2c\, \ell\big(q(w); x_t, y_t\big)$   (6)
[Figure: illustration of the update as a projection onto the feasible family.]
Slides 55-57: BayesPA classifiers: two types.
1. Averaging classifiers.
   Prediction rule: $\hat{y}_t = \operatorname{sign} \mathbb{E}_{q(w)}[w^\top x_t]$.
   Averaged hinge loss: $\ell_\epsilon^{\mathrm{ave}}(q(w); x_t, y_t) = \big(\epsilon - y_t\, \mathbb{E}_{q(w)}[w^\top x_t]\big)_+$.
2. Gibbs classifiers.
   First randomly draw $w \sim q_t(w)$, then decide $\hat{y}_t = \operatorname{sign}(w^\top x_t)$.
   Expected hinge loss: $\ell_\epsilon^{\mathrm{gibbs}}(q(w); x_t, y_t) = \mathbb{E}_{q(w)}\big[(\epsilon - y_t w^\top x_t)_+\big]$.
Lemma: $\ell_\epsilon^{\mathrm{gibbs}}(q(w); x_t, y_t) \ge \ell_\epsilon^{\mathrm{ave}}(q(w); x_t, y_t)$.   (7)
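The lemma is Jensen's inequality applied to the convex hinge $(\cdot)_+$. A quick numerical check (my sketch, with an invented Gaussian q and data point): draw Monte Carlo samples of w and evaluate the averaged loss at the empirical mean of the same samples, so the inequality holds exactly for the empirical measure:

```python
import numpy as np

rng = np.random.default_rng(42)
x, y, eps = np.array([0.5, -1.2]), 1, 1.0

# samples from a Gaussian posterior q(w) (illustrative parameters)
W = rng.normal(loc=[0.3, 0.1], scale=1.0, size=(10_000, 2))

margins = y * (W @ x)                                     # y * w^T x per sample
gibbs = np.mean(np.maximum(0.0, eps - margins))           # E_q[(eps - y w^T x)_+]
ave = max(0.0, eps - y * (W.mean(axis=0) @ x))            # (eps - y E_q[w]^T x)_+
```

Since the margin is linear in w, the mean margin equals the margin at the mean weight, and the convexity of the hinge forces the Gibbs loss to dominate the averaged loss.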
Slides 58-59: Update Rule.
Averaging classifier:
$q_{t+1}(w) \propto \underbrace{q_t(w)}_{\text{prior}} \, \underbrace{p(x_t \mid w)}_{\text{likelihood}} \, \underbrace{\exp\big(\tau_t\, y_t\, w^\top x_t\big)}_{\text{pseudo-likelihood}}$   (8)
The dual slack variable $\tau_t$ enforces the max-margin constraints.
Gibbs classifier:
$q_{t+1}(w) \propto \underbrace{q_t(w)}_{\text{prior}} \, \underbrace{p(x_t \mid w)}_{\text{likelihood}} \, \underbrace{\exp\big(-2c\, (\epsilon - y_t w^\top x_t)_+\big)}_{\text{pseudo-likelihood}}$   (9)
Slide 60: Outline (section marker): Theory: Connection to PA, Regret Analysis.
Slides 61-62: Connection to PA.
Theorem: BayesPA subsumes online PA.
Proof idea: show that the posterior-mean update $\mu_t \to \mu_{t+1}$ follows the PA update.
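The mechanics behind the theorem can be seen in one dimension (my sketch, not from the slides): assuming a unit-variance Gaussian prior $q_t(w) = \mathcal{N}(\mu, 1)$ and dropping the likelihood term, multiplying by the averaging-classifier pseudo-likelihood $\exp(\tau y w x)$ in (8) shifts the posterior mean additively by $\tau y x$, exactly a PA-style step. The check below integrates the unnormalized posterior on a grid:

```python
import numpy as np

# q_t(w) = N(mu, 1); pseudo-likelihood exp(tau * y * w * x), 1-D sketch
mu, tau, y, x = 0.2, 0.7, 1.0, 1.5

w = np.linspace(-10.0, 10.0, 200_001)            # fine uniform grid
unnorm = np.exp(-0.5 * (w - mu) ** 2 + tau * y * w * x)

# posterior mean by (Riemann-sum) numerical integration
post_mean = (w * unnorm).sum() / unnorm.sum()

# completing the square predicts mean mu + tau * y * x: the PA-style shift
predicted = mu + tau * y * x
```

So with a Gaussian family and no generative term, BayesPA's distributional update collapses to the familiar additive PA update on the mean.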
Slides 63-65: Regret Analysis.
Regret in BayesPA: the decision is now the distribution q(w); the cumulative loss is measured against a fixed distribution $p^\star(w)$ chosen with hindsight of all data:
$\mathrm{Regret} = \sum_{t=0}^{T-1} \ell(q_t(w); x_t, y_t) - \sum_{t=0}^{T-1} \ell(p^\star(w); x_t, y_t)$   (10)
where the loss is
$\ell(q(w); x_t, y_t) = -\mathbb{E}_{q(w)}[\log p(x_t \mid w)] + 2c\, \ell_\epsilon(q(w); x_t, y_t)$   (11)
Slides 66-70: A regret bound for BayesPA.
Define the intermediate distribution
$q_t^{\tau,u}(w) \propto q_t(w) \exp\big(u\, U(w, x_t, y_t) + \tau\, T(w, x_t, y_t)\big)$
with sufficient statistics $U(w, x_t, y_t) = \log p(x_t \mid w)$ and $T(w, x_t, y_t) = y_t w^\top x_t$.
Theorem: suppose the Fisher information satisfies $J_t \succeq \lambda_t I$; then
$\mathrm{Regret} \le \mathrm{KL}[p^\star(w) \,\|\, p_0(w)] + \frac{1}{2} \sum_{t=0}^{T-1} \frac{c^2}{\lambda_t}$   (12)
Remark: if the Bayesian CLT holds, BayesPA achieves the optimal regret $O(\log T)$.
Slide 71: Outline (section marker): Practice: Extension, Latent Space, Topic Modeling.
Slides 72-73: Some Extensions.
Mini-batches: $X_t = \{x_d\}_{d \in B_t}$, $Y_t = \{y_d\}_{d \in B_t}$:
$\min_{q \in \mathcal{F}_t} \mathrm{KL}\big[q(w) \,\|\, q_t(w)\big] - \mathbb{E}_{q(w)}\big[\log p(X_t \mid w)\big] + 2c\, \ell_\epsilon\big(q(w); X_t, Y_t\big)$
where $\log p(X_t \mid w) = \sum_{d \in B_t} \log p(x_d \mid w)$ and $\ell_\epsilon(q(w); X_t, Y_t) = \sum_{d \in B_t} \ell_\epsilon(q(w); x_d, y_d)$.
Multi-task learning: multiple labels $(x_t, y_t^1, \ldots, y_t^T)$:
$\min_{q \in \mathcal{F}_t} \mathrm{KL}\big[q(w) \,\|\, q_t(w)\, p(X_t \mid w, M, H_t)\big] + 2c \sum_{\tau=1}^{T} \ell_\epsilon\big(q(w, M, H_t); X_t, Y_t^\tau\big)$
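The mini-batch objective sums the losses over the batch $B_t$. A point-estimate sketch of one such step (my illustration, assuming a single averaged subgradient step over the batch rather than the exact closed-form or distributional solution):

```python
import numpy as np

def minibatch_pa_step(w, X, Y, c=1.0, eps=1.0, lr=0.5):
    """One mini-batch update: average the hinge subgradients over B_t.

    Approximates argmin_w 0.5*||w - w_t||^2 + 2c * sum_d l_eps(w; x_d, y_d)
    with a single gradient step of size lr (a sketch, not the exact
    solution used in the paper).
    """
    grad = np.zeros_like(w)
    for x, y in zip(X, Y):
        if eps - y * np.dot(w, x) > 0:       # hinge active for this document
            grad += -2.0 * c * y * x
    return w - lr * grad / len(X)            # averaged subgradient step

w = np.zeros(2)
X = np.array([[1.0, 0.0], [0.0, 1.0]])       # one invented mini-batch
Y = np.array([1, -1])
for _ in range(20):                          # a few passes over the batch
    w = minibatch_pa_step(w, X, Y)
```

On this toy batch the iterates converge in two steps and then stay put, since both hinges become inactive once the margins reach $\epsilon$.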
Slides 74-76: BayesPA with Latent Space.
[Figure: global variables (w, M) and local variables H_t; for t = 1, 2, ..., (a) draw a mini-batch (X_t, Y_t), (b) infer the hidden structure q*(H_t), then update the distribution of the global variables.]
Formulation: maintain a distribution q(w, M) over the global variables:
$\min_{q \in \mathcal{F}_t} \mathrm{KL}\big[q(w, M, H_t) \,\|\, q_t(w, M)\, p(X_t \mid w, M, H_t)\big] + 2c\, \ell_\epsilon\big(q(w, M, H_t); X_t, Y_t\big)$   (13)
Problem: efficient inference for the marginal q(w, M)?
Slides 77-80: Approximate Inference in BayesPA.
Variational inference. Idea: restrict $\mathcal{F}_t$ to be tractable.
1. Structured mean field: $q(w, M, H_t) = q(w)\, q(M)\, q(H_t)$.
2. Solve the BayesPA problem.
3. Output $q_{t+1}(w, M) = q^*(w)\, q^*(M)$.
Slide 81: Application to Topic Modeling. Review of Maximum Entropy Discriminant LDA (MedLDA).
Slides 82-85: Online Max-Margin Topic Models.
Online MedLDA: for MedLDA, latent BayesPA concretizes to
$\min_{q, \xi_d} \mathrm{KL}\big[q(w, \Phi, Z_t) \,\|\, q_t(w, \Phi)\, p_0(Z_t)\, p(X_t \mid \Phi, Z_t)\big] + 2c \sum_{d \in B_t} \ell_\epsilon(w; z_d, y_d)$   (14)
subject to $q(w, \Phi, Z_t) \in \mathcal{P}$.
In our paper:
- Derive efficient algorithms for online MedLDA with both $\ell_\epsilon^{\mathrm{ave}}$ and $\ell_\epsilon^{\mathrm{gibbs}}$.
- Provide a nonparametric extension: the max-margin Hierarchical Dirichlet Process (MedHDP).
- BayesPA for nonparametric models: derive efficient algorithms for online MedHDP.
Slide 86: Outline (section marker): Experiment.
Slides 87-91: Classification on 20NG.
20 Newsgroups:
1. 20 categories of documents.
2. Training/testing split: 11269/7505.
3. One-vs-all for multiple categories.
Baselines:
- Batch MedLDA (Zhu et al. 2009, 2013) and batch MedHDP.
- Sparse stochastic LDA (Mimno et al. 2012), truncation-free HDP (Wang & Blei 2012).
Slide 92: Results of Online MedLDA: passes through the dataset. [Figure: test error rate vs. number of passes through the dataset, for paMedLDA_ave, paMedLDA_gibbs, MedLDA_gibbs, gMedLDA, and spLDA+SVM.]
Slide 93: Results of Online MedLDA: accuracy and running time. [Figure: accuracy and training time vs. number of topics, same methods.]
Slide 96: Sensitivity with Batch Size. [Figure: test error and CPU seconds (log scale) as a function of batch size.]
Slide 97: Online MedHDP on 20NG. [Figure: accuracy and training time vs. number of topics, for paMedHDP_ave, paMedHDP_gibbs, bMedHDP, and tfHDP+SVM.]
Slides 98-101: Multi-Task Learning on the Wikipedia Dataset.
- Training data: 1.1M Wikipedia articles, 20 categories.
- Test data: 5000 Wikipedia articles.
- Vocabulary: 917,683 unique items.
Result: [Figure: F1 score vs. training time (seconds), for paMedLDAmt_ave, paMedHDPmt_ave, paMedLDAmt_gibbs, paMedHDPmt_gibbs, MedLDAmt, and MedHDPmt.]
Slides 102-105: Contributions (incremental build).
- A generic online learning framework.
- Extends Passive-Aggressive learning to the Bayesian setting.
- Generalizes sequential Bayesian inference to discriminative learning.
- Develops efficient online learning algorithms for max-margin topic models.
More informationThe Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904,
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationStreaming Variational Bayes
Streaming Variational Bayes Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan UC Berkeley Discussion led by Miao Liu September 13, 2013 Introduction The SDA-Bayes Framework
More informationThe supervised hierarchical Dirichlet process
1 The supervised hierarchical Dirichlet process Andrew M. Dai and Amos J. Storkey Abstract We propose the supervised hierarchical Dirichlet process (shdp), a nonparametric generative model for the joint
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationOn the Generalization Ability of Online Strongly Convex Programming Algorithms
On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationOnline Manifold Regularization: A New Learning Setting and Empirical Study
Online Manifold Regularization: A New Learning Setting and Empirical Study Andrew B. Goldberg 1, Ming Li 2, Xiaojin Zhu 1 1 Computer Sciences, University of Wisconsin Madison, USA. {goldberg,jerryzhu}@cs.wisc.edu
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Nonparametrics for Speech and Signal Processing
Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationOnline Forest Density Estimation
Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationDiscriminative Training of Mixed Membership Models
18 Discriminative Training of Mixed Membership Models Jun Zhu Department of Computer Science and Technology, State Key Laboratory of Intelligent Technology and Systems; Tsinghua National Laboratory for
More informationA Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,
More informationU Logo Use Guidelines
Information Theory Lecture 3: Applications to Machine Learning U Logo Use Guidelines Mark Reid logo is a contemporary n of our heritage. presents our name, d and our motto: arn the nature of things. authenticity
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts
ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationOnline Learning of Probabilistic Graphical Models
1/34 Online Learning of Probabilistic Graphical Models Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr CRIL-U Nankin 2016 Probabilistic Graphical Models 2/34 Outline 1 Probabilistic
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationGenerative MaxEnt Learning for Multiclass Classification
Generative Maximum Entropy Learning for Multiclass Classification A. Dukkipati, G. Pandey, D. Ghoshdastidar, P. Koley, D. M. V. S. Sriram Dept. of Computer Science and Automation Indian Institute of Science,
More informationBetter Algorithms for Selective Sampling
Francesco Orabona Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy francesco@orabonacom nicolocesa-bianchi@unimiit Abstract We study online algorithms for selective sampling that use regularized
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationFaster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama Conference on Uncertainty
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationPolyhedral Outer Approximations with Application to Natural Language Parsing
Polyhedral Outer Approximations with Application to Natural Language Parsing André F. T. Martins 1,2 Noah A. Smith 1 Eric P. Xing 1 1 Language Technologies Institute School of Computer Science Carnegie
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationRecurrent Latent Variable Networks for Session-Based Recommendation
Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationEfficient Bandit Algorithms for Online Multiclass Prediction
Efficient Bandit Algorithms for Online Multiclass Prediction Sham Kakade, Shai Shalev-Shwartz and Ambuj Tewari Presented By: Nakul Verma Motivation In many learning applications, true class labels are
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationAn empirical study about online learning with generalized passive-aggressive approaches
An empirical study about online learning with generalized passive-aggressive approaches Adrian Perez-Suay, Francesc J. Ferri, Miguel Arevalillo-Herráez, and Jesús V. Albert Dept. nformàtica, Universitat
More informationLearning, Games, and Networks
Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationAuthors: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira and Jennifer Wortman (University of Pennsylvania)
Learning Bouds for Domain Adaptation Authors: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira and Jennifer Wortman (University of Pennsylvania) Presentation by: Afshin Rostamizadeh (New York
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationPractical Agnostic Active Learning
Practical Agnostic Active Learning Alina Beygelzimer Yahoo Research based on joint work with Sanjoy Dasgupta, Daniel Hsu, John Langford, Francesco Orabona, Chicheng Zhang, and Tong Zhang * * introductory
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More information29 : Posterior Regularization
10-708: Probabilistic Graphical Models 10-708, Spring 2014 29 : Posterior Regularization Lecturer: Eric P. Xing Scribes: Felix Juefei Xu, Abhishek Chugh 1 Introduction This is the last lecture which tends
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationLecture 2 - Learning Binary & Multi-class Classifiers from Labelled Training Data
Lecture 2 - Learning Binary & Multi-class Classifiers from Labelled Training Data DD2424 March 23, 2017 Binary classification problem given labelled training data Have labelled training examples? Given
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Joseph Keshet Shai Shalev-Shwartz Yoram Singer School of Computer Science and Engineering The Hebrew University Jerusalem, 91904, Israel CRAMMER@CIS.UPENN.EDU
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationMAD-Bayes: MAP-based Asymptotic Derivations from Bayes
MAD-Bayes: MAP-based Asymptotic Derivations from Bayes Tamara Broderick Brian Kulis Michael I. Jordan Cat Clusters Mouse clusters Dog 1 Cat Clusters Dog Mouse Lizard Sheep Picture 1 Picture 2 Picture 3
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationOnline Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das
Online Learning 9.520 Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das About this class Goal To introduce the general setting of online learning. To describe an online version of the RLS algorithm
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationTopics. Bayesian Learning. What is Bayesian Learning? Objectives for Bayesian Learning
Topics Bayesian Learning Sattiraju Prabhakar CS898O: ML Wichita State University Objectives for Bayesian Learning Bayes Theorem and MAP Bayes Optimal Classifier Naïve Bayes Classifier An Example Classifying
More information