Putting the Bayes update to sleep

1 Putting the Bayes update to sleep. Manfred Warmuth, UCSC AMS seminar. Joint work with Wouter M. Koolen, Dmitry Adamskiy, and Olivier Bousquet.

2-3 Menu / Outline
How adding one line of code to the multiplicative update imbues the algorithm with a long-term memory and makes it robust for data that shifts between a small number of modes.
- The disk spin-down problem
- The Mixing Past Posteriors update: long-term versus short-term memory
- Bayes in the tube
- The curse of the multiplicative update
- The two-track update: another update with a long-term memory
- Theoretical justification: putting Bayes to sleep
- Open problems

4 Disk spin-down problem
Experts: a set of n fixed timeouts τ_1, τ_2, ..., τ_n.
The master algorithm maintains a set of weights / shares s_1, s_2, ..., s_n.
Multiplicative update:
    s_{t+1,i} = s_{t,i} e^{-η · (energy usage of timeout i)} / Z
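As an illustration, here is a minimal numpy sketch of this update; the number of experts, the eta value, and the toy energy numbers are placeholders for the example, not taken from the talk.

```python
import numpy as np

def multiplicative_update(shares, losses, eta=1.0):
    """One round of the multiplicative update over the n timeout experts."""
    s = shares * np.exp(-eta * losses)   # downweight each expert by its loss (energy usage)
    return s / s.sum()                   # normalize by Z so the shares sum to one

# toy usage: four timeout experts, uniform start, hypothetical energy usages
s = np.ones(4) / 4
s = multiplicative_update(s, np.array([0.3, 0.1, 0.7, 0.2]), eta=2.0)
```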

5 A simple way of making the algorithm robust
Mix in a little bit of the uniform vector (the initial prior). This has a Bayesian interpretation.
    s ← Multiplicative Update(s)
    s ← (1 - α) s + α (1/N, 1/N, ..., 1/N),   where α is small
Called the Fixed Share (to Uniform Past) update.
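A minimal sketch of the Fixed Share step, assuming the same loss-based multiplicative step as above; alpha = 0.01 is just a placeholder value.

```python
import numpy as np

def fixed_share(shares, losses, eta=1.0, alpha=0.01):
    """Multiplicative update, then mix in a little of the uniform vector."""
    s = shares * np.exp(-eta * losses)
    s /= s.sum()                              # the usual multiplicative / Bayes-style step
    return (1 - alpha) * s + alpha / len(s)   # blend with the uniform initial prior
```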

6 A better way of making the algorithm robust
Keep track of the average r of the past share vectors.
    s ← Multiplicative Update(s)
    s ← (1 - α) s + α r
Called the Mixing Past Posteriors update (with the uniform distribution on the past). It facilitates switching back to a previously useful vector: long-term memory. No obvious Bayesian interpretation.
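A sketch of one possible Mixing Past Posteriors variant. Whether the average r includes the current posterior, and how the past trials are weighted, varies between versions; this sketch assumes a uniform average that includes the current trial.

```python
import numpy as np

def mixing_past_posteriors(shares, losses, past, eta=1.0, alpha=0.01):
    """Multiplicative update, then mix in the average r of past posteriors.

    past: list of posterior (share) vectors from earlier trials (the long-term memory)
    """
    s = shares * np.exp(-eta * losses)
    s /= s.sum()                    # Bayes-style multiplicative step
    past.append(s.copy())           # remember this trial's posterior
    r = np.mean(past, axis=0)       # uniform average over the past
    return (1 - alpha) * s + alpha * r
```

Because r keeps mass on every vector that was ever a posterior, a previously good expert can be recovered quickly when the data shifts back to its mode.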

7 [Figure] Total loss as a function of time; the data shifts through segments best fit by models A, D, A, D, G, A, D.

8 [Figure] The weights of the models over time (log weight vs. trial) for models A, D, G, and the others, on the same A, D, A, D, G, A, D segment sequence.

9-17 The family of online learning algorithms
- Bayes: homes in fast; simple, elegant; the baseline
- Fixed Share: + adaptivity; simple, elegant; good performance
- Mixing Past Posteriors: + sparsity; a self-referential mess; great performance

18 Our breakthrough: a Bayesian interpretation of Mixing Past Posteriors.

19 Quick lookahead! The trick is to put some models to sleep: asleep models don't predict, and their weights are not updated. Later: the Gaussian mixture example.

20-22 The updates in Bayesian notation
Both updates start with the Bayes posterior
    s^temp_m = P(y | m) s^old_m / Σ_{m'} P(y | m') s^old_{m'}
Fixed Share:
    s^new_m = (1 - α) s^temp_m + α · uniform
    = Bayes over partition models
Mixing Past Posteriors:
    s^new_m = (1 - α) s^temp_m + α · (past average of the s^temp weights)
    = a self-referential mess; we will give a Bayesian interpretation using the sleeping trick

23-24 Multiplicative updates
Blessing: speed. Curse: loss of diversity. How to maintain variety when multiplicative updates are used?
Next: an implementation of Bayes in the tube, towards a rudimentary model of evolution.

25-26 In-vitro selection algorithm (for finding an RNA strand that binds to a protein)
Make a tube of random RNA strands and a sheet coated with the protein.
for t = 1 to 20 do
  1. Dip the sheet with the protein into the tube
  2. Pull it out and wash off the RNA that stuck
  3. Multiply the washed-off RNA back to the original amount (normalization)
This cannot be done on a computer because of the enormous number of RNA strands per liter of soup.

27 Basic scheme
Start with a unit amount of random RNA.
Loop:
  1. Functional separation into good RNA and bad RNA
  2. Amplify the good RNA back to unit amount

28-29 Duplicating DNA with PCR (needed for the normalization step)
Invented by Kary Mullis, 1985.
Heat: the double-stranded DNA comes apart.
Cool: short primers hybridize at the ends.
Taq polymerase runs along each DNA strand and complements the bases.
Back to double-stranded DNA.
Ideally, every DNA strand is multiplied by a factor of 2.

30 The caveat: not many interesting/specific functional separations have been found, and high throughput is needed for the functional separation.

31-35 Mathematical description of in-vitro selection
Name the unknown RNA strands 1, 2, ..., n, where n is about the number of strands in 1 liter.
s_i ≥ 0 is the share of RNA i.
The contents of the tube are represented as the share vector s = (s_1, s_2, ..., s_n); when the tube holds a unit amount, s_1 + s_2 + ... + s_n = 1.
W_i ∈ [0, 1] is the fraction of one unit of RNA i that is good, i.e. the fitness of RNA strand i for attaching to the protein.
The protein is represented as the fitness vector W = (W_1, W_2, ..., W_n).
The vectors s, W are unknown: blind computation!
Strong assumption: the fitness W_i is independent of the share vector s.

36 Update
Good RNA in tube s:  s_1 W_1 + s_2 W_2 + ... + s_n W_n = s · W
Amplification: the good share of RNA i is s_i W_i, and it gets multiplied by a factor F. If the amplification is precise, all good RNA is multiplied by the same factor F, so the final tube at the end of the loop contains F · s_i W_i of each strand i.
Since the final tube has a unit amount of RNA, F (s · W) = 1 and F = 1 / (s · W).
Update in each loop:  s_i := s_i W_i / (s · W)

37 Implementing Bayes rule in RNA
s_i is the prior P(i)
W_i ∈ [0, 1] is the probability P(Good | i)
s · W is the probability P(Good)
so the update s_i := s_i W_i / (s · W) is Bayes rule:
    P(i | Good) = P(i) P(Good | i) / P(Good)    (posterior = prior × likelihood / evidence)

38 Iterating Bayes rule with the same data likelihoods
[Figure] Shares s_1, s_2, s_3 plotted over time t, starting from s = (.01, .39, .6) with fixed likelihoods W = (.9, .75, .8).

39 The strand with max_i W_i eventually wins
[Figure] Shares over time for fitnesses such as .9, .85, .84, .7: the largest share is always increasing, the smallest always decreasing.
    s_{t,i} = s_{0,i} W_i^t / normalization
Here t is the speed parameter.
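A tiny simulation of slides 38-39, using the initial shares and likelihood values shown there; the number of iterations is arbitrary.

```python
import numpy as np

s = np.array([0.01, 0.39, 0.60])   # initial shares from slide 38
W = np.array([0.90, 0.75, 0.80])   # fixed likelihoods / fitnesses

for t in range(100):
    s = s * W
    s /= s.sum()                   # equivalently s_{t,i} = s_{0,i} * W_i**t / normalization

print(np.round(s, 4))  # virtually all of the mass ends up on the strand with W = .9
```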

40-42 The best are too good
Multiplicative update:  s_i ← s_i · (fitness factor)_i,  with fitness factor ≥ 0
Blessing: the best get amplified exponentially fast. Can do this with variables.
Curse: the best wipe out the others. Loss of diversity.

43-44 View of evolution
Simple view: inheritance - mutation - selection of the fittest, with a multiplicative update.
New view: multiplicative updates are brittle, so a mechanism is needed to ameliorate the curse of the multiplicative update. Machine learning and nature use various mechanisms; here we focus on the sleeping trick.

45-46 Long-term memory with the 2-track update
Multiplicative update + a small soup exchange gives long-term memory.
[Diagram] Two tracks: Awake (selection + PCR) and Asleep (pass-through). A fraction α of the awake soup moves to the asleep track and a fraction β of the asleep soup moves back (fractions 1-α and 1-β stay put); the awake track carries a fraction γ = β/(α+β) of the total in the long run.

47 Two sets of weights
Awake weights a, asleep weights s.
1st update:  a^m = multiplicative update of a;   s^m = s, i.e. pass-through
2nd update:  a_new = (1 - α) a^m + β s^m;   s_new = α a^m + (1 - β) s^m
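A sketch of the two steps combined, assuming (as slide 61 later spells out) that the multiplicative step is renormalized so the awake track keeps its total mass; alpha, beta, and eta are placeholder values.

```python
import numpy as np

def two_track_update(a, s, losses, eta=1.0, alpha=0.01, beta=0.01):
    """2-track update: the awake weights get the multiplicative update,
    the asleep weights are passed through, then the tracks exchange a little mass."""
    a_m = a * np.exp(-eta * losses)
    a_m *= a.sum() / a_m.sum()       # amplify back so the awake track keeps its total mass
    s_m = s                          # pass-through on the asleep track
    a_new = (1 - alpha) * a_m + beta * s_m
    s_new = alpha * a_m + (1 - beta) * s_m
    return a_new, s_new              # total mass a + s is conserved
```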

48 [Figure] Total loss as a function of time, again on the A, D, A, D, G, A, D segment sequence.

49 [Figure] The weights a of the awake models over time (log weight vs. trial) for A, D, G, and the others.

50 [Figure] The weights s of the asleep models over time (log weight vs. trial) for A, D, G, and the others.

51 Bayesian interpretation
Point of departure: we are trying to predict a sequence y_1, y_2, ....
Definition: a model issues a prediction each round; we denote the prediction of model m in round t by P(y_t | y_{<t}, m).
Say we have several models m = 1, ..., M. How to combine their predictions?

52 Bayesian answer
Place a prior P(m) on the models. Bayesian predictive distribution:
    P(y_t | y_{<t}) = Σ_{m=1}^{M} P(y_t | y_{<t}, m) P(m | y_{<t})
where the posterior distribution is incrementally updated by
    P(m | y_{≤t}) = P(y_t | y_{<t}, m) P(m | y_{<t}) / P(y_t | y_{<t})
Bayes is fast: predict in O(M) time per round.
Bayes is good: the regret w.r.t. model m on data y_1, ..., y_T is bounded by
    Σ_{t=1}^{T} ( -ln P(y_t | y_{<t}) - (-ln P(y_t | y_{<t}, m)) ) ≤ -ln P(m)
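A small numpy sketch of this slide: the mixture step and a numerical check of the regret bound on random, hypothetical likelihoods (the values of M, T, and the likelihood range are made up for the example).

```python
import numpy as np

def bayes_mixture_step(prior, likelihoods):
    """One round of the Bayesian mixture: returns P(y_t | y_<t) and the new posterior."""
    predictive = np.dot(prior, likelihoods)        # sum_m P(y_t | y_<t, m) P(m | y_<t)
    posterior = prior * likelihoods / predictive   # Bayes rule, O(M) work per round
    return predictive, posterior

# check the regret bound on a random toy run
rng = np.random.default_rng(0)
M, T = 5, 200
P = np.ones(M) / M
mix_loss, model_loss = 0.0, np.zeros(M)
for t in range(T):
    lik = rng.uniform(0.1, 1.0, size=M)   # hypothetical P(y_t | y_<t, m) for the observed y_t
    pred, P = bayes_mixture_step(P, lik)
    mix_loss += -np.log(pred)
    model_loss += -np.log(lik)
assert np.all(mix_loss - model_loss <= np.log(M) + 1e-9)   # regret <= -ln P(m) = ln M
```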

53-54 Specialists
Definition (FSSW97, CV09): a specialist may or may not issue a prediction. The prediction P(y_t | y_{<t}, m) is only available for the awake specialists m ∈ W_t.
Say we have several specialists m = 1, ..., M. How to combine their predictions?
If we imagine a prediction for the sleeping specialists, we can use Bayes.
Choice: the sleeping specialists predict with the Bayesian predictive distribution itself.
That sounds circular. Because it is. $%!#?

55-57 Bayes for specialists
Place a prior P(m) on the models. The Bayesian predictive distribution satisfies
    P(y_t | y_{<t}) = Σ_{m ∈ W_t} P(y_t | y_{<t}, m) P(m | y_{<t}) + Σ_{m ∉ W_t} P(y_t | y_{<t}) P(m | y_{<t})
which has the solution
    P(y_t | y_{<t}) = Σ_{m ∈ W_t} P(y_t | y_{<t}, m) P(m | y_{<t}) / Σ_{m ∈ W_t} P(m | y_{<t}).
The posterior distribution is incrementally updated by
    P(m | y_{≤t}) = P(y_t | y_{<t}, m) P(m | y_{<t}) / P(y_t | y_{<t})                     if m ∈ W_t,
    P(m | y_{≤t}) = P(y_t | y_{<t}) P(m | y_{<t}) / P(y_t | y_{<t}) = P(m | y_{<t})        if m ∉ W_t.
Bayes is fast: predict in O(M) time per round.
Bayes is good: the regret w.r.t. model m on data y_1, ..., y_T is bounded by
    Σ_{t : 1 ≤ t ≤ T and m ∈ W_t} ( -ln P(y_t | y_{<t}) + ln P(y_t | y_{<t}, m) ) ≤ -ln P(m).
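A sketch of the collapsed specialist update from this slide, with a boolean awake mask standing in for the set W_t.

```python
import numpy as np

def specialist_bayes_step(prior, likelihoods, awake):
    """Bayes with specialists: only the awake ones predict, the asleep ones keep their weight.

    prior:       P(m | y_<t) over all M specialists
    likelihoods: P(y_t | y_<t, m), only meaningful where awake[m] is True
    awake:       boolean mask for the set W_t
    """
    p_awake = prior[awake]
    predictive = np.dot(p_awake, likelihoods[awake]) / p_awake.sum()
    posterior = prior.copy()
    posterior[awake] = prior[awake] * likelihoods[awake] / predictive  # multiplicative update
    return predictive, posterior   # asleep entries are untouched; total mass stays one
```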

58-60 Bayes for specialists with an awake vector
a(m) ∈ [0, 1] is the percentage of awakeness. The Bayesian predictive distribution satisfies
    P(y_t | y_{<t}) = Σ_m ( a(m) P(y_t | y_{<t}, m) + (1 - a(m)) P(y_t | y_{<t}) ) P(m | y_{<t})
which has the solution
    P(y_t | y_{<t}) = Σ_m a(m) P(y_t | y_{<t}, m) P(m | y_{<t}) / Σ_m a(m) P(m | y_{<t}).
The posterior distribution is incrementally updated by
    P(m | y_{≤t}) = ( a(m) P(y_t | y_{<t}, m) + (1 - a(m)) P(y_t | y_{<t}) ) P(m | y_{<t}) / P(y_t | y_{<t})
                  = ( a(m) P(y_t | y_{<t}, m) / P(y_t | y_{<t}) + (1 - a(m)) ) P(m | y_{<t})

61 With awake vector, continued
How is this done with in-vitro selection? Split P(m | y_{<t}) into
- an awake part a(m) P(m | y_{<t}), and
- an asleep part (1 - a(m)) P(m | y_{<t}).
Do the multiplicative update on the awake part only, so that the total weight of that part does not change. Recombine (the total weight of the posterior is one).
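A sketch of this split / update / recombine recipe; it reproduces the predictive distribution and posterior update of slides 58-60.

```python
import numpy as np

def awake_vector_bayes_step(prior, likelihoods, a):
    """Split / update-the-awake-part / recombine, with awakeness fractions a(m) in [0, 1]."""
    awake_part = a * prior                     # a(m) P(m | y_<t)
    asleep_part = (1 - a) * prior              # (1 - a(m)) P(m | y_<t)
    predictive = np.dot(awake_part, likelihoods) / awake_part.sum()
    awake_part = awake_part * likelihoods / predictive   # multiplicative update; the total
                                                         # weight of the awake part is unchanged
    return predictive, awake_part + asleep_part          # recombine: still sums to one
```

With a taking only the values 0 and 1 this reduces to the binary specialist update above; with a identically 1 it is plain Bayes.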

62 Neural nets
Prediction:  ŷ = σ( Σ_i a_i w_i x_i / Σ_i a_i w_i )
Update:  w_new,i = a_i w_i exp(-η ℓ_i) / Z   [awake part]   +   (1 - a_i) w_i   [asleep part]
where Z = ( Σ_i a_i w_i exp(-η ℓ_i) ) / ( Σ_i a_i w_i ),  so that Σ_i w_new,i = 1.
When all models are awake (a = 1) the update becomes  w_new,i = w_i exp(-η ℓ_i) / Σ_j w_j exp(-η ℓ_j).
When all are asleep (a = 0) the update becomes  w_new = w; however, the prediction does not make sense in this case.
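A loss-based sketch of this slide's update and prediction. Taking σ to be the logistic function is an assumption, and the code requires that not every model is asleep (a · w must be positive).

```python
import numpy as np

def sleeping_weight_update(w, a, losses, eta=1.0):
    """Loss-based sleeping update from slide 62 (component-wise); requires dot(a, w) > 0."""
    f = np.exp(-eta * losses)
    Z = np.dot(a * w, f) / np.dot(a, w)      # chosen so the new weights still sum to one
    return a * w * f / Z + (1 - a) * w       # updated awake part + untouched asleep part

def sleeping_prediction(w, a, x):
    """y_hat = sigma( (a*w).x / (a.w) ); sigma taken to be the logistic function here."""
    z = np.dot(a * w, x) / np.dot(a, w)
    return 1.0 / (1.0 + np.exp(-z))
```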

63-64 The specialists trick
Is the specialists / sleeping trick just a curiosity?
The specialists trick:
- Start with the input models
- Create many derived virtual specialists
- Control how they predict and when they are awake
- Run Bayes on all these specialists
- Carefully choose the prior
Result: low regret w.r.t. a class of intricate combinations of the input models, and fast execution because Bayes on the specialists collapses.

65 Application 1: Freund's problem
In practice: a large number M of models; some are good some of the time; most are bad all the time. We do not know in advance which models will be useful when.
Goal: compare favourably on data y_1, ..., y_T with the best alternation of N ≪ M models with B ≪ T blocks.
[Figure] T outcomes partitioned as A B C A C B C A: a partition model with N = 3 good models and B = 8 blocks.
Bayesian reflex: average all such partition models. NP-hard.

66-67 Application 1: sleeping solution
Create partition specialists. [Figure] Over the T outcomes, each specialist is awake on its own segments (e.g. A A A ... B B ... C C C) and asleep (zzz) elsewhere.
Bayes is fast: O(M) time per trial, O(M) space.
Bayes is good: regret close to the information-theoretic lower bound.
This gives a Bayesian interpretation for the Mixing Past Posteriors and 2-track updates, a slightly improved regret bound, and an explanation of the mysterious factor of 2.

68 Application 2: online multitask learning
In practice: a large number M of models; data y_1, y_2, ... from K interleaved tasks; we observe the task label κ_1, κ_2, ... before predicting. We do not know in advance which tasks are similar.
Goal: compare favourably to the best clustering of the tasks with N ≪ K cells.
[Figure] K tasks assigned to clusters as A B A C B A: a cluster model with N = 3 cells.
Bayesian reflex: average all cluster models. NP-hard.

69 Application 2: sleeping solution
Create cluster specialists. [Figure] Over the K tasks, each specialist is awake on the tasks in its cluster (A, B, C) and asleep (zzz) elsewhere.
Bayes is fast: O(M) time per trial, O(MK) space.
Bayes is good: regret close to the information-theoretic lower bound.
An intriguing algorithm: the regret is independent of the task switch count.

70 Pictorial summary of the Bayesian interpretation
[Figure] Over T outcomes: Bayes averages fixed experts (a single model A throughout); Fixed Share averages partition models (A B C A C B C A); Mixing Past Posteriors / 2-track average partition specialists, which are awake on their own segments (A A A ... C C C ... B B) and asleep (zzz) elsewhere.

71 Conclusion
The specialists trick avoids the NP-hard offline problem:
- Ditch Bayes on coordinated comparator models
- Instead create uncoordinated virtual specialists that cover the time line
- Craft the prior so that Bayes collapses (efficient emulation)
- Small regret

72 Open problems
- Is the sleeping / specialist trick necessary?
- How is it related to the Chinese restaurant process?
- Are 3 tracks better than 2 tracks for some applications? Does nature use 3 or more tracks?
- A practical update for shifting Gaussians?
