Variations sur la borne PAC-bayésienne


1 Variations sur la borne PAC-bayésienne. Pascal Germain, INRIA Paris, Équipe SIERRA. Seminar of the Département d'informatique et de génie logiciel, Université Laval, July 11, 2016.

2 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

3 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

4 Definitions.
Learning example: an example (x, y) ∈ X × Y is a description-label pair.
Data-generating distribution: each example is an i.i.d. observation from a distribution D on X × Y.
Learning sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ∼ D^n.
Predictors (or hypotheses): h : X → Y, h ∈ H.
Learning algorithm: A(S) → h.
Loss function: ℓ : H × X × Y → ℝ.
Empirical loss: L_S^ℓ(h) = (1/n) Σ_{i=1}^{n} ℓ(h, x_i, y_i).
Generalization loss: L_D^ℓ(h) = E_{(x,y)∼D} ℓ(h, x, y).

5 PAC-Bayesian Theory. Initiated by McAllester (1999), the PAC-Bayesian theory gives PAC generalization guarantees to Bayesian-like algorithms.
PAC guarantees (Probably Approximately Correct): with probability at least 1 − δ, the loss of predictor h is less than ε:
Pr_{S∼D^n}( L_D^ℓ(h) ≤ ε(L_S^ℓ(h), n, δ, ...) ) ≥ 1 − δ.
Bayesian flavor: given a prior distribution P on H and a posterior distribution Q on H,
Pr_{S∼D^n}( L_D^ℓ(h) ≤ ε(L_S^ℓ(h), n, δ, P, ...) ) ≥ 1 − δ.

6 A Classical PAC-Bayesian Theorem. PAC-Bayesian theorem (adapted from McAllester 1999, 2003): for any distribution D on X × Y, for any set of predictors H, for any loss ℓ : H × X × Y → [0, 1], for any distribution P on H, for any δ ∈ (0, 1], we have
Pr_{S∼D^n}( for all Q on H : E_{h∼Q} L_D^ℓ(h) ≤ E_{h∼Q} L_S^ℓ(h) + sqrt( (1/(2n)) [ KL(Q||P) + ln(2√n/δ) ] ) ) ≥ 1 − δ,
where KL(Q||P) = E_{h∼Q} ln( Q(h)/P(h) ) is the Kullback-Leibler divergence.
Training bound: gives generalization guarantees not based on a testing sample. Valid for all posteriors Q on H: an inspiration for conceiving new learning algorithms.
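As a quick illustration, here is a minimal Python sketch (not from the talk; the function name and the numbers in the example are illustrative) of how the right-hand side of this bound can be evaluated.

```python
import math

def mcallester_bound(emp_loss, kl, n, delta=0.05):
    """Right-hand side of the classical PAC-Bayesian bound:
    E_Q L_D <= E_Q L_S + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))."""
    complexity = kl + math.log(2.0 * math.sqrt(n) / delta)
    return emp_loss + math.sqrt(complexity / (2.0 * n))

# Hypothetical values: empirical Gibbs loss 0.08, KL(Q||P) = 5 nats, n = 1000 examples.
print(mcallester_bound(0.08, 5.0, 1000))
```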

7 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

8 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

9 Majority Vote Classifiers. Consider a binary classification problem, where Y = {−1, +1} and the set H contains binary voters h : X → {−1, +1}.
Weighted majority vote: to predict the label of x ∈ X, the classifier asks for the prevailing opinion:
B_Q(x) = sgn( E_{h∼Q} h(x) ).
Many learning algorithms output majority vote classifiers: AdaBoost, Random Forests, Bagging, ...

10 A Surrogate Loss.
Majority vote risk: R_D(B_Q) = Pr_{(x,y)∼D}( B_Q(x) ≠ y ) = E_{(x,y)∼D} I[ y · E_{h∼Q} h(x) ≤ 0 ], where I[a] = 1 if predicate a is true and I[a] = 0 otherwise.
Gibbs risk / linear loss: the stochastic Gibbs classifier G_Q, on input x, draws h ∈ H according to Q and outputs h(x):
R_D(G_Q) = E_{(x,y)∼D} E_{h∼Q} I[ h(x) ≠ y ] = E_{h∼Q} L_D^{ℓ01}(h), where ℓ01(h, x, y) = I[ h(x) ≠ y ].
Factor two: it is well known that R_D(B_Q) ≤ 2 R_D(G_Q). See Germain, Lacasse, Laviolette, Marchand, and Roy (2015, JMLR) for an extensive study.
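A minimal Python sketch (not part of the slides; the voter outputs, labels, and weights below are synthetic) that computes both risks for a finite set of voters and checks the factor-two relation.

```python
import numpy as np

def majority_vote_and_gibbs_risk(votes, y, q):
    """votes: (m, n) array of voter outputs in {-1,+1} (m voters, n examples);
    y: (n,) true labels in {-1,+1}; q: (m,) posterior weights summing to 1."""
    margin = y * (q @ votes)              # E_{h~Q} y*h(x) for each example
    r_bq = np.mean(margin <= 0)           # majority vote risk R(B_Q), ties counted as errors
    r_gq = np.mean(q @ (votes != y))      # Gibbs risk R(G_Q)
    return r_bq, r_gq

rng = np.random.default_rng(0)
votes = rng.choice([-1, 1], size=(5, 200))
y = rng.choice([-1, 1], size=200)
q = np.full(5, 0.2)
r_bq, r_gq = majority_vote_and_gibbs_risk(votes, y, q)
assert r_bq <= 2 * r_gq   # the "factor two" relation
```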

11 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

12 A General PAC-Bayesian Theorem.
Δ-function: a «distance» between R_S(G_Q) and R_D(G_Q); a convex function Δ : [0, 1] × [0, 1] → ℝ.
General theorem (Bégin et al. 2014, 2016; Germain 2015): for any distribution D on X × Y, for any set H of voters, for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ],
where I_Δ(n) = sup_{r∈[0,1]} Σ_{k=0}^{n} Bin(k; n, r) e^{n Δ(k/n, r)}, with Bin(k; n, r) = C(n,k) r^k (1−r)^{n−k}.
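As an illustration of how I_Δ(n) can be evaluated in practice, here is a small Python sketch (not from the talk) that approximates the supremum by a grid search over r, computed in log space; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import rel_entr, logsumexp

def kl_bernoulli(q, p):
    """Binary KL divergence kl(q, p), a common choice of Delta-function."""
    return rel_entr(q, p) + rel_entr(1.0 - q, 1.0 - p)

def I_Delta(n, Delta, grid=1000):
    """Grid-search approximation of I_Delta(n) = sup_r sum_k Bin(k;n,r) * exp(n*Delta(k/n, r))."""
    ks = np.arange(n + 1)
    best = -np.inf
    for r in np.linspace(1e-6, 1.0 - 1e-6, grid):
        log_terms = binom.logpmf(ks, n, r) + n * Delta(ks / n, r)
        best = max(best, logsumexp(log_terms))
    return np.exp(best)

# With Delta := kl, I_Delta(n) is of order sqrt(n) (Maurer 2004 gives at most 2*sqrt(n)).
print(I_Delta(100, kl_bernoulli), 2 * np.sqrt(100))
```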

13 General theorem:
Pr_{S∼D^n}( for all Q on H : Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Interpretation: the bound on R_D(G_Q) is the largest r such that Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ].

14 General theorem: Pr_{S∼D^n}( for all Q on H : Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Proof ideas.
Change of measure inequality: for any P and Q on H, and for any measurable function φ : H → ℝ, we have E_{h∼Q} φ(h) ≤ KL(Q||P) + ln E_{h∼P} e^{φ(h)}.
Markov's inequality: Pr( X ≥ a ) ≤ (E X)/a, equivalently Pr( X ≤ (E X)/δ ) ≥ 1 − δ.
Probability of observing k misclassifications among n examples: given a voter h, the number of errors on S is a binomial variable of n trials with success probability L_D^{ℓ01}(h):
Pr_{S∼D^n}( L_S^{ℓ01}(h) = k/n ) = C(n,k) ( L_D^{ℓ01}(h) )^k ( 1 − L_D^{ℓ01}(h) )^{n−k} = Bin( k; n, L_D^{ℓ01}(h) ).

15 General theorem: Pr_{S∼D^n}( for all Q on H : Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) )
≤ E_{h∼Q} n Δ( L_S^ℓ(h), L_D^ℓ(h) )   (Jensen's inequality)
≤ KL(Q||P) + ln E_{h∼P} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )}   (change of measure)
≤ KL(Q||P) + ln( (1/δ) E_{S∼D^n} E_{h∼P} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )} )   (Markov's inequality, with probability ≥ 1 − δ)
= KL(Q||P) + ln( (1/δ) E_{h∼P} E_{S∼D^n} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )} )   (expectation swap)
= KL(Q||P) + ln( (1/δ) E_{h∼P} Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) e^{n Δ( k/n, L_D^ℓ(h) )} )   (binomial law)
≤ KL(Q||P) + ln( (1/δ) sup_{r∈[0,1]} Σ_{k=0}^{n} Bin( k; n, r ) e^{n Δ( k/n, r )} )   (supremum over risk)
= KL(Q||P) + ln( (1/δ) I_Δ(n) ).

16 General theorem: Pr_{S∼D^n}( for all Q on H : Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Corollary. [...] with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
(a) kl( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln(2√n/δ) ]   (Langford and Seeger 2001)
(b) R_D(G_Q) ≤ R_S(G_Q) + sqrt( (1/(2n)) [ KL(Q||P) + ln(2√n/δ) ] )   (McAllester 1999, 2003)
(c) R_D(G_Q) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c R_S(G_Q) − (1/n) [ KL(Q||P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) R_D(G_Q) ≤ R_S(G_Q) + (1/λ) [ KL(Q||P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)
with the Δ-functions kl(q, p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)) ≥ 2(q − p)², Δ_c(q, p) = −ln[ 1 − (1 − e^{−c}) p ] − c q, and Δ_λ(q, p) = (λ/n)(p − q).
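In practice, bound (a) is used by inverting the binary kl: the bound on R_D(G_Q) is the largest p satisfying the inequality. A small Python sketch of this inversion by bisection (not from the slides; the numbers in the example are hypothetical):

```python
import math

def kl_bin(q, p):
    """Binary KL divergence kl(q, p), clipped away from 0 and 1 for numerical safety."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def seeger_bound(emp_risk, kl_qp, n, delta=0.05, iters=60):
    """Invert bound (a): the largest p with kl(R_S(G_Q), p) <= rhs upper-bounds R_D(G_Q)."""
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_bin(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical values: empirical Gibbs risk 0.08, KL(Q||P) = 5 nats, n = 1000.
print(seeger_bound(0.08, 5.0, 1000))
```

For the same inputs this is never looser than the McAllester relaxation (b), since (b) follows from (a) through the inequality kl(q, p) ≥ 2(q − p)².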

17 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

18 Transductive Learning.
Assumption: examples are drawn without replacement from a finite set Z of size N.
S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ⊆ Z, U = {(x_{n+1}, ·), (x_{n+2}, ·), ..., (x_N, ·)} = Z \ S.
Inductive learning: n draws with replacement according to D → binomial law. Transductive learning: n draws without replacement from Z → hypergeometric law.
Theorem (Bégin et al. 2014): for any set Z of N examples, [...] with probability at least 1 − δ over the choice of n examples among Z, for all Q on H:
Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( T_Δ(n, N)/δ ) ],
where T_Δ(n, N) = max_{K=0,...,N} Σ_{k=max[0, K+n−N]}^{min[n, K]} [ C(K,k) C(N−K, n−k) / C(N,n) ] e^{n Δ(k/n, K/N)}.

19 Theorem: Pr_{S∼[Z]^n}( for all Q on H : Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q||P) + ln( T_Δ(n, N)/δ ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_Z^ℓ(h) )
≤ E_{h∼Q} n Δ( L_S^ℓ(h), L_Z^ℓ(h) )   (Jensen's inequality)
≤ KL(Q||P) + ln E_{h∼P} e^{n Δ( L_S^ℓ(h), L_Z^ℓ(h) )}   (change of measure)
≤ KL(Q||P) + ln( (1/δ) E_{S∼[Z]^n} E_{h∼P} e^{n Δ( L_S^ℓ(h), L_Z^ℓ(h) )} )   (Markov's inequality, with probability ≥ 1 − δ)
= KL(Q||P) + ln( (1/δ) E_{h∼P} E_{S∼[Z]^n} e^{n Δ( L_S^ℓ(h), L_Z^ℓ(h) )} )   (expectation swap)
= KL(Q||P) + ln( (1/δ) E_{h∼P} Σ_k [ C(N L_Z^ℓ(h), k) C(N − N L_Z^ℓ(h), n−k) / C(N,n) ] e^{n Δ( k/n, L_Z^ℓ(h) )} )   (hypergeometric law)
≤ KL(Q||P) + ln( (1/δ) max_{K=0,...,N} Σ_k [ C(K,k) C(N−K, n−k) / C(N,n) ] e^{n Δ( k/n, K/N )} )   (supremum over risk)
= KL(Q||P) + ln( (1/δ) T_Δ(n, N) ).

20 A New Transductive Bound for the Gibbs Risk.
Corollary (Bégin et al. 2014): [...] with probability at least 1 − δ over the choice of n examples among Z, for all Q on H:
R_Z(G_Q) ≤ R_S(G_Q) + sqrt( ((1 − n/N)/(2n)) [ KL(Q||P) + ln( 3 ln(n) √(n(1 − n/N)) / δ ) ] ).
For comparison, the earlier transductive PAC-Bayesian bound of Derbeko et al. (2004) has the same overall form but involves additional terms and larger constants.

21 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

22 A New Change of Measure.
Kullback-Leibler change of measure inequality: for any P and Q on H, and for any φ : H → ℝ, we have E_{h∼Q} φ(h) ≤ KL(Q||P) + ln E_{h∼P} e^{φ(h)}.
Rényi change of measure inequality (Atar and Merhav 2015): for any P and Q on H, any φ : H → ℝ, and for any α > 1, we have
(α/(α−1)) ln E_{h∼Q} φ(h) ≤ D_α(Q||P) + ln E_{h∼P} φ(h)^{α/(α−1)},
with D_α(Q||P) = (1/(α−1)) ln[ E_{h∼P} ( Q(h)/P(h) )^α ] ≥ KL(Q||P), and lim_{α→1} D_α(Q||P) = KL(Q||P).
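A short Python sketch (not from the talk) of the Rényi divergence for discrete distributions, illustrating that D_α(Q||P) decreases toward KL(Q||P) as α → 1; it assumes strictly positive Q and P, and the toy distributions below are arbitrary.

```python
import numpy as np

def renyi_divergence(q, p, alpha):
    """D_alpha(Q||P) = 1/(alpha-1) * ln sum_h P(h) * (Q(h)/P(h))**alpha, for alpha > 1."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.log(np.sum(p * (q / p) ** alpha)) / (alpha - 1.0)

def kl_divergence(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p))

q = np.array([0.7, 0.2, 0.1])
p = np.array([1/3, 1/3, 1/3])
for alpha in (2.0, 1.5, 1.1, 1.01):
    print(alpha, renyi_divergence(q, p, alpha))  # decreases toward KL as alpha -> 1
print("KL:", kl_divergence(q, p))
```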

23 Rényi-Based General Theorem.
Theorem (Bégin et al. 2016): [...] for any α > 1, with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
ln Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/α') [ D_α(Q||P) + ln( I_Δ^R(n, α)/δ ) ],
where α' := α/(α−1) > 1 and I_Δ^R(n, α) = sup_{r∈[0,1]} Σ_{k=0}^{n} Bin(k; n, r) Δ(k/n, r)^{α'}.

24 Rényi-Based General Theorem: Pr_{S∼D^n}( for all Q on H : ln Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/α') [ D_α(Q||P) + ln( I_Δ^R(n, α)/δ ) ] ) ≥ 1 − δ.
Proof. Let α' := α/(α−1).
α' ln Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) )
≤ α' ln E_{h∼Q} Δ( L_S^ℓ(h), L_D^ℓ(h) )   (Jensen's inequality)
≤ D_α(Q||P) + ln E_{h∼P} Δ( L_S^ℓ(h), L_D^ℓ(h) )^{α'}   (change of measure)
≤ D_α(Q||P) + ln( (1/δ) E_{S∼D^n} E_{h∼P} Δ( L_S^ℓ(h), L_D^ℓ(h) )^{α'} )   (Markov's inequality, with probability ≥ 1 − δ)
= D_α(Q||P) + ln( (1/δ) E_{h∼P} E_{S∼D^n} Δ( L_S^ℓ(h), L_D^ℓ(h) )^{α'} )   (expectation swap)
= D_α(Q||P) + ln( (1/δ) E_{h∼P} Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) Δ( k/n, L_D^ℓ(h) )^{α'} )   (binomial law)
≤ D_α(Q||P) + ln( (1/δ) sup_{r∈[0,1]} Σ_{k=0}^{n} Bin( k; n, r ) Δ( k/n, r )^{α'} )   (supremum over risk)
= D_α(Q||P) + ln( (1/δ) I_Δ^R(n, α) ).

25 Empirical Study. Majority votes of 500 decision trees on the Mushroom dataset, with weak decision trees and strong decision trees. [Figure: for each setting, the value of R_D(G_Q) and the contribution of each proof step (Jensen's inequality, change of measure, Markov's inequality, supremum over risk) to the bound, comparing KL(Q||P) vs. D_α(Q||P), each with Δ := 2(q − p)² and with Δ := kl(q, p).]

26 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

27 PAC-Bayesian Bounds for Regression.
Lemma (Maurer 2004): for any ℓ : H × X × Y → [0, 1] and any convex Δ : [0, 1] × [0, 1] → ℝ,
E_{S∼D^n} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )} ≤ Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) e^{n Δ( k/n, L_D^ℓ(h) )}.
General theorem for regression with bounded losses: for any distribution D on X × Y, for any set H of predictors, for any loss ℓ : H × X × Y → [0, 1], for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ].

28 General theorem for regression with bounded losses: Pr_{S∼D^n}( for all Q on H : Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) )
≤ E_{h∼Q} n Δ( L_S^ℓ(h), L_D^ℓ(h) )   (Jensen's inequality)
≤ KL(Q||P) + ln E_{h∼P} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )}   (change of measure)
≤ KL(Q||P) + ln( (1/δ) E_{S∼D^n} E_{h∼P} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )} )   (Markov's inequality, with probability ≥ 1 − δ)
= KL(Q||P) + ln( (1/δ) E_{h∼P} E_{S∼D^n} e^{n Δ( L_S^ℓ(h), L_D^ℓ(h) )} )   (expectation swap)
≤ KL(Q||P) + ln( (1/δ) E_{h∼P} Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) e^{n Δ( k/n, L_D^ℓ(h) )} )   (Maurer's lemma)
≤ KL(Q||P) + ln( (1/δ) sup_{r∈[0,1]} Σ_{k=0}^{n} Bin( k; n, r ) e^{n Δ( k/n, r )} )   (supremum over risk)
= KL(Q||P) + ln( (1/δ) I_Δ(n) ).

29 PAC-Bayesian Bounds for Regression. General theorem for regression with bounded losses: Pr_{S∼D^n}( for all Q on H : Δ( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) ) ≤ (1/n) [ KL(Q||P) + ln( I_Δ(n)/δ ) ] ) ≥ 1 − δ.
Corollary. [...] with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
(a) kl( E_{h∼Q} L_S^ℓ(h), E_{h∼Q} L_D^ℓ(h) ) ≤ (1/n) [ KL(Q||P) + ln(2√n/δ) ]   (Langford and Seeger 2001)
(b) E_{h∼Q} L_D^ℓ(h) ≤ E_{h∼Q} L_S^ℓ(h) + sqrt( (1/(2n)) [ KL(Q||P) + ln(2√n/δ) ] )   (McAllester 1999, 2003)
(c) E_{h∼Q} L_D^ℓ(h) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c E_{h∼Q} L_S^ℓ(h) − (1/n) [ KL(Q||P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) E_{h∼Q} L_D^ℓ(h) ≤ E_{h∼Q} L_S^ℓ(h) + (1/λ) [ KL(Q||P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)

30 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

31 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

32 Optimal Gibbs Posterior.
Corollary. [...] with probability at least 1 − δ over the choice of S ∼ D^n, for all Q on H:
(c) E_{h∼Q} L_D^ℓ(h) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c E_{h∼Q} L_S^ℓ(h) − (1/n) [ KL(Q||P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) E_{h∼Q} L_D^ℓ(h) ≤ E_{h∼Q} L_S^ℓ(h) + (1/λ) [ KL(Q||P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)
From an algorithm design perspective, Corollary (c) suggests optimizing the trade-off
c n E_{h∼Q} L_S^ℓ(h) + KL(Q||P),
which also minimizes (d), with λ := c n. The optimal Gibbs posterior is given by
Q*_c(h) = (1/Z_S) P(h) e^{−c n L_S^ℓ(h)}.
See Catoni 2007, Alquier et al. 2015, ...
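A minimal Python sketch (not from the slides) of this optimal Gibbs posterior over a finite set of hypotheses, computed in log space for numerical stability; the prior, the empirical losses, and the constants below are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def gibbs_posterior(log_prior, emp_losses, c, n):
    """Optimal Gibbs posterior over a finite hypothesis set:
    Q*_c(h) proportional to P(h) * exp(-c * n * L_S(h))."""
    log_unnorm = np.asarray(log_prior) - c * n * np.asarray(emp_losses)
    return np.exp(log_unnorm - logsumexp(log_unnorm))

# Toy example: 4 hypotheses, uniform prior, empirical losses measured on n = 100 examples.
log_prior = np.log(np.full(4, 0.25))
emp_losses = np.array([0.30, 0.12, 0.10, 0.45])
print(gibbs_posterior(log_prior, emp_losses, c=1.0, n=100))
```

Larger c n concentrates the posterior on low-empirical-loss hypotheses; smaller c n keeps it close to the prior, which is exactly the trade-off appearing in the bound.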

33 Tying the Concepts. Let us denote by Θ the set of all possible model parameters.
Bayes' rule: p(θ|X, Y) = p(θ) p(Y|X, θ) / p(Y|X), where X = {x_1, ..., x_n} and Y = {y_1, ..., y_n}.
p(θ) is the prior over Θ (similar to P over H); p(θ|X, Y) is the posterior over Θ (similar to Q over H); p(Y|X, θ) is the likelihood of the parameters θ given the sample S.
Negative log-likelihood loss function: ℓ_nll(θ, x, y) = ln( 1/p(y|x, θ) ). Then
L_S^{ℓnll}(θ) = (1/n) Σ_{i=1}^{n} ℓ_nll(θ, x_i, y_i) = −(1/n) Σ_{i=1}^{n} ln p(y_i|x_i, θ) = −(1/n) ln p(Y|X, θ).

34 Rediscovering the Marginal Likelihood. With the negative log-likelihood loss, the Bayesian and PAC-Bayesian posteriors align:
p(θ|X, Y) = p(θ) p(Y|X, θ) / p(Y|X) = P(θ) e^{−n L_S^{ℓnll}(θ)} / Z_S = Q*(θ).
The normalization constant Z_S corresponds to the marginal likelihood:
Z_S = p(Y|X) = ∫_Θ P(θ) e^{−n L_S^{ℓnll}(θ)} dθ.
Putting this posterior back inside the PAC-Bayesian bounds, we obtain
n E_{θ∼Q*} L_S^{ℓnll}(θ) + KL(Q*||P) = E_{θ∼Q*} [ n L_S^{ℓnll}(θ) + ln( Q*(θ)/P(θ) ) ] = E_{θ∼Q*} [ n L_S^{ℓnll}(θ) + ln( e^{−n L_S^{ℓnll}(θ)} / Z_S ) ] = ln(1/Z_S) = −ln Z_S.

35 From the Marginal Likelihood to PAC-Bayesian Bounds.
Corollary (Germain, Bach, Lacoste, Lacoste-Julien 2016): given a data distribution D, a parameter set Θ, a prior distribution P over Θ, a δ ∈ (0, 1], if ℓ_nll lies in [a, b], we have, with probability at least 1 − δ over the choice of S ∼ D^n,
(c) E_{θ∼Q*} L_D^{ℓnll}(θ) ≤ a + ((b − a)/(1 − e^{a−b})) [ 1 − e^{a} (δ Z_S)^{1/n} ],
(d) E_{θ∼Q*} L_D^{ℓnll}(θ) ≤ (1/2)(b − a)² − (1/n) ln(δ Z_S).
Take-home message: maximizing the marginal likelihood minimizes these PAC-Bayesian bounds under the negative log-likelihood loss function.
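A small Python sketch (not from the talk) evaluating bounds (c) and (d) from a given log marginal likelihood; all numeric values below are hypothetical, and the loss range [a, b] is assumed known.

```python
import math

def bound_c(ln_zs, n, a, b, delta=0.05):
    """Bound (c): a + (b-a)/(1-exp(a-b)) * (1 - exp(a) * (delta*Z_S)**(1/n))."""
    nth_root = math.exp((math.log(delta) + ln_zs) / n)   # (delta * Z_S)**(1/n)
    return a + (b - a) / (1.0 - math.exp(a - b)) * (1.0 - math.exp(a) * nth_root)

def bound_d(ln_zs, n, a, b, delta=0.05):
    """Bound (d): (b-a)^2 / 2 - (1/n) * ln(delta * Z_S)."""
    return 0.5 * (b - a) ** 2 - (math.log(delta) + ln_zs) / n

# Hypothetical values: log marginal likelihood -120 on n = 100 examples, nll loss in [0, 4].
print(bound_c(-120.0, 100, 0.0, 4.0), bound_d(-120.0, 100, 0.0, 4.0))
```

Both expressions depend on the data only through ln Z_S, which is why a larger marginal likelihood directly translates into a smaller risk bound.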

36 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

37 Model Comparison. Consider a discrete set of L models {M_i}_{i=1}^{L} with parameter sets {Θ_i}_{i=1}^{L}, a prior p(M_i) over these models, and, for each model M_i, a prior p(θ|M_i) = P_i(θ) over Θ_i.
Bayes' rule: p(θ|X, Y, M_i) = p(θ|M_i) p(Y|X, θ, M_i) / p(Y|X, M_i),
where the model evidence is p(Y|X, M_i) = ∫_{Θ_i} p(θ|M_i) p(Y|X, θ, M_i) dθ = Z_{S,i}.

38 Bayesian Model Selection. [Slide reproduced from Zoubin Ghahramani's MLSS 2012 talk.]

39 Frequentist Bounds for Bayesian Model Selection. An alternative explanation for the Bayesian Occam's razor phenomenon...
Corollary (Germain, Bach, et al. 2016): [...] with probability at least 1 − δ over the choice of S ∼ D^n, for all i ∈ {1, ..., L}:
(c) E_{θ∼Q*_i} L_D^{ℓnll}(θ) ≤ a + ((b − a)/(1 − e^{a−b})) [ 1 − e^{a} (δ Z_{S,i}/L)^{1/n} ],
(d) E_{θ∼Q*_i} L_D^{ℓnll}(θ) ≤ (1/2)(b − a)² − (1/n) ln(δ Z_{S,i}/L).

40 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

41 Bayesian Linear Regression. Consider a mapping function φ : X → ℝ^d. Given (x, y) ∈ X × Y, model parameters θ := w ∈ ℝ^d and a fixed noise σ, we consider the likelihood
p(y|x, w) = N( y | w·φ(x), σ² ) = (1/√(2πσ²)) e^{−(1/(2σ²)) (y − w·φ(x))²}.
Thus, the negative log-likelihood loss function is
ℓ_nll(w, x, y) = ln( 1/p(y|x, w) ) = (1/2) ln(2πσ²) + (1/(2σ²)) (y − w·φ(x))².
We also consider an isotropic Gaussian prior of mean 0 and variance σ_P²:
p(w|σ_P) = N( w | 0, σ_P² I ) = (1/(2πσ_P²)^{d/2}) e^{−(1/(2σ_P²)) ||w||²}.

42 Bayesian Linear Regression. The optimal Gibbs posterior is given by
Q*(w) = p(w|X, Y, σ, σ_P) = p(w|σ_P) p(Y|X, w, σ) / p(Y|X, σ, σ_P) = N( w | ŵ, A^{−1} ),
where A := (1/σ²) Φᵀ Φ + (1/σ_P²) I and ŵ := (1/σ²) A^{−1} Φᵀ y (with Φ the design matrix whose rows are φ(x_i)ᵀ and y the vector of labels).
The negative log marginal likelihood is
−ln Z_S(σ, σ_P) = (1/(2σ²)) ||y − Φŵ||² + (n/2) ln(2πσ²) + (1/(2σ_P²)) ||ŵ||² + (1/2) ln|A| + d ln σ_P
= [ n L_S^{ℓnll}(ŵ) + (1/(2σ²)) tr(Φᵀ Φ A^{−1}) ] + [ (1/(2σ_P²)) ( tr(A^{−1}) − d σ_P² + ||ŵ||² ) + (1/2) ln|A| + d ln σ_P ],
where the first bracket equals n E_{w∼Q*} L_S^{ℓnll}(w) and the second equals KL( N(ŵ, A^{−1}) || N(0, σ_P² I) ).
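A short Python sketch (not from the slides) computing ŵ, A, and −ln Z_S for this model; the toy data and variable names are illustrative only.

```python
import numpy as np

def bayes_linreg_evidence(Phi, y, sigma, sigma_p):
    """Posterior N(w_hat, A^{-1}) and negative log marginal likelihood -ln Z_S
    for Bayesian linear regression with isotropic Gaussian prior N(0, sigma_p^2 I)."""
    n, d = Phi.shape
    A = Phi.T @ Phi / sigma**2 + np.eye(d) / sigma_p**2
    w_hat = np.linalg.solve(A, Phi.T @ y) / sigma**2
    residual = y - Phi @ w_hat
    neg_ln_zs = (residual @ residual / (2 * sigma**2)
                 + n / 2 * np.log(2 * np.pi * sigma**2)
                 + w_hat @ w_hat / (2 * sigma_p**2)
                 + 0.5 * np.linalg.slogdet(A)[1]
                 + d * np.log(sigma_p))
    return w_hat, A, neg_ln_zs

# Toy check on synthetic data (w_hat coincides with the ridge solution for lambda = sigma^2/sigma_p^2).
rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 3))
y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w_hat, A, neg_ln_zs = bayes_linreg_evidence(Phi, y, sigma=0.1, sigma_p=1.0)
print(w_hat, neg_ln_zs)
```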

43 Fitting y = sin(x) + ε with polynomial models (inspired by Bishop 2006). Illustrates the decomposition of the marginal likelihood into the empirical loss and the KL divergence:
−ln Z_S = n E_{θ∼Q*} L_S^{ℓnll}(θ) + KL(Q*||P).
[Figure: left, the fitted polynomial models of degree d = 1, ..., 7 against sin(x) on [0, 2π]; right, −ln Z_{X,Y}, KL(ρ̂||π), n E_{θ∼ρ̂} L_{X,Y}^{ℓnll}(θ) and n E_{θ∼ρ̂} L_D^{ℓnll}(θ) as functions of the model degree d.]

44 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds. 3. PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression. 4. Conclusion and Future Works.

45 Conclusion and Future Works.
I talked about... A general theorem from which we recover existing results; a modular proof, easy to adapt to various frameworks; a direct link between PAC-Bayesian (frequentist) bounds and Bayesian model selection.
I did not talk about... Our learning algorithms inspired by PAC-Bayesian bounds, see Germain, Lacasse, Laviolette, and Marchand (2009, ICML) and Germain, Habrard, et al. (2016, ICML); our PAC-Bayesian theorems for unbounded losses, see Germain, Bach, et al. (2016, arXiv).
I plan to... Study other Bayesian techniques from a PAC-Bayes perspective (empirical Bayes, variational Bayes, etc.).

46 References I
Alquier, Pierre, James Ridgway, and Nicolas Chopin (2015). "On the properties of variational approximations of Gibbs posteriors". In: ArXiv e-prints.
Atar, Rami and Neri Merhav (2015). "Information-theoretic applications of the logarithmic probability comparison bound". In: IEEE International Symposium on Information Theory (ISIT).
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2014). "PAC-Bayesian Theory for Transductive Learning". In: AISTATS.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2016). "PAC-Bayesian Bounds based on the Rényi Divergence". In: AISTATS.
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Catoni, Olivier (2007). PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Vol. 56. Institute of Mathematical Statistics.
Derbeko, Philip, Ran El-Yaniv, and Ron Meir (2004). "Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms". In: J. Artif. Intell. Res. (JAIR) 22.
Germain, Pascal (2015). "Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine". PhD thesis. Université Laval.

47 References II
Germain, Pascal, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien (2016). "PAC-Bayesian Theory Meets Bayesian Inference". In: ArXiv e-prints.
Germain, Pascal, Amaury Habrard, François Laviolette, and Émilie Morvant (2016). "A New PAC-Bayesian Perspective on Domain Adaptation". In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, and Mario Marchand (2009). "PAC-Bayesian learning of linear classifiers". In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy (2015). "Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm". In: JMLR 16.
Langford, John and Matthias Seeger (2001). "Bounds for averaging classifiers". Tech. rep. Carnegie Mellon, Department of Computer Science.
Maurer, Andreas (2004). "A Note on the PAC-Bayesian Theorem". In: CoRR cs.lg/.
McAllester, David (1999). "Some PAC-Bayesian Theorems". In: Machine Learning.
McAllester, David (2003). "PAC-Bayesian Stochastic Model Selection". In: Machine Learning.
