Variations sur la borne PAC-bayésienne
1 Variations sur la borne PAC-bayésienne
Pascal Germain, INRIA Paris, équipe SIERRA
Séminaires du département d'informatique et de génie logiciel, Université Laval, 11 juillet 2016
Pascal Germain (INRIA/SIERRA), Variations sur la borne PAC-bayésienne, 11 juillet 2016
2 Plan
1 Introduction
2 PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Transductive Bounds; Rényi-Based Bounds; Regression Bounds
3 PAC-Bayesian Theory Meets Bayesian Inference: PAC-Bayesian Marginal Likelihood; Model Comparison; Toy Experiments: Linear Regression
4 Conclusion and Future Works
4 Definitions
Learning example: an example (x, y) ∈ X × Y is a description-label pair.
Data-generating distribution: each example is an i.i.d. observation from a distribution D on X × Y.
Learning sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n
Predictors (or hypotheses): h : X → Y, h ∈ H
Learning algorithm: A(S) → h
Loss function: l : H × X × Y → R
Empirical loss: L_S^l(h) = (1/n) Σ_{i=1}^n l(h, x_i, y_i)
Generalization loss: L_D^l(h) = E_{(x,y)~D} l(h, x, y)
5 PAC-Bayesian Theory
Initiated by McAllester (1999), the PAC-Bayesian theory gives PAC generalization guarantees to Bayesian-like algorithms.
PAC guarantees (Probably Approximately Correct): with probability at least 1 − δ, the loss of predictor h is less than ε:
Pr_{S~D^n}( L_D^l(h) ≤ ε(L_S^l(h), n, δ, ...) ) ≥ 1 − δ
Bayesian flavor. Given: a prior distribution P on H; a posterior distribution Q on H.
Pr_{S~D^n}( E_{h~Q} L_D^l(h) ≤ ε(E_{h~Q} L_S^l(h), n, δ, P, ...) ) ≥ 1 − δ
6 A Classical PAC-Bayesian Theorem
PAC-Bayesian theorem (adapted from McAllester 1999, 2003). For any distribution D on X × Y, for any set of predictors H, for any loss l : H × X × Y → [0, 1], for any distribution P on H, for any δ ∈ (0, 1], we have
Pr_{S~D^n}( ∀Q on H : E_{h~Q} L_D^l(h) ≤ E_{h~Q} L_S^l(h) + sqrt( (1/(2n)) [ KL(Q‖P) + ln(2√n/δ) ] ) ) ≥ 1 − δ,
where KL(Q‖P) = E_{h~Q} ln( Q(h)/P(h) ) is the Kullback-Leibler divergence.
Training bound: gives generalization guarantees not based on a testing sample. Valid for all posteriors Q on H: inspiration for conceiving new learning algorithms.
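Plugging numbers into this bound is straightforward. A minimal sketch (not from the talk; the empirical risk, KL value, and δ below are made up for illustration):

```python
import math

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-style PAC-Bayes bound:
    E_Q L_D <= E_Q L_S + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))."""
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return emp_risk + math.sqrt(complexity)

# Hypothetical values: 5% empirical Gibbs risk, KL of 5 nats, 10000 examples.
bound = mcallester_bound(0.05, 5.0, 10000)
```

Note how the complexity term shrinks as 1/√n and grows with the divergence between posterior and prior.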
9 Majority Vote Classifiers
Consider a binary classification problem, where Y = {−1, +1} and the set H contains binary voters h : X → {−1, +1}.
Weighted majority vote: to predict the label of x ∈ X, the classifier asks for the prevailing opinion:
B_Q(x) = sgn( E_{h~Q} h(x) )
Many learning algorithms output majority vote classifiers: AdaBoost, Random Forests, Bagging, ...
10 A Surrogate Loss
Majority vote risk:
R_D(B_Q) = Pr_{(x,y)~D}( B_Q(x) ≠ y ) = E_{(x,y)~D} I[ y E_{h~Q} h(x) ≤ 0 ],
where I[a] = 1 if predicate a is true; I[a] = 0 otherwise.
Gibbs risk / linear loss: the stochastic Gibbs classifier G_Q(x) draws h ∈ H according to Q and outputs h(x):
R_D(G_Q) = E_{(x,y)~D} E_{h~Q} I[ h(x) ≠ y ] = E_{h~Q} L_D^{l01}(h), where l01(h, x, y) = I[ h(x) ≠ y ].
Factor two: it is well known that R_D(B_Q) ≤ 2 R_D(G_Q).
See Germain, Lacasse, Laviolette, Marchand, and Roy (2015, JMLR) for an extensive study.
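The factor-two relation can be observed on a small simulation (my own toy setup, not from the talk: five voters with hypothetical error rates and posterior weights, voting on synthetic labels):

```python
import random

random.seed(0)

# Toy setup: 5 voters; each flips the true label independently with its own error rate.
n_examples = 2000
error_rates = [0.1, 0.2, 0.3, 0.3, 0.4]
Q = [0.3, 0.25, 0.2, 0.15, 0.1]  # posterior weights over the voters, summing to 1

labels = [random.choice([-1, +1]) for _ in range(n_examples)]
votes = [[y if random.random() > e else -y for e in error_rates] for y in labels]

# Gibbs risk: Q-weighted average of the individual voters' error rates.
gibbs = sum(q * sum(v[j] != y for v, y in zip(votes, labels)) / n_examples
            for j, q in enumerate(Q))

# Majority-vote risk: sign of the Q-weighted vote (ties broken toward +1).
def bq(v):
    s = sum(q * vj for q, vj in zip(Q, v))
    return 1 if s >= 0 else -1

mv = sum(bq(v) != y for v, y in zip(votes, labels)) / n_examples
```

On every example where B_Q errs, at least half of the Q-mass errs, which is exactly why R_D(B_Q) ≤ 2 R_D(G_Q) holds pointwise.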
12 A General PAC-Bayesian Theorem
Δ-function: a «distance» between R_S(G_Q) and R_D(G_Q); a convex function Δ : [0, 1] × [0, 1] → R.
General theorem (Bégin et al. 2014, 2016; Germain 2015). For any distribution D on X × Y, for any set H of voters, for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1 − δ over the choice of S ~ D^n,
∀Q on H : Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ],
where I_Δ(n) = sup_{r∈[0,1]} Σ_{k=0}^{n} Bin(k; n, r) e^{n Δ(k/n, r)}, with Bin(k; n, r) = C(n, k) r^k (1 − r)^{n−k}.
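The quantity I_Δ(n) can be evaluated numerically. A sketch, assuming Δ = kl (the choice used in the corollaries below); the grid search over r is a simple approximation of the supremum:

```python
import math

def binom_pmf(k, n, r):
    """Bin(k; n, r) = C(n, k) r^k (1-r)^(n-k)."""
    return math.comb(n, k) * r**k * (1 - r)**(n - k)

def kl_bernoulli(q, p):
    """Delta(q, p) = kl(q, p), the binary KL divergence, clamped for stability."""
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def I_delta(n, delta_fn, grid=200):
    """I_Delta(n) = sup_r sum_k Bin(k;n,r) exp(n Delta(k/n, r)), via a grid over r."""
    best = 0.0
    for i in range(grid + 1):
        r = i / grid
        total = sum(binom_pmf(k, n, r) * math.exp(n * delta_fn(k / n, r))
                    for k in range(n + 1))
        best = max(best, total)
    return best
```

For Δ = kl, Maurer's lemma (used later for regression) guarantees I_Δ(n) ≤ 2√n for n ≥ 8, which explains the ln(2√n/δ) constants in the classical corollaries.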
13 General theorem:
Pr_{S~D^n}( ∀Q on H : Δ(R_S(G_Q), R_D(G_Q)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Interpretation. [Figure: the curve Δ(R_S(G_Q), r) plotted as a function of r; the admissible values of the true risk are the r for which the curve lies below the level (1/n)[KL(Q‖P) + ln(I_Δ(n)/δ)].]
14 General theorem:
Pr_{S~D^n}( ∀Q on H : Δ(R_S(G_Q), R_D(G_Q)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Proof ideas.
Change of measure inequality: for any P and Q on H, and for any measurable function φ : H → R, we have
E_{h~Q} φ(h) ≤ KL(Q‖P) + ln E_{h~P} e^{φ(h)}.
Markov's inequality: Pr( X ≤ (1/δ) E[X] ) ≥ 1 − δ.
Probability of observing k misclassifications among n examples: given a voter h, the number of errors on S follows a binomial law of n trials with success probability L_D^{l01}(h):
Pr_{S~D^n}( L_S^{l01}(h) = k/n ) = C(n, k) ( L_D^{l01}(h) )^k ( 1 − L_D^{l01}(h) )^{n−k} = Bin( k; n, L_D^{l01}(h) ).
15 General theorem:
Pr_{S~D^n}( ∀Q on H : Δ(R_S(G_Q), R_D(G_Q)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h) )
≤ E_{h~Q} n Δ( L_S^l(h), L_D^l(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln E_{h~P} e^{n Δ(L_S^l(h), L_D^l(h))}   [change of measure]
≤ KL(Q‖P) + ln (1/δ) E_{S~D^n} E_{h~P} e^{n Δ(L_S^l(h), L_D^l(h))}   [Markov's inequality]
= KL(Q‖P) + ln (1/δ) E_{h~P} E_{S~D^n} e^{n Δ(L_S^l(h), L_D^l(h))}   [expectation swap]
= KL(Q‖P) + ln (1/δ) E_{h~P} Σ_{k=0}^n Bin(k; n, L_D^l(h)) e^{n Δ(k/n, L_D^l(h))}   [binomial law]
≤ KL(Q‖P) + ln (1/δ) sup_{r∈[0,1]} Σ_{k=0}^n Bin(k; n, r) e^{n Δ(k/n, r)}   [supremum over risk]
= KL(Q‖P) + ln( I_Δ(n)/δ ).
16 General theorem:
Pr_{S~D^n}( ∀Q on H : Δ(R_S(G_Q), R_D(G_Q)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Corollary. [...] with probability at least 1 − δ over the choice of S ~ D^n, for all Q on H:
(a) kl( R_S(G_Q), R_D(G_Q) ) ≤ (1/n)[ KL(Q‖P) + ln(2√n/δ) ]   (Langford and Seeger 2001)
(b) R_D(G_Q) ≤ R_S(G_Q) + sqrt( (1/(2n))[ KL(Q‖P) + ln(2√n/δ) ] )   (McAllester 1999, 2003)
(c) R_D(G_Q) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c R_S(G_Q) − (1/n)[ KL(Q‖P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) R_D(G_Q) ≤ R_S(G_Q) + (1/λ)[ KL(Q‖P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)
obtained with kl(q, p) = q ln(q/p) + (1 − q) ln((1 − q)/(1 − p)) ≥ 2(q − p)², Δ_c(q, p) = −ln[1 − (1 − e^{−c}) p] − c q, and Δ_λ(q, p) = (λ/n)(p − q).
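Corollary (a) gives the tightest risk bound once the kl is inverted numerically. A sketch of that inversion by bisection (the constant ln(2√n/δ) follows the corollary; the numeric inputs are illustrative), exploiting that kl(q, p) is increasing in p for p ≥ q:

```python
import math

def kl_bernoulli(q, p):
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def seeger_bound(emp_risk, kl_div, n, delta=0.05):
    """Invert corollary (a): largest p with kl(emp_risk, p) <= (KL + ln(2 sqrt(n)/delta))/n."""
    rhs = (kl_div + math.log(2.0 * math.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(60):  # bisection on p in [emp_risk, 1]
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return hi
```

By Pinsker's inequality kl(q, p) ≥ 2(q − p)², this inverted bound is never looser than the square-root form (b).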
18 Transductive Learning
Assumption: examples are drawn without replacement from a finite set Z of size N.
S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ⊆ Z
U = {x_{n+1}, x_{n+2}, ..., x_N} = Z \ S
Inductive learning: n draws with replacement according to D → binomial law.
Transductive learning: n draws without replacement in Z → hypergeometric law.
Theorem (Bégin et al. 2014). For any set Z of N examples, [...] with probability at least 1 − δ over the choice of n examples among Z,
∀Q on H : Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( T_Δ(n, N)/δ ) ],
where T_Δ(n, N) = max_{K=0..N} Σ_{k=max(0, K+n−N)}^{min(n, K)} [ C(K, k) C(N−K, n−k) / C(N, n) ] e^{n Δ(k/n, K/N)}.
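T_Δ(n, N) can be computed exactly for small N. The sketch below (my own illustration, with Δ = kl) also recomputes the inductive I_Δ(n) to check that sampling without replacement yields the smaller complexity term:

```python
import math

def hypergeom_pmf(k, K, N, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    if k < max(0, K + n - N) or k > min(n, K):
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def kl_bernoulli(q, p):
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def T_delta(n, N, delta_fn):
    """T_Delta(n, N) = max_K sum_k Hyp(k; K, N, n) exp(n Delta(k/n, K/N))."""
    best = 0.0
    for K in range(N + 1):
        total = sum(hypergeom_pmf(k, K, N, n) * math.exp(n * delta_fn(k / n, K / N))
                    for k in range(max(0, K + n - N), min(n, K) + 1))
        best = max(best, total)
    return best

def I_delta(n, delta_fn, grid=200):
    """Inductive counterpart, as in the general theorem."""
    best = 0.0
    for i in range(grid + 1):
        r = i / grid
        total = sum(math.comb(n, k) * r**k * (1 - r)**(n - k)
                    * math.exp(n * delta_fn(k / n, r)) for k in range(n + 1))
        best = max(best, total)
    return best
```

The hypergeometric law is more concentrated than the binomial, so T_Δ(n, N) ≤ I_Δ(n): transduction never pays a larger complexity term.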
19 Theorem:
Pr_{S~(Z)^n}( ∀Q on H : Δ(R_S(G_Q), R_Z(G_Q)) ≤ (1/n)[ KL(Q‖P) + ln(T_Δ(n, N)/δ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h~Q} L_S^l(h), E_{h~Q} L_Z^l(h) )
≤ E_{h~Q} n Δ( L_S^l(h), L_Z^l(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln E_{h~P} e^{n Δ(L_S^l(h), L_Z^l(h))}   [change of measure]
≤ KL(Q‖P) + ln (1/δ) E_{S~(Z)^n} E_{h~P} e^{n Δ(L_S^l(h), L_Z^l(h))}   [Markov's inequality]
= KL(Q‖P) + ln (1/δ) E_{h~P} E_{S~(Z)^n} e^{n Δ(L_S^l(h), L_Z^l(h))}   [expectation swap]
= KL(Q‖P) + ln (1/δ) E_{h~P} Σ_k [ C(N L_Z^l(h), k) C(N − N L_Z^l(h), n−k) / C(N, n) ] e^{n Δ(k/n, L_Z^l(h))}   [hypergeometric law]
≤ KL(Q‖P) + ln (1/δ) max_{K=0..N} Σ_k [ C(K, k) C(N−K, n−k) / C(N, n) ] e^{n Δ(k/n, K/N)}   [maximum over risk]
= KL(Q‖P) + ln( T_Δ(n, N)/δ ).
20 A New Transductive Bound for the Gibbs Risk
Corollary (Bégin et al. 2014). [...] with probability at least 1 − δ over the choice of n examples among Z,
∀Q on H : R_Z(G_Q) ≤ R_S(G_Q) + sqrt( ((1 − n/N)/(2n)) [ KL(Q‖P) + ln( 3 ln(n) √(n(1 − n/N)) / δ ) ] ).
This improves on the earlier transductive bound of Derbeko et al. (2004), whose statement involves a larger complexity term and additional additive terms.
22 A New Change of Measure
Kullback-Leibler change of measure inequality: for any P and Q on H, and for any φ : H → R, we have
E_{h~Q} φ(h) ≤ KL(Q‖P) + ln E_{h~P} e^{φ(h)}.
Rényi change of measure inequality (Atar and Merhav 2015): for any P and Q on H, any φ : H → R, and for any α > 1, we have, with α′ = α/(α−1),
α′ ln E_{h~Q} φ(h) ≤ D_α(Q‖P) + ln E_{h~P} φ(h)^{α′},
with D_α(Q‖P) = (1/(α−1)) ln E_{h~P} [ (Q(h)/P(h))^α ] ≥ KL(Q‖P), and lim_{α→1} D_α(Q‖P) = KL(Q‖P).
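For discrete distributions, D_α and its relation to KL are easy to check numerically (the two distributions below are illustrative):

```python
import math

def kl(Q, P):
    """KL(Q||P) for discrete distributions given as probability lists."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

def renyi(Q, P, alpha):
    """D_alpha(Q||P) = 1/(alpha-1) * ln sum_h P(h) (Q(h)/P(h))^alpha."""
    s = sum(p * (q / p) ** alpha for q, p in zip(Q, P) if p > 0)
    return math.log(s) / (alpha - 1)

Q = [0.7, 0.2, 0.1]   # posterior
P = [1/3, 1/3, 1/3]   # uniform prior
```

D_α is non-decreasing in α and dominates KL, which is why the Rényi-based bound is never tighter in its divergence term; its interest lies in the different change of measure.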
23 Rényi-Based General Theorem
Theorem (Bégin et al. 2016). [...] for any α > 1, with probability at least 1 − δ over the choice of S ~ D^n, ∀Q on H:
ln Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/α′) [ D_α(Q‖P) + ln( I^R_Δ(n, α′)/δ ) ],
where α′ := α/(α−1) > 1 and I^R_Δ(n, α′) = sup_{r∈[0,1]} Σ_{k=0}^n Bin(k; n, r) Δ(k/n, r)^{α′}.
24 Rényi-Based General Theorem:
Pr_{S~D^n}( ∀Q on H : ln Δ(R_S(G_Q), R_D(G_Q)) ≤ (1/α′)[ D_α(Q‖P) + ln(I^R_Δ(n, α′)/δ) ] ) ≥ 1 − δ.
Proof. With α′ := α/(α−1):
α′ ln Δ( E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h) )
≤ α′ ln E_{h~Q} Δ( L_S^l(h), L_D^l(h) )   [Jensen's inequality]
≤ D_α(Q‖P) + ln E_{h~P} Δ( L_S^l(h), L_D^l(h) )^{α′}   [change of measure]
≤ D_α(Q‖P) + ln (1/δ) E_{S~D^n} E_{h~P} Δ( L_S^l(h), L_D^l(h) )^{α′}   [Markov's inequality]
= D_α(Q‖P) + ln (1/δ) E_{h~P} E_{S~D^n} Δ( L_S^l(h), L_D^l(h) )^{α′}   [expectation swap]
= D_α(Q‖P) + ln (1/δ) E_{h~P} Σ_{k=0}^n Bin(k; n, L_D^l(h)) Δ(k/n, L_D^l(h))^{α′}   [binomial law]
≤ D_α(Q‖P) + ln (1/δ) sup_{r∈[0,1]} Σ_{k=0}^n Bin(k; n, r) Δ(k/n, r)^{α′}   [supremum over risk]
= D_α(Q‖P) + ln( I^R_Δ(n, α′)/δ ).
25 Empirical Study
[Figure: bounds on R_D(G_Q) for majority votes of 500 decision trees on the Mushroom dataset, with weak and strong decision trees. The plots track the slack introduced at each proof step (Jensen's inequality, change of measure, Markov's inequality, supremum over risk), comparing KL(Q‖P) against D_α(Q‖P), each combined with Δ := 2(q − p)² and Δ := kl(q, p).]
27 PAC-Bayesian Bounds for Regression
Lemma (Maurer 2004). For any l : H × X × Y → [0, 1], and any convex Δ : [0, 1] × [0, 1] → R,
E_{S~D^n} e^{n Δ(L_S^l(h), L_D^l(h))} ≤ Σ_{k=0}^n Bin(k; n, L_D^l(h)) e^{n Δ(k/n, L_D^l(h))}.
General theorem for regression with bounded losses. For any distribution D on X × Y, for any set H of predictors, for any l : H × X × Y → [0, 1], for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1 − δ over the choice of S ~ D^n,
∀Q on H : Δ( E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ].
28 General theorem for regression with bounded losses:
Pr_{S~D^n}( ∀Q on H : Δ(E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Proof.
n Δ( E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h) )
≤ E_{h~Q} n Δ( L_S^l(h), L_D^l(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln E_{h~P} e^{n Δ(L_S^l(h), L_D^l(h))}   [change of measure]
≤ KL(Q‖P) + ln (1/δ) E_{S~D^n} E_{h~P} e^{n Δ(L_S^l(h), L_D^l(h))}   [Markov's inequality]
= KL(Q‖P) + ln (1/δ) E_{h~P} E_{S~D^n} e^{n Δ(L_S^l(h), L_D^l(h))}   [expectation swap]
≤ KL(Q‖P) + ln (1/δ) E_{h~P} Σ_{k=0}^n Bin(k; n, L_D^l(h)) e^{n Δ(k/n, L_D^l(h))}   [Maurer's lemma]
≤ KL(Q‖P) + ln (1/δ) sup_{r∈[0,1]} Σ_{k=0}^n Bin(k; n, r) e^{n Δ(k/n, r)}   [supremum over risk]
= KL(Q‖P) + ln( I_Δ(n)/δ ).
29 PAC-Bayesian Bounds for Regression
General theorem for regression with bounded losses:
Pr_{S~D^n}( ∀Q on H : Δ(E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h)) ≤ (1/n)[ KL(Q‖P) + ln(I_Δ(n)/δ) ] ) ≥ 1 − δ.
Corollary. [...] with probability at least 1 − δ over the choice of S ~ D^n, for all Q on H:
(a) kl( E_{h~Q} L_S^l(h), E_{h~Q} L_D^l(h) ) ≤ (1/n)[ KL(Q‖P) + ln(2√n/δ) ]   (Langford and Seeger 2001)
(b) E_{h~Q} L_D^l(h) ≤ E_{h~Q} L_S^l(h) + sqrt( (1/(2n))[ KL(Q‖P) + ln(2√n/δ) ] )   (McAllester 1999, 2003)
(c) E_{h~Q} L_D^l(h) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c E_{h~Q} L_S^l(h) − (1/n)[ KL(Q‖P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) E_{h~Q} L_D^l(h) ≤ E_{h~Q} L_S^l(h) + (1/λ)[ KL(Q‖P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)
32 Optimal Gibbs Posterior
Corollary. [...] with probability at least 1 − δ over the choice of S ~ D^n, for all Q on H:
(c) E_{h~Q} L_D^l(h) ≤ (1/(1 − e^{−c})) [ 1 − exp( −c E_{h~Q} L_S^l(h) − (1/n)[ KL(Q‖P) + ln(1/δ) ] ) ]   (Catoni 2007)
(d) E_{h~Q} L_D^l(h) ≤ E_{h~Q} L_S^l(h) + (1/λ)[ KL(Q‖P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015)
From an algorithm-design perspective, corollary (c) suggests optimizing the following trade-off:
c n R_S(G_Q) + KL(Q‖P),
which also minimizes (d), with λ := c n. The optimal Gibbs posterior is given by
Q*(h) = (1/Z_S) P(h) e^{−c n L_S^l(h)}.
See Catoni 2007, Alquier et al. 2015, ...
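Over a finite hypothesis set, this optimal posterior has a closed form. A minimal sketch (the prior and empirical risks below are hypothetical):

```python
import math

def gibbs_posterior(prior, emp_losses, c, n):
    """Q*(h) proportional to P(h) * exp(-c * n * L_S(h)); returns (Q*, Z_S)."""
    weights = [p * math.exp(-c * n * l) for p, l in zip(prior, emp_losses)]
    Z = sum(weights)  # normalization constant Z_S
    return [w / Z for w in weights], Z

# Three hypotheses with hypothetical empirical risks and a uniform prior.
prior = [1/3, 1/3, 1/3]
losses = [0.1, 0.2, 0.4]
Q, Z = gibbs_posterior(prior, losses, c=1.0, n=50)
```

As c → 0 the posterior collapses to the prior; as c grows it concentrates on the empirical risk minimizer, making c the knob of the fit/complexity trade-off.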
33 Tying the Concepts
Let Θ denote the set of all possible model parameters.
Bayes' rule: p(θ|X, Y) = p(θ) p(Y|X, θ) / p(Y|X),
where X = {x_1, ..., x_n}, Y = {y_1, ..., y_n}, and
p(θ) is the prior for each θ ∈ Θ (similar to P over H);
p(θ|X, Y) is the posterior for each θ ∈ Θ (similar to Q over H);
p(Y|X, θ) is the likelihood of the parameters θ given the sample S.
Negative log-likelihood loss function: l_nll(θ, x, y) = ln( 1/p(y|x, θ) ). Then,
L_S^{nll}(θ) = (1/n) Σ_{i=1}^n l_nll(θ, x_i, y_i) = −(1/n) Σ_{i=1}^n ln p(y_i|x_i, θ) = −(1/n) ln p(Y|X, θ).
34 Rediscovering the Marginal Likelihood
With the negative log-likelihood loss, the Bayesian and PAC-Bayesian posteriors align:
p(θ|X, Y) = p(θ) p(Y|X, θ) / p(Y|X) = P(θ) e^{−n L_S^{nll}(θ)} / Z_S = Q*(θ).
The normalization constant Z_S corresponds to the marginal likelihood:
Z_S = p(Y|X) = ∫_Θ P(θ) e^{−n L_S^{nll}(θ)} dθ.
Putting the posterior back inside the PAC-Bayesian bounds, we obtain:
n E_{θ~Q*} L_S^{nll}(θ) + KL(Q*‖P)
= n E_{θ~Q*} L_S^{nll}(θ) + ∫_Θ Q*(θ) ln( P(θ) e^{−n L_S^{nll}(θ)} / (Z_S P(θ)) ) dθ
= n E_{θ~Q*} L_S^{nll}(θ) + ∫_Θ Q*(θ) [ −n L_S^{nll}(θ) − ln Z_S ] dθ
= ln(1/Z_S) = −ln Z_S.
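This identity can be verified numerically on a discrete parameter set (the per-parameter losses below are hypothetical values, not from the talk):

```python
import math

# Discrete parameter set Theta with a prior P and per-parameter empirical NLL losses.
P = [0.5, 0.3, 0.2]
L_S = [0.9, 0.6, 1.2]   # hypothetical values of L_S^{nll}(theta)
n = 25

w = [p * math.exp(-n * l) for p, l in zip(P, L_S)]
Z_S = sum(w)                  # marginal likelihood p(Y|X)
Q = [wi / Z_S for wi in w]    # Bayesian posterior = optimal Gibbs posterior

expected_loss = n * sum(q * l for q, l in zip(Q, L_S))   # n E_{Q*} L_S
kl_term = sum(q * math.log(q / p) for q, p in zip(Q, P))  # KL(Q*||P)
```

The sum expected_loss + kl_term equals −ln Z_S up to floating-point error, exactly the quantity the PAC-Bayesian bounds trade off.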
35 From the Marginal Likelihood to PAC-Bayesian Bounds
Corollary (Germain, Bach, Lacoste, Lacoste-Julien 2016). Given a data distribution D, a parameter set Θ, a prior distribution P over Θ, a δ ∈ (0, 1], if l_nll lies in [a, b], we have, with probability at least 1 − δ over the choice of S ~ D^n,
(c) E_{θ~Q*} L_D^{nll}(θ) ≤ a + ((b − a)/(1 − e^{a−b})) [ 1 − e^{a} (δ Z_S)^{1/n} ],
(d) E_{θ~Q*} L_D^{nll}(θ) ≤ (1/2)(b − a)² − (1/n) ln( δ Z_S ).
Take-home message: the marginal likelihood minimizes some PAC-Bayesian bounds under the negative log-likelihood loss function.
37 Model Comparison
Consider a discrete set of L models {M_i}_{i=1}^L with parameters {Θ_i}_{i=1}^L, a prior p(M_i) over these models, and, for each model M_i, a prior p(θ|M_i) = P_i(θ) over Θ_i.
Bayes' rule: p(θ|X, Y, M_i) = p(θ|M_i) p(Y|X, θ, M_i) / p(Y|X, M_i),
where the model evidence is p(Y|X, M_i) = ∫_{Θ_i} p(θ|M_i) p(Y|X, θ, M_i) dθ = Z_{S,i}.
38 Bayesian Model Selection
[Slide reproducing a figure from Zoubin Ghahramani's MLSS 2012 talk on the Bayesian Occam's razor.]
39 Frequentist Bounds for Bayesian Model Selection
An alternative explanation for the Bayesian Occam's razor phenomenon...
Corollary (Germain, Bach, et al. 2016). [...] with probability at least 1 − δ over the choice of S ~ D^n, ∀i ∈ {1, ..., L}:
(c) E_{θ~Q*_i} L_D^{nll}(θ) ≤ a + ((b − a)/(1 − e^{a−b})) [ 1 − e^{a} (δ Z_{S,i}/L)^{1/n} ],
(d) E_{θ~Q*_i} L_D^{nll}(θ) ≤ (1/2)(b − a)² − (1/n) ln( δ Z_{S,i}/L ).
41 Bayesian Linear Regression
Consider a mapping function φ : X → R^d. Given (x, y) ∈ X × Y, model parameters θ := w ∈ R^d, and a fixed noise parameter σ, we consider the likelihood
p(y|x, w) = N( y | w·φ(x), σ² ) = (1/√(2πσ²)) e^{ −(y − w·φ(x))² / (2σ²) }.
Thus, the negative log-likelihood loss function is
l_nll(w, x, y) = ln( 1/p(y|x, w) ) = (1/2) ln(2πσ²) + (1/(2σ²)) (y − w·φ(x))².
We also consider an isotropic Gaussian prior of mean 0 and variance σ_P²:
p(w|σ_P) = N( w | 0, σ_P² I ) = (2πσ_P²)^{−d/2} e^{ −‖w‖² / (2σ_P²) }.
42 Bayesian Linear Regression
The optimal Gibbs posterior is given by
Q*(w) = p(w|X, Y, σ, σ_P) = p(w|σ_P) p(Y|X, w, σ) / p(Y|X, σ, σ_P) = N( w | ŵ, A⁻¹ ),
where A := (1/σ²) ΦᵀΦ + (1/σ_P²) I and ŵ := (1/σ²) A⁻¹ Φᵀ y.
The negative log marginal likelihood is
−ln Z_S(σ, σ_P) = (1/(2σ²)) ‖y − Φŵ‖² + (n/2) ln(2πσ²) + (1/(2σ_P²)) ‖ŵ‖² + (1/2) ln|A| + d ln σ_P
= [ n L_S^{nll}(ŵ) + (1/(2σ²)) tr(ΦᵀΦ A⁻¹) ]   [= n E_{w~Q*} L_S^{nll}(w)]
+ [ (1/(2σ_P²)) tr(A⁻¹) − d/2 + (1/(2σ_P²)) ‖ŵ‖² + (1/2) ln|A| + d ln σ_P ]   [= KL( N(ŵ, A⁻¹) ‖ N(0, σ_P² I) )].
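The decomposition −ln Z_S = n E_{w~Q*} L_S^{nll}(w) + KL(Q*‖P) can be checked numerically in one dimension, where φ(x) = x and all quantities are scalars (the four data points and the values of σ, σ_P below are made up for illustration; −ln Z_S is recomputed independently by brute-force integration over w):

```python
import math

# Toy 1-D Bayesian linear regression: phi(x) = x, parameter w, fixed noise sigma.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.4, 1.1, 1.4, 2.1]
n, sigma, sigma_p = len(xs), 0.3, 1.0

# Closed-form posterior N(w_hat, 1/A) from the slide, specialized to d = 1.
A = sum(x * x for x in xs) / sigma**2 + 1 / sigma_p**2
w_hat = sum(x * y for x, y in zip(xs, ys)) / (sigma**2 * A)

def nll(w, x, y):  # negative log-likelihood loss l_nll(w, x, y)
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - w * x)**2 / (2 * sigma**2)

# -ln Z_S via the slide's decomposition: n E_{w~Q*} L_S(w) + KL(Q*||P).
expected_loss = sum(nll(w_hat, x, y) for x, y in zip(xs, ys)) \
    + sum(x * x for x in xs) / (2 * sigma**2 * A)
kl_term = 0.5 * (1 / (A * sigma_p**2) + w_hat**2 / sigma_p**2 - 1
                 + math.log(sigma_p**2 * A))
neg_ln_Z_decomposed = expected_loss + kl_term

# -ln Z_S by direct numerical integration of P(w) * exp(-n L_S(w)) over w.
def integrand(w):
    prior = math.exp(-w * w / (2 * sigma_p**2)) / math.sqrt(2 * math.pi * sigma_p**2)
    return prior * math.exp(-sum(nll(w, x, y) for x, y in zip(xs, ys)))

step = 0.001
Z_numeric = sum(integrand(i * step) for i in range(-5000, 5001)) * step
neg_ln_Z_numeric = -math.log(Z_numeric)
```

Both routes give the same value, which is the point of the slide: the marginal likelihood already encodes the empirical-loss/KL trade-off of the PAC-Bayesian bounds.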
43 Fitting y = sin(x) + ɛ with polynomial models (inspired by Bishop 2006)
We illustrate the decomposition of the marginal likelihood into the empirical loss and the KL divergence:
−ln Z_S = n E_{θ~Q*} L_S^{nll}(θ) + KL(Q*‖P).
[Figure: polynomial models of degree d = 1 to 7 fitted to data sampled from sin(x) on [−π, 2π]; as a function of the model degree d, the plot decomposes −ln Z_{X,Y} into KL(ρ̂‖π) and n E_{θ~ρ̂} L^{nll}_{X,Y}(θ), alongside the generalization loss n E_{θ~ρ̂} L^{nll}_D(θ).]
45 Conclusion and Future Works
I talked about:
a general theorem from which we recover existing results;
my modular proof, easy to adapt to various frameworks;
a direct link between PAC-Bayesian (frequentist) bounds and Bayesian model selection.
I did not talk about:
our learning algorithms inspired by PAC-Bayesian bounds; see Germain, Lacasse, Laviolette, and Marchand (2009, ICML) and Germain, Habrard, et al. (2016, ICML);
our PAC-Bayesian theorems for unbounded losses; see Germain, Bach, et al. (2016, arXiv).
I plan to:
study other Bayesian techniques from a PAC-Bayes perspective (empirical Bayes, variational Bayes, etc.).
46 References
Alquier, Pierre, James Ridgway, and Nicolas Chopin (2015). On the properties of variational approximations of Gibbs posteriors. In: ArXiv e-prints.
Atar, Rami and Neri Merhav (2015). Information-theoretic applications of the logarithmic probability comparison bound. In: IEEE International Symposium on Information Theory (ISIT).
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2014). PAC-Bayesian Theory for Transductive Learning. In: AISTATS.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2016). PAC-Bayesian Bounds based on the Rényi Divergence. In: AISTATS.
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Catoni, Olivier (2007). PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Vol. 56. Institute of Mathematical Statistics.
Derbeko, Philip, Ran El-Yaniv, and Ron Meir (2004). Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms. In: J. Artif. Intell. Res. (JAIR) 22.
Germain, Pascal (2015). Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine. PhD thesis. Université Laval.
Germain, Pascal, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien (2016). PAC-Bayesian Theory Meets Bayesian Inference. In: ArXiv e-prints.
Germain, Pascal, Amaury Habrard, François Laviolette, and Émilie Morvant (2016). A New PAC-Bayesian Perspective on Domain Adaptation. In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, and Mario Marchand (2009). PAC-Bayesian learning of linear classifiers. In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy (2015). Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. In: JMLR 16.
Langford, John and Matthias Seeger (2001). Bounds for averaging classifiers. Tech. rep. Carnegie Mellon, Department of Computer Science.
Maurer, Andreas (2004). A Note on the PAC-Bayesian Theorem. In: CoRR cs.LG/
McAllester, David (1999). Some PAC-Bayesian Theorems. In: Machine Learning.
McAllester, David (2003). PAC-Bayesian Stochastic Model selection. In: Machine Learning.
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationLearning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting
Learning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting Anil Goyal To cite this version: Anil Goyal. Learning a Multiview Weighted Majority Vote Classifier: Using
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 3 Stochastic Gradients, Bayesian Inference, and Occam s Razor https://people.orie.cornell.edu/andrew/orie6741 Cornell University August
More informationarxiv: v1 [stat.ml] 17 Jul 2017
PACBayes and Domain Adaptation arxiv:1707.05712v1 [stat.ml] 17 Jul 2017 Pascal Germain pascal.germain@inria.fr Département d informatique de l ENS, École normale supérieure, CNRS, PSL Research University,
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationBayesian inference J. Daunizeau
Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, Winter Generalization and Regularization
TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter 2019 Generalization and Regularization 1 Chomsky vs. Kolmogorov and Hinton Noam Chomsky: Natural language grammar cannot be learned by
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationSTA 414/2104, Spring 2014, Practice Problem Set #1
STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationGeneralization, Overfitting, and Model Selection
Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How
More informationDomain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer
Domain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer Emilie Morvant To cite this version: Emilie Morvant. Domain Adaptation of Majority Votes via Perturbed Variation-based Label
More informationLarge-scale Ordinal Collaborative Filtering
Large-scale Ordinal Collaborative Filtering Ulrich Paquet, Blaise Thomson, and Ole Winther Microsoft Research Cambridge, University of Cambridge, Technical University of Denmark ulripa@microsoft.com,brmt2@cam.ac.uk,owi@imm.dtu.dk
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationPDEEC Machine Learning 2016/17
PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationIntroduction and Models
CSE522, Winter 2011, Learning Theory Lecture 1 and 2-01/04/2011, 01/06/2011 Lecturer: Ofer Dekel Introduction and Models Scribe: Jessica Chang Machine learning algorithms have emerged as the dominant and
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 4: MDL and PAC-Bayes Uniform vs Non-Uniform Bias No Free Lunch: we need some inductive bias Limiting attention to hypothesis
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW2 due now! Project proposal due on tomorrow Midterm next lecture! HW3 posted Last time Linear Regression Parametric vs Nonparametric
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationIntroduction to Machine Learning
Introduction to Machine Learning Vapnik Chervonenkis Theory Barnabás Póczos Empirical Risk and True Risk 2 Empirical Risk Shorthand: True risk of f (deterministic): Bayes risk: Let us use the empirical
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationStochastic optimization in Hilbert spaces
Stochastic optimization in Hilbert spaces Aymeric Dieuleveut Aymeric Dieuleveut Stochastic optimization Hilbert spaces 1 / 48 Outline Learning vs Statistics Aymeric Dieuleveut Stochastic optimization Hilbert
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationBayesian Inference. Chris Mathys Wellcome Trust Centre for Neuroimaging UCL. London SPM Course
Bayesian Inference Chris Mathys Wellcome Trust Centre for Neuroimaging UCL London SPM Course Thanks to Jean Daunizeau and Jérémie Mattout for previous versions of this talk A spectacular piece of information
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationStatistical Learning. Philipp Koehn. 10 November 2015
Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationBayesian Inference Course, WTCN, UCL, March 2013
Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationMachine Learning using Bayesian Approaches
Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes
More information