Generalization of the PAC-Bayesian Theory


1 Generalization of the PAC-Bayesian Theory and Applications to Semi-Supervised Learning. Pascal Germain, INRIA Paris (SIERRA team). Modal Seminar, INRIA Lille, January 24, 2017. "In life, the essential thing is to pass a priori judgments on everything." (Boris Vian)

2 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Bounding the Majority Vote Risk. 3. Semi-Supervised Learning and Variations: Semi-Supervised Learning; Transductive Learning; Domain Adaptation. 4. Conclusion.


4 Definitions. Learning example: an example (x, y) ∈ X × Y is a description-label pair. Data-generating distribution: each example is an observation drawn from a distribution D on X × Y. Learning sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n. Predictors (or hypotheses): h : X → Y, h ∈ H. Learning algorithm: A(S) → h. Loss function: ℓ : H × X × Y → R. Empirical loss: L_S^ℓ(h) = (1/n) Σ_{i=1}^n ℓ(h, x_i, y_i). Generalization loss: L_D^ℓ(h) = E_{(x,y)~D} ℓ(h, x, y).
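
A minimal sketch (not from the talk) of the empirical loss defined above, using NumPy and the 0-1 loss as the loss function ℓ; the predictor h and the sample S below are toy assumptions.

```python
import numpy as np

def zero_one_loss(h, x, y):
    """l(h, x, y) = I[h(x) != y] for one description-label pair."""
    return float(h(x) != y)

def empirical_loss(h, sample, loss=zero_one_loss):
    """L_S(h) = (1/n) * sum_i l(h, x_i, y_i) over the learning sample S."""
    return float(np.mean([loss(h, x, y) for x, y in sample]))

# Toy usage: a threshold predictor on one-dimensional inputs.
S = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.7, 1), (0.1, -1)]
h = lambda x: 1 if x > 0 else -1
print(empirical_loss(h, S))  # 0.2: one of the five examples is misclassified
```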

5 PAC-Bayesian Theory. Initiated by McAllester (1999), the PAC-Bayesian theory gives PAC generalization guarantees to Bayesian-like algorithms. PAC guarantees (Probably Approximately Correct): with probability at least 1−δ, the loss of predictor h is less than ε: Pr_{S~D^n}( L_D^ℓ(h) ≤ ε(L_S^ℓ(h), n, δ, ...) ) ≥ 1−δ. Bayesian flavor: given a prior distribution P on H and a posterior distribution Q on H, Pr_{S~D^n}( E_{h~Q} L_D^ℓ(h) ≤ ε(E_{h~Q} L_S^ℓ(h), n, δ, P, ...) ) ≥ 1−δ.

6 A Classical PAC-Bayesian Theorem. PAC-Bayesian theorem (adapted from McAllester 1999, 2003): for any distribution D on X × Y, for any set of predictors H, for any loss ℓ : H × X × Y → [0, 1], for any distribution P on H, and for any δ ∈ (0, 1], we have
Pr_{S~D^n}( for all Q on H: E_{h~Q} L_D^ℓ(h) ≤ E_{h~Q} L_S^ℓ(h) + √( (1/(2n)) [ KL(Q‖P) + ln(2√n/δ) ] ) ) ≥ 1−δ,
where KL(Q‖P) = E_{h~Q} ln( Q(h)/P(h) ) is the Kullback-Leibler divergence.
Training bound: gives generalization guarantees that are not based on a testing sample. Valid for all posteriors Q on H: an inspiration for conceiving new learning algorithms.
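
A small numerical sketch of this bound (an assumed helper, not code from the talk): given the empirical Gibbs loss, the KL term, the sample size, and δ, it returns the guaranteed upper bound on the expected true loss.

```python
import math

def mcallester_bound(emp_gibbs_loss, kl_qp, n, delta=0.05):
    """E_{h~Q} L_D(h) <= E_{h~Q} L_S(h) + sqrt([KL(Q||P) + ln(2*sqrt(n)/delta)] / (2n))."""
    complexity = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return emp_gibbs_loss + math.sqrt(complexity)

print(mcallester_bound(emp_gibbs_loss=0.10, kl_qp=5.0, n=1000))  # ~0.178
```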



9 Majority Vote Classifiers. Consider a binary classification problem where Y = {−1, 1} and the set H contains binary voters h : X → {−1, 1}. Weighted majority vote: to predict the label of x ∈ X, the classifier asks for the prevailing opinion: B_Q(x) = sgn( E_{h~Q} h(x) ). Many learning algorithms output majority vote classifiers: AdaBoost, Random Forests, Bagging, ...

10 A Surrogate Loss. Majority vote risk: R_D(B_Q) = Pr_{(x,y)~D}( B_Q(x) ≠ y ) = E_{(x,y)~D} I[ y · E_{h~Q} h(x) ≤ 0 ], where I[a] = 1 if predicate a is true and I[a] = 0 otherwise. Gibbs risk / linear loss: the stochastic Gibbs classifier G_Q(x) draws h ∈ H according to Q and outputs h(x); its risk is R_D(G_Q) = E_{(x,y)~D} E_{h~Q} I[ h(x) ≠ y ] = E_{h~Q} L_D^{ℓ01}(h), where ℓ01(h, x, y) = I[ h(x) ≠ y ]. Equivalently, on each example, E_{h~Q} I[ h(x) ≠ y ] = ( 1 − y · E_{h~Q} h(x) ) / 2, a linear loss of the margin y · E_{h~Q} h(x). Factor two: it is well known that R_D(B_Q) ≤ 2 R_D(G_Q).
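
The sketch below (toy data and posterior, not from the talk) computes both risks from a matrix of votes and illustrates the factor-two relation on a sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_voters = 500, 15
y = rng.choice([-1, 1], size=n_examples)                  # labels y_i
# Each voter h_j is correct with probability 0.7, independently (toy assumption).
correct = rng.random((n_examples, n_voters)) < 0.7
votes = np.where(correct, y[:, None], -y[:, None])        # votes[i, j] = h_j(x_i)
Q = rng.dirichlet(np.ones(n_voters))                      # posterior weights over H

margin = votes @ Q                                        # E_{h~Q} h(x_i)
majority_vote_risk = np.mean(y * margin <= 0)             # R(B_Q), ties counted as errors
gibbs_risk = np.mean((votes != y[:, None]) @ Q)           # R(G_Q) = E_{h~Q} L^{01}(h)
print(majority_vote_risk, gibbs_risk)                     # here R(B_Q) is well below 2 R(G_Q)
```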

11 From the Factor 2 to the C-bound. From Markov's inequality ( Pr(X ≥ a) ≤ E[X]/a ), we obtain the factor-2 bound: R_D(B_Q) = Pr_{(x,y)~D}( 1 − y · E_{h~Q} h(x) ≥ 1 ) ≤ E_{(x,y)~D}( 1 − y · E_{h~Q} h(x) ) = 2 R_D(G_Q). From Chebyshev's inequality, in its one-sided (Cantelli) form ( Pr(X − E[X] ≥ a) ≤ Var X / (Var X + a²) ), we obtain the C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D), where d_Q^D is the expected disagreement: d_Q^D = E_{(x,·)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ h_j(x) ] = (1/2) ( 1 − E_{(x,·)~D} [ E_{h~Q} h(x) ]² ).
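
A short sketch of both bounds as plain functions of the Gibbs risk and the expected disagreement (toy values): for a fixed Gibbs risk, a larger disagreement makes the C-bound smaller, while the factor-2 bound cannot improve.

```python
def factor_two_bound(gibbs_risk):
    return 2.0 * gibbs_risk

def c_bound(gibbs_risk, disagreement):
    """C-bound of Lacasse et al. (2006), for gibbs_risk < 1/2 and disagreement < 1/2."""
    return 1.0 - (1.0 - 2.0 * gibbs_risk) ** 2 / (1.0 - 2.0 * disagreement)

# Same Gibbs risk, increasing disagreement: the C-bound decreases below the factor-2 bound.
for d in (0.10, 0.25, 0.35):
    print(factor_two_bound(0.30), c_bound(0.30, d))
```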


13 I.I.D. Assumption. Assumption: examples are generated i.i.d. by a distribution D on X × Y: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n.

14 A General PAC-Bayesian Theorem. Δ-function: a «distance» between R_S(G_Q) and R_D(G_Q), i.e., a convex function Δ : [0, 1] × [0, 1] → R. General theorem (Bégin et al. 2014, 2016; Germain 2015): for any distribution D on X × Y, for any set H of voters, for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1−δ over the choice of S ~ D^n, for all Q on H:
Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ],
where I_Δ(n) = sup_{r∈[0,1]} Σ_{k=0}^{n} Bin(k; n, r) · e^{n·Δ(k/n, r)} and Bin(k; n, r) = C(n, k) r^k (1−r)^{n−k}.
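
A numerical sketch (assumed grid discretization, not code from the talk) of the term I_Δ(n) for the particular choice Δ = kl, the binary KL divergence used in the corollaries below.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom

def binary_kl(q, p, eps=1e-12):
    """kl(q, p) = q ln(q/p) + (1-q) ln((1-q)/(1-p)), clipped for numerical stability."""
    q, p = np.clip(q, eps, 1 - eps), np.clip(p, eps, 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def i_delta(n, delta_fn=binary_kl, grid=1000):
    """I_Delta(n) = sup_r sum_k Bin(k; n, r) exp(n Delta(k/n, r)), sup taken over a grid of r."""
    k = np.arange(n + 1)
    best = -np.inf
    for r in np.linspace(1e-6, 1 - 1e-6, grid):
        best = max(best, logsumexp(binom.logpmf(k, n, r) + n * delta_fn(k / n, r)))
    return float(np.exp(best))

n = 200
print(i_delta(n), 2 * np.sqrt(n))  # for Delta = kl, I_kl(n) stays below 2*sqrt(n)
```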

15 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Interpretation: the bound delimits the set of plausible true risks, namely the values r such that Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ]. [Plot: Δ( R_S(G_Q), r ) as a function of r, compared with the right-hand side of the bound.]

16 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Proof ideas. Change of measure inequality: for any P and Q on H, and for any measurable function φ : H → R, we have E_{h~Q} φ(h) ≤ KL(Q‖P) + ln( E_{h~P} e^{φ(h)} ). Markov's inequality: Pr(X ≥ a) ≤ E[X]/a, hence Pr( X ≤ (1/δ) E[X] ) ≥ 1−δ. Probability of observing k misclassifications among n examples: given a voter h, n·L_S^{ℓ01}(h) is a binomial variable of n trials with success probability L_D^{ℓ01}(h): Pr_{S~D^n}( L_S^{ℓ01}(h) = k/n ) = C(n, k) ( L_D^{ℓ01}(h) )^k ( 1 − L_D^{ℓ01}(h) )^{n−k} = Bin( k; n, L_D^{ℓ01}(h) ).

17 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Proof.
n · Δ( E_{h~Q} L_S^ℓ(h), E_{h~Q} L_D^ℓ(h) )
≤ n · E_{h~Q} Δ( L_S^ℓ(h), L_D^ℓ(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln( E_{h~P} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [change of measure]
≤ KL(Q‖P) + ln( (1/δ) E_{S~D^n} E_{h~P} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [Markov's inequality, with probability ≥ 1−δ]
= KL(Q‖P) + ln( (1/δ) E_{h~P} E_{S~D^n} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [expectation swap]
= KL(Q‖P) + ln( (1/δ) E_{h~P} Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) e^{n·Δ(k/n, L_D^ℓ(h))} )   [binomial law]
≤ KL(Q‖P) + ln( (1/δ) sup_{r∈[0,1]} Σ_{k=0}^{n} Bin( k; n, r ) e^{n·Δ(k/n, r)} )   [supremum over risk]
= KL(Q‖P) + ln( I_Δ(n)/δ ).

18 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Corollary: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H:
(a) kl( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln(2√n/δ) ]   (Langford and Seeger 2001);
(b) R_D(G_Q) ≤ R_S(G_Q) + √( (1/(2n)) [ KL(Q‖P) + ln(2√n/δ) ] )   (McAllester 1999, 2003);
(c) R_D(G_Q) ≤ (1/(1−e^{−c})) ( 1 − exp( −[ c·R_S(G_Q) + (1/n)( KL(Q‖P) + ln(1/δ) ) ] ) )   (Catoni 2007);
(d) R_D(G_Q) ≤ R_S(G_Q) + (1/λ) [ KL(Q‖P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015).
Here kl(q, p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)) ≥ 2(q−p)², Δ_c(q, p) = −ln[ 1 − (1−e^{−c}) p ] − c·q, and Δ_λ(q, p) = (λ/n)(p − q).
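
Corollary (a) is usually turned into a numerical risk bound by inverting the kl: take the largest r whose kl-distance from the empirical Gibbs risk fits under the right-hand side. A sketch with an assumed bisection helper (not code from the talk):

```python
import math

def binary_kl(q, p):
    q = min(max(q, 1e-12), 1 - 1e-12)
    p = min(max(p, 1e-12), 1 - 1e-12)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse_upper(q, bound, tol=1e-9):
    """max { p in [q, 1] : kl(q, p) <= bound }, found by bisection."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if binary_kl(q, mid) <= bound else (lo, mid)
    return lo

def seeger_bound(emp_gibbs_risk, kl_qp, n, delta=0.05):
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_gibbs_risk, rhs)

print(seeger_bound(0.10, kl_qp=5.0, n=1000))  # ~0.15, tighter than the McAllester form (b)
```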


20 Bounding the Expected Disagreement. Expected disagreement: d_Q^D = E_{(x,·)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ h_j(x) ] = E_{(x,·)~D} E_{h_ij~Q²} ℓ_d(h_ij, x, ·), where Q²(h_ij) = Q(h_i)·Q(h_j), so that KL(Q²‖P²) = 2 KL(Q‖P). General theorem [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: Δ( d_Q^S, d_Q^D ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_Δ(n)/δ ) ]. Corollary: (a) kl( d_Q^S, d_Q^D ) ≤ (1/n) [ 2 KL(Q‖P) + ln(2√n/δ) ].

21 The C-bound. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC-Bayes C-bound 1: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: R_D(B_Q) ≤ 1 − (1 − 2 r̄)² / (1 − 2 d̄), with
r̄ = max{ r ∈ [0, 1/2] : Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] },
d̄ = min{ d ∈ [0, 1/2] : Δ( d_Q^S, d ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] }.
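
A sketch (assumed helpers, Δ = kl with the corollary constant 2√n) of how this bound is evaluated: bisection gives the largest plausible Gibbs risk r̄ and the smallest plausible disagreement d̄, each at confidence δ/2, and both are plugged into the C-bound.

```python
import math

def binary_kl(q, p):
    q = min(max(q, 1e-12), 1 - 1e-12)
    p = min(max(p, 1e-12), 1 - 1e-12)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_extremum(q, bound, upper=True, tol=1e-9):
    """Largest (upper=True) or smallest (upper=False) p with kl(q, p) <= bound."""
    lo, hi = (q, 1.0) if upper else (0.0, q)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        ok = binary_kl(q, mid) <= bound
        if upper:
            lo, hi = (mid, hi) if ok else (lo, mid)
        else:
            lo, hi = (lo, mid) if ok else (mid, hi)
    return lo if upper else hi

def pac_bayes_c_bound_1(emp_gibbs, emp_disagreement, kl_qp, n, delta=0.05):
    eps = math.log(2.0 * math.sqrt(n) / (delta / 2.0)) / n
    r_bar = min(0.5, kl_extremum(emp_gibbs, kl_qp / n + eps, upper=True))
    d_bar = kl_extremum(emp_disagreement, 2.0 * kl_qp / n + eps, upper=False)
    return 1.0 - (1.0 - 2.0 * r_bar) ** 2 / (1.0 - 2.0 * d_bar)

print(pac_bayes_c_bound_1(emp_gibbs=0.25, emp_disagreement=0.35, kl_qp=5.0, n=2000))
```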

22 The C-bound. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC-Bayes C-bound 2: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: R_D(B_Q) ≤ sup_{(r,d)∈A} ( 1 − (1 − 2r)² / (1 − 2d) ), with A = { (r, d) ∈ [0, 1/2]² : Δ₂( (R_S(G_Q), r), (d_Q^S, d) ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_{Δ₂}(n)/δ ) ] }, where Δ₂ is here a «distance» between pairs and I_{Δ₂}(n) the corresponding normalization term.

23 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, and PAC C-bound 2.]



26 Semi-Supervised Learning. Assumption: examples are generated i.i.d. by a distribution D on X × Y. Labeled sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n. Unlabeled sample: U = {(x_{n+1}, ·), (x_{n+2}, ·), ..., (x_{n+n'}, ·)} ~ D^{n'} (labels unobserved).

27 Semi-Supervised Learning. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC C-bound semi-supervised: [...] with probability at least 1−δ over the choice of S ~ D^n and U ~ D^{n'}, for all Q on H: R_D(B_Q) ≤ 1 − (1 − 2 r̄)² / (1 − 2 d̄), with
r̄ = max{ r ∈ [0, 1/2] : Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] },
d̄ = min{ d ∈ [0, 1/2] : Δ( d_Q^{S∪U}, d ) ≤ (1/(n+n')) [ 2 KL(Q‖P) + ln( I_Δ(n+n')/(δ/2) ) ] },
where the disagreement d_Q^{S∪U} is estimated on all n + n' labeled and unlabeled examples (it does not require labels).

28 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, PAC C-bound 2, and PAC C-bound semi-supervised.]


30 Transductive Learning. Assumption: examples are drawn without replacement from a finite set Z of size N. S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ⊆ Z; U = {(x_{n+1}, ·), ..., (x_N, ·)} = Z \ S.

31 Transductive Learning. Assumption: examples are drawn without replacement from a finite set Z of size N. S = {(x_1, y_1), ..., (x_n, y_n)} ⊆ Z; U = {(x_{n+1}, ·), ..., (x_N, ·)} = Z \ S. Inductive learning: n draws with replacement according to D → binomial law. Transductive learning: n draws without replacement in Z → hypergeometric law. Theorem (Bégin et al. 2014): for any set Z of N examples, [...] with probability at least 1−δ over the choice of n examples among Z, for all Q on H:
Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( T_Δ(n, N)/δ ) ],
where T_Δ(n, N) = max_{K=0,...,N} Σ_{k=max[0, K+n−N]}^{min[n, K]} [ C(K, k) C(N−K, n−k) / C(N, n) ] · e^{n·Δ(k/n, K/N)}.
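
A numerical sketch of T_Δ(n, N) (again with Δ = kl; an assumed implementation, not code from the talk), where the binomial of the inductive case is replaced by the hypergeometric distribution.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import hypergeom

def binary_kl(q, p, eps=1e-12):
    q, p = np.clip(q, eps, 1 - eps), np.clip(p, eps, 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def t_delta(n, N, delta_fn=binary_kl):
    """T_Delta(n, N) = max_K sum_k [C(K,k) C(N-K,n-k) / C(N,n)] exp(n Delta(k/n, K/N))."""
    best = -np.inf
    for K in range(N + 1):
        k = np.arange(max(0, K + n - N), min(n, K) + 1)
        logterms = hypergeom.logpmf(k, N, K, n) + n * delta_fn(k / n, K / N)
        best = max(best, logsumexp(logterms))
    return float(np.exp(best))

n, N = 50, 200
print(t_delta(n, N), 2 * np.sqrt(n))  # compare with the inductive constant 2*sqrt(n) for Delta = kl
```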

32 Theorem. Pr_{S~[Z]^n}( for all Q on H: Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( T_Δ(n, N)/δ ) ] ) ≥ 1−δ. Proof.
n · Δ( E_{h~Q} L_S^ℓ(h), E_{h~Q} L_Z^ℓ(h) )
≤ n · E_{h~Q} Δ( L_S^ℓ(h), L_Z^ℓ(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln( E_{h~P} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [change of measure]
≤ KL(Q‖P) + ln( (1/δ) E_{S~[Z]^n} E_{h~P} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [Markov's inequality, with probability ≥ 1−δ]
= KL(Q‖P) + ln( (1/δ) E_{h~P} E_{S~[Z]^n} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [expectation swap]
= KL(Q‖P) + ln( (1/δ) E_{h~P} Σ_k [ C( N·L_Z^ℓ(h), k ) C( N − N·L_Z^ℓ(h), n−k ) / C(N, n) ] e^{n·Δ(k/n, L_Z^ℓ(h))} )   [hypergeometric law]
≤ KL(Q‖P) + ln( (1/δ) max_{K=0,...,N} Σ_k [ C(K, k) C(N−K, n−k) / C(N, n) ] e^{n·Δ(k/n, K/N)} )   [maximum over risk]
= KL(Q‖P) + ln( T_Δ(n, N)/δ ).

33 A New Transductive Bound. A new Δ-function (Bégin et al. 2014): Δ_β(q, p) = kl(q, p) + ((1−β)/β) · kl( (p − β q)/(1 − β), p ), where β = n/N is the fraction of observed examples.

34 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, PAC C-bound 2, PAC C-bound semi-supervised, and PAC C-bound transductive.]


36 Domain Adaptation. Assumption: source and target examples are generated by different distributions. Labeled source sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ (D_S)^n. Unlabeled target sample: T = {(x_1, ·), (x_2, ·), ..., (x_{n'}, ·)} ~ (D_T)^{n'}.

37 Our Domain Adaptation Setting. Assumption: source and target examples are generated by different distributions. S = {(x_1, y_1), ..., (x_n, y_n)} ~ (D_S)^n; T = {(x_1, ·), ..., (x_{n'}, ·)} ~ (D_T)^{n'}. A domain adaptation learning algorithm is provided with a labeled source sample and an unlabeled target sample. The goal is to build a classifier with a low target risk R_{D_T}(h).

38 Generalization Bound. Observation: R_D(G_Q) = (1/2) d_Q^D + e_Q^D, where e_Q^D = E_{(x,y)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ y ] I[ h_j(x) ≠ y ] is the expected joint error. PAC-Bayesian DA bound (Germain, Habrard, et al. 2016): for any prior P over H, any δ ∈ (0, 1], any real numbers b > 1 and c > 1, with probability at least 1−δ over the choices of S ~ (D_S)^n and T ~ (D_T)^{n'}, we have, for all Q on H:
R_{D_T}(G_Q) ≤ c · (1/2) d̂_Q^T + b · β(D_T‖D_S) · ê_Q^S + O( KL(Q‖P) + ln(1/δ) ),
where (1/2) d̂_Q^T is the empirical target disagreement, ê_Q^S the empirical source joint error, β(D_T‖D_S) = sup_x D_T(x)/D_S(x) the domain divergence, and the last term the complexity term. Linear trade-off between d̂_Q^T and ê_Q^S: we consider β(D_T‖D_S) as a parameter to tune.
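
The observation above can be checked numerically on toy votes (a sketch with sample versions of the distribution quantities): on every example, the Gibbs error equals half the disagreement plus the joint error.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=300)
correct = rng.random((300, 9)) < 0.65
votes = np.where(correct, y[:, None], -y[:, None])          # votes[i, j] = h_j(x_i)
Q = np.full(9, 1.0 / 9)                                     # uniform posterior

errors = (votes != y[:, None]).astype(float)
per_example = errors @ Q                                    # Pr_{h~Q}(h(x_i) != y_i)
gibbs = np.mean(per_example)                                # R(G_Q)
joint = np.mean(per_example ** 2)                           # e_Q: two independent voters both err
disagree = np.mean(2 * per_example * (1 - per_example))     # d_Q: the two voters disagree
print(gibbs, 0.5 * disagree + joint)                        # equal up to floating point
```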

39 Learning Algorithm for Linear Classifiers. As in many PAC-Bayesian works (since Langford and Shawe-Taylor 2002), we consider the set H of all linear classifiers h_v on X = R^d: h_v(x) = sgn(v·x). Let Q_w on H be a Gaussian distribution centered on w (with Σ = I_d); the corresponding majority vote is h_w(x) = sgn( E_{v~Q_w} h_v(x) ). Given T = {x_i}_{i=1}^{n'} and S = {(x_j, y_j)}_{j=1}^{n}, find w ∈ R^d that minimizes
C Σ_i Φ_dis( w·x_i / ‖x_i‖ ) + B Σ_j Φ_err( y_j w·x_j / ‖x_j‖ ) + KL(Q_w‖P_0),
where KL(Q_w‖P_0) = (1/2)‖w‖² for P_0 = N(0, I_d), and Φ_err and Φ_dis are losses of the margin y·w·x/‖x‖ built from the Gaussian cumulative Φ: Φ_dis pushes target points toward a low-density region, Φ_err promotes classification accuracy on the source.
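
The sketch below is not the authors' implementation; it assumes probit-style losses derived from the isotropic Gaussian posterior (Φ_err(m) = Φ(−m)² for the joint error and Φ_dis(t) = 2Φ(t)Φ(−t) for the disagreement, with Φ the standard normal cdf), and it optimizes the resulting trade-off with a generic solver on toy data.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def da_objective(w, X_src, y_src, X_tgt, B=1.0, C=1.0):
    """C * sum_i Phi_dis(w.x_i/|x_i|) + B * sum_j Phi_err(y_j w.x_j/|x_j|) + ||w||^2 / 2."""
    m_src = y_src * (X_src @ w) / np.linalg.norm(X_src, axis=1)
    m_tgt = (X_tgt @ w) / np.linalg.norm(X_tgt, axis=1)
    phi_err = norm.cdf(-m_src) ** 2                      # expected joint error on the source
    phi_dis = 2.0 * norm.cdf(m_tgt) * norm.cdf(-m_tgt)   # expected disagreement on the target
    return C * phi_dis.sum() + B * phi_err.sum() + 0.5 * w @ w

# Toy usage: two labeled source blobs and a slightly shifted unlabeled target sample.
rng = np.random.default_rng(0)
X_src = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y_src = np.array([-1] * 50 + [1] * 50)
X_tgt = X_src + 0.3
w_opt = minimize(da_objective, x0=np.zeros(2), args=(X_src, y_src, X_tgt)).x
print(w_opt)  # weight vector of the learned linear classifier sgn(w.x)
```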

40 Toy Experiment. [Figure: results with an RBF kernel, B = 1, C fixed, for target rotations of 10, 30, and 50 degrees.]

41 Empirical Results on the Amazon Dataset. Linear kernel; hyperparameter selection by reverse cross-validation. [Table: results of svm, dasvm, coda, PBDA, and DALC on the twelve source→target pairs among books, DVDs, electronics, and kitchen, and their average.]


43 Short Summary. An original PAC-Bayesian approach: a general theorem from which we recover existing results; a modular proof, easy to adapt to various frameworks (e.g., transductive learning). The virtue of disagreement: a second-order statistic computable from an unlabeled sample (useful in the semi-supervised setting and its variants); allows one to reduce the value of PAC-Bayesian bounds (C-bound); also used in a domain adaptation learning algorithm.

44 Some (Self-)Pointers. Our 74-page journal paper (JMLR): Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm (Germain, Lacasse, et al. 2015). My PhD thesis (in French): Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine (Germain 2015). Recent papers: ICML: A New PAC-Bayesian Perspective on Domain Adaptation (Germain, Habrard, et al. 2016); AISTATS: PAC-Bayesian Bounds Based on the Rényi Divergence (Bégin et al. 2016); NIPS: PAC-Bayesian Theory Meets Bayesian Inference (Germain, Bach, et al. 2016). Even some code on my GitHub...

45 References I.
Alquier, Pierre, James Ridgway, and Nicolas Chopin (2015). On the Properties of Variational Approximations of Gibbs Posteriors. ArXiv e-prints.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2014). PAC-Bayesian Theory for Transductive Learning. In: AISTATS.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2016). PAC-Bayesian Bounds Based on the Rényi Divergence. In: AISTATS.
Catoni, Olivier (2007). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. Vol. 56. Institute of Mathematical Statistics.
Germain, Pascal (2015). Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine. PhD thesis. Université Laval.
Germain, Pascal, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien (2016). PAC-Bayesian Theory Meets Bayesian Inference. ArXiv e-prints.
Germain, Pascal, Amaury Habrard, François Laviolette, and Emilie Morvant (2016). A New PAC-Bayesian Perspective on Domain Adaptation. In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy (2015). Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. In: JMLR 16.

46 References II.
Langford, John and Matthias Seeger (2001). Bounds for Averaging Classifiers. Tech. rep. Carnegie Mellon University, Department of Computer Science.
Langford, John and John Shawe-Taylor (2002). PAC-Bayes & Margins. In: NIPS.
McAllester, David (1999). Some PAC-Bayesian Theorems. In: Machine Learning.
McAllester, David (2003). PAC-Bayesian Stochastic Model Selection. In: Machine Learning.
