Generalization of the PAC-Bayesian Theory


1 Generalization of the PAC-Bayesian Theory and Applications to Semi-Supervised Learning. Pascal Germain, INRIA Paris (SIERRA team). Modal Seminar, INRIA Lille, January 24, 2017. "In life, the essential thing is to pass a priori judgments on everything." (Boris Vian)

2 Plan. 1. Introduction. 2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Bounding the Majority Vote Risk. 3. Semi-Supervised Learning and Variations: Semi-Supervised Learning; Transductive Learning; Domain Adaptation. 4. Conclusion.


4 Definitions. Learning example: an example (x, y) ∈ X × Y is a description-label pair. Data-generating distribution: each example is an observation drawn from a distribution D on X × Y. Learning sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n. Predictors (or hypotheses): h : X → Y, h ∈ H. Learning algorithm: A(S) → h. Loss function: ℓ : H × X × Y → R. Empirical loss: L_S^ℓ(h) = (1/n) Σ_{i=1}^n ℓ(h, x_i, y_i). Generalization loss: L_D^ℓ(h) = E_{(x,y)~D} ℓ(h, x, y).
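
A minimal sketch (not from the talk) of the empirical loss defined above, using NumPy and the 0-1 loss as the loss function ℓ; the predictor h and the sample S below are toy assumptions.

```python
import numpy as np

def zero_one_loss(h, x, y):
    """l(h, x, y) = I[h(x) != y] for one description-label pair."""
    return float(h(x) != y)

def empirical_loss(h, sample, loss=zero_one_loss):
    """L_S(h) = (1/n) * sum_i l(h, x_i, y_i) over the learning sample S."""
    return float(np.mean([loss(h, x, y) for x, y in sample]))

# Toy usage: a threshold predictor on one-dimensional inputs.
S = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.7, 1), (0.1, -1)]
h = lambda x: 1 if x > 0 else -1
print(empirical_loss(h, S))  # 0.2: one of the five examples is misclassified
```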

5 PAC-Bayesian Theory. Initiated by McAllester (1999), the PAC-Bayesian theory gives PAC generalization guarantees to Bayesian-like algorithms. PAC guarantees (Probably Approximately Correct): with probability at least 1−δ, the loss of predictor h is less than ε: Pr_{S~D^n}( L_D^ℓ(h) ≤ ε(L_S^ℓ(h), n, δ, ...) ) ≥ 1−δ. Bayesian flavor: given a prior distribution P on H and a posterior distribution Q on H, Pr_{S~D^n}( E_{h~Q} L_D^ℓ(h) ≤ ε(E_{h~Q} L_S^ℓ(h), n, δ, P, ...) ) ≥ 1−δ.

6 A Classical PAC-Bayesian Theorem. PAC-Bayesian theorem (adapted from McAllester 1999, 2003): for any distribution D on X × Y, for any set of predictors H, for any loss ℓ : H × X × Y → [0, 1], for any distribution P on H, and for any δ ∈ (0, 1], we have
Pr_{S~D^n}( for all Q on H: E_{h~Q} L_D^ℓ(h) ≤ E_{h~Q} L_S^ℓ(h) + √( (1/(2n)) [ KL(Q‖P) + ln(2√n/δ) ] ) ) ≥ 1−δ,
where KL(Q‖P) = E_{h~Q} ln( Q(h)/P(h) ) is the Kullback-Leibler divergence.
Training bound: gives generalization guarantees that are not based on a testing sample. Valid for all posteriors Q on H: an inspiration for conceiving new learning algorithms.
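
A small numerical sketch of this bound (an assumed helper, not code from the talk): given the empirical Gibbs loss, the KL term, the sample size, and δ, it returns the guaranteed upper bound on the expected true loss.

```python
import math

def mcallester_bound(emp_gibbs_loss, kl_qp, n, delta=0.05):
    """E_{h~Q} L_D(h) <= E_{h~Q} L_S(h) + sqrt([KL(Q||P) + ln(2*sqrt(n)/delta)] / (2n))."""
    complexity = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return emp_gibbs_loss + math.sqrt(complexity)

print(mcallester_bound(emp_gibbs_loss=0.10, kl_qp=5.0, n=1000))  # ~0.178
```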



9 Majority Vote Classifiers. Consider a binary classification problem where Y = {−1, 1} and the set H contains binary voters h : X → {−1, 1}. Weighted majority vote: to predict the label of x ∈ X, the classifier asks for the prevailing opinion: B_Q(x) = sgn( E_{h~Q} h(x) ). Many learning algorithms output majority vote classifiers: AdaBoost, Random Forests, Bagging, ...

10 A Surrogate Loss. Majority vote risk: R_D(B_Q) = Pr_{(x,y)~D}( B_Q(x) ≠ y ) = E_{(x,y)~D} I[ y · E_{h~Q} h(x) ≤ 0 ], where I[a] = 1 if predicate a is true and I[a] = 0 otherwise. Gibbs risk / linear loss: the stochastic Gibbs classifier G_Q(x) draws h ∈ H according to Q and outputs h(x); its risk is R_D(G_Q) = E_{(x,y)~D} E_{h~Q} I[ h(x) ≠ y ] = E_{h~Q} L_D^{ℓ01}(h), where ℓ01(h, x, y) = I[ h(x) ≠ y ]. Equivalently, on each example, E_{h~Q} I[ h(x) ≠ y ] = ( 1 − y · E_{h~Q} h(x) ) / 2, a linear loss of the margin y · E_{h~Q} h(x). Factor two: it is well known that R_D(B_Q) ≤ 2 R_D(G_Q).
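
The sketch below (toy data and posterior, not from the talk) computes both risks from a matrix of votes and illustrates the factor-two relation on a sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_voters = 500, 15
y = rng.choice([-1, 1], size=n_examples)                  # labels y_i
# Each voter h_j is correct with probability 0.7, independently (toy assumption).
correct = rng.random((n_examples, n_voters)) < 0.7
votes = np.where(correct, y[:, None], -y[:, None])        # votes[i, j] = h_j(x_i)
Q = rng.dirichlet(np.ones(n_voters))                      # posterior weights over H

margin = votes @ Q                                        # E_{h~Q} h(x_i)
majority_vote_risk = np.mean(y * margin <= 0)             # R(B_Q), ties counted as errors
gibbs_risk = np.mean((votes != y[:, None]) @ Q)           # R(G_Q) = E_{h~Q} L^{01}(h)
print(majority_vote_risk, gibbs_risk)                     # here R(B_Q) is well below 2 R(G_Q)
```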

11 From the Factor 2 to the C-bound. From Markov's inequality ( Pr(X ≥ a) ≤ E[X]/a ), we obtain the factor-2 bound: R_D(B_Q) = Pr_{(x,y)~D}( 1 − y · E_{h~Q} h(x) ≥ 1 ) ≤ E_{(x,y)~D}( 1 − y · E_{h~Q} h(x) ) = 2 R_D(G_Q). From Chebyshev's inequality, in its one-sided (Cantelli) form ( Pr(X − E[X] ≥ a) ≤ Var X / (Var X + a²) ), we obtain the C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D), where d_Q^D is the expected disagreement: d_Q^D = E_{(x,·)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ h_j(x) ] = (1/2) ( 1 − E_{(x,·)~D} [ E_{h~Q} h(x) ]² ).
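
A short sketch of both bounds as plain functions of the Gibbs risk and the expected disagreement (toy values): for a fixed Gibbs risk, a larger disagreement makes the C-bound smaller, while the factor-2 bound cannot improve.

```python
def factor_two_bound(gibbs_risk):
    return 2.0 * gibbs_risk

def c_bound(gibbs_risk, disagreement):
    """C-bound of Lacasse et al. (2006), for gibbs_risk < 1/2 and disagreement < 1/2."""
    return 1.0 - (1.0 - 2.0 * gibbs_risk) ** 2 / (1.0 - 2.0 * disagreement)

# Same Gibbs risk, increasing disagreement: the C-bound decreases below the factor-2 bound.
for d in (0.10, 0.25, 0.35):
    print(factor_two_bound(0.30), c_bound(0.30, d))
```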


13 I.I.D. Assumption. Assumption: examples are generated i.i.d. by a distribution D on X × Y: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n.

14 A General PAC-Bayesian Theorem. Δ-function: a «distance» between R_S(G_Q) and R_D(G_Q), i.e., a convex function Δ : [0, 1] × [0, 1] → R. General theorem (Bégin et al. 2014, 2016; Germain 2015): for any distribution D on X × Y, for any set H of voters, for any distribution P on H, for any δ ∈ (0, 1], and for any Δ-function, we have, with probability at least 1−δ over the choice of S ~ D^n, for all Q on H:
Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ],
where I_Δ(n) = sup_{r∈[0,1]} Σ_{k=0}^{n} Bin(k; n, r) · e^{n·Δ(k/n, r)} and Bin(k; n, r) = C(n, k) r^k (1−r)^{n−k}.
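
A numerical sketch (assumed grid discretization, not code from the talk) of the term I_Δ(n) for the particular choice Δ = kl, the binary KL divergence used in the corollaries below.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom

def binary_kl(q, p, eps=1e-12):
    """kl(q, p) = q ln(q/p) + (1-q) ln((1-q)/(1-p)), clipped for numerical stability."""
    q, p = np.clip(q, eps, 1 - eps), np.clip(p, eps, 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def i_delta(n, delta_fn=binary_kl, grid=1000):
    """I_Delta(n) = sup_r sum_k Bin(k; n, r) exp(n Delta(k/n, r)), sup taken over a grid of r."""
    k = np.arange(n + 1)
    best = -np.inf
    for r in np.linspace(1e-6, 1 - 1e-6, grid):
        best = max(best, logsumexp(binom.logpmf(k, n, r) + n * delta_fn(k / n, r)))
    return float(np.exp(best))

n = 200
print(i_delta(n), 2 * np.sqrt(n))  # for Delta = kl, I_kl(n) stays below 2*sqrt(n)
```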

15 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Interpretation: the bound delimits the set of plausible true risks, namely the values r such that Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ]. [Plot: Δ( R_S(G_Q), r ) as a function of r, compared with the right-hand side of the bound.]

16 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Proof ideas. Change of measure inequality: for any P and Q on H, and for any measurable function φ : H → R, we have E_{h~Q} φ(h) ≤ KL(Q‖P) + ln( E_{h~P} e^{φ(h)} ). Markov's inequality: Pr(X ≥ a) ≤ E[X]/a, hence Pr( X ≤ (1/δ) E[X] ) ≥ 1−δ. Probability of observing k misclassifications among n examples: given a voter h, n·L_S^{ℓ01}(h) is a binomial variable of n trials with success probability L_D^{ℓ01}(h): Pr_{S~D^n}( L_S^{ℓ01}(h) = k/n ) = C(n, k) ( L_D^{ℓ01}(h) )^k ( 1 − L_D^{ℓ01}(h) )^{n−k} = Bin( k; n, L_D^{ℓ01}(h) ).

17 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Proof.
n · Δ( E_{h~Q} L_S^ℓ(h), E_{h~Q} L_D^ℓ(h) )
≤ n · E_{h~Q} Δ( L_S^ℓ(h), L_D^ℓ(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln( E_{h~P} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [change of measure]
≤ KL(Q‖P) + ln( (1/δ) E_{S~D^n} E_{h~P} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [Markov's inequality, with probability ≥ 1−δ]
= KL(Q‖P) + ln( (1/δ) E_{h~P} E_{S~D^n} e^{n·Δ(L_S^ℓ(h), L_D^ℓ(h))} )   [expectation swap]
= KL(Q‖P) + ln( (1/δ) E_{h~P} Σ_{k=0}^{n} Bin( k; n, L_D^ℓ(h) ) e^{n·Δ(k/n, L_D^ℓ(h))} )   [binomial law]
≤ KL(Q‖P) + ln( (1/δ) sup_{r∈[0,1]} Σ_{k=0}^{n} Bin( k; n, r ) e^{n·Δ(k/n, r)} )   [supremum over risk]
= KL(Q‖P) + ln( I_Δ(n)/δ ).

18 General theorem. Pr_{S~D^n}( for all Q on H: Δ( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/δ ) ] ) ≥ 1−δ. Corollary: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H:
(a) kl( R_S(G_Q), R_D(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln(2√n/δ) ]   (Langford and Seeger 2001);
(b) R_D(G_Q) ≤ R_S(G_Q) + √( (1/(2n)) [ KL(Q‖P) + ln(2√n/δ) ] )   (McAllester 1999, 2003);
(c) R_D(G_Q) ≤ (1/(1−e^{−c})) ( 1 − exp( −[ c·R_S(G_Q) + (1/n)( KL(Q‖P) + ln(1/δ) ) ] ) )   (Catoni 2007);
(d) R_D(G_Q) ≤ R_S(G_Q) + (1/λ) [ KL(Q‖P) + ln(1/δ) + f(λ, n) ]   (Alquier et al. 2015).
Here kl(q, p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)) ≥ 2(q−p)², Δ_c(q, p) = −ln[ 1 − (1−e^{−c}) p ] − c·q, and Δ_λ(q, p) = (λ/n)(p − q).
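
Corollary (a) is usually turned into a numerical risk bound by inverting the kl: take the largest r whose kl-distance from the empirical Gibbs risk fits under the right-hand side. A sketch with an assumed bisection helper (not code from the talk):

```python
import math

def binary_kl(q, p):
    q = min(max(q, 1e-12), 1 - 1e-12)
    p = min(max(p, 1e-12), 1 - 1e-12)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse_upper(q, bound, tol=1e-9):
    """max { p in [q, 1] : kl(q, p) <= bound }, found by bisection."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if binary_kl(q, mid) <= bound else (lo, mid)
    return lo

def seeger_bound(emp_gibbs_risk, kl_qp, n, delta=0.05):
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_gibbs_risk, rhs)

print(seeger_bound(0.10, kl_qp=5.0, n=1000))  # ~0.15, tighter than the McAllester form (b)
```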


20 Bounding the Expected Disagreement. Expected disagreement: d_Q^D = E_{(x,·)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ h_j(x) ] = E_{(x,·)~D} E_{h_ij~Q²} ℓ_d(h_ij, x, ·), where Q²(h_ij) = Q(h_i)·Q(h_j), so that KL(Q²‖P²) = 2 KL(Q‖P). General theorem [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: Δ( d_Q^S, d_Q^D ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_Δ(n)/δ ) ]. Corollary: (a) kl( d_Q^S, d_Q^D ) ≤ (1/n) [ 2 KL(Q‖P) + ln(2√n/δ) ].

21 The C-bound. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC-Bayes C-bound 1: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: R_D(B_Q) ≤ 1 − (1 − 2 r̄)² / (1 − 2 d̄), with
r̄ = max{ r ∈ [0, 1/2] : Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] },
d̄ = min{ d ∈ [0, 1/2] : Δ( d_Q^S, d ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] }.
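
A sketch (assumed helpers, Δ = kl with the corollary constant 2√n) of how this bound is evaluated: bisection gives the largest plausible Gibbs risk r̄ and the smallest plausible disagreement d̄, each at confidence δ/2, and both are plugged into the C-bound.

```python
import math

def binary_kl(q, p):
    q = min(max(q, 1e-12), 1 - 1e-12)
    p = min(max(p, 1e-12), 1 - 1e-12)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_extremum(q, bound, upper=True, tol=1e-9):
    """Largest (upper=True) or smallest (upper=False) p with kl(q, p) <= bound."""
    lo, hi = (q, 1.0) if upper else (0.0, q)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        ok = binary_kl(q, mid) <= bound
        if upper:
            lo, hi = (mid, hi) if ok else (lo, mid)
        else:
            lo, hi = (lo, mid) if ok else (mid, hi)
    return lo if upper else hi

def pac_bayes_c_bound_1(emp_gibbs, emp_disagreement, kl_qp, n, delta=0.05):
    eps = math.log(2.0 * math.sqrt(n) / (delta / 2.0)) / n
    r_bar = min(0.5, kl_extremum(emp_gibbs, kl_qp / n + eps, upper=True))
    d_bar = kl_extremum(emp_disagreement, 2.0 * kl_qp / n + eps, upper=False)
    return 1.0 - (1.0 - 2.0 * r_bar) ** 2 / (1.0 - 2.0 * d_bar)

print(pac_bayes_c_bound_1(emp_gibbs=0.25, emp_disagreement=0.35, kl_qp=5.0, n=2000))
```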

22 The C-bound. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC-Bayes C-bound 2: [...] with probability at least 1−δ over the choice of S ~ D^n, for all Q on H: R_D(B_Q) ≤ sup_{(r,d)∈A} ( 1 − (1 − 2r)² / (1 − 2d) ), with A = { (r, d) ∈ [0, 1/2]² : Δ₂( (R_S(G_Q), r), (d_Q^S, d) ) ≤ (1/n) [ 2 KL(Q‖P) + ln( I_{Δ₂}(n)/δ ) ] }, where Δ₂ is here a «distance» between pairs and I_{Δ₂}(n) the corresponding normalization term.

23 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, and PAC C-bound 2.]



26 Semi-Supervised Learning. Assumption: examples are generated i.i.d. by a distribution D on X × Y. Labeled sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ D^n. Unlabeled sample: U = {(x_{n+1}, ·), (x_{n+2}, ·), ..., (x_{n+n'}, ·)} ~ D^{n'} (labels unobserved).

27 Semi-Supervised Learning. The C-bound (Lacasse et al., 2006): R_D(B_Q) ≤ C_Q^D = 1 − (1 − 2 R_D(G_Q))² / (1 − 2 d_Q^D). PAC C-bound semi-supervised: [...] with probability at least 1−δ over the choice of S ~ D^n and U ~ D^{n'}, for all Q on H: R_D(B_Q) ≤ 1 − (1 − 2 r̄)² / (1 − 2 d̄), with
r̄ = max{ r ∈ [0, 1/2] : Δ( R_S(G_Q), r ) ≤ (1/n) [ KL(Q‖P) + ln( I_Δ(n)/(δ/2) ) ] },
d̄ = min{ d ∈ [0, 1/2] : Δ( d_Q^{S∪U}, d ) ≤ (1/(n+n')) [ 2 KL(Q‖P) + ln( I_Δ(n+n')/(δ/2) ) ] },
where the disagreement d_Q^{S∪U} is estimated on all n + n' labeled and unlabeled examples (it does not require labels).

28 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, PAC C-bound 2, and PAC C-bound semi-supervised.]


30 Transductive Learning. Assumption: examples are drawn without replacement from a finite set Z of size N. S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ⊆ Z; U = {(x_{n+1}, ·), ..., (x_N, ·)} = Z \ S.

31 Transductive Learning. Assumption: examples are drawn without replacement from a finite set Z of size N. S = {(x_1, y_1), ..., (x_n, y_n)} ⊆ Z; U = {(x_{n+1}, ·), ..., (x_N, ·)} = Z \ S. Inductive learning: n draws with replacement according to D → binomial law. Transductive learning: n draws without replacement in Z → hypergeometric law. Theorem (Bégin et al. 2014): for any set Z of N examples, [...] with probability at least 1−δ over the choice of n examples among Z, for all Q on H:
Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( T_Δ(n, N)/δ ) ],
where T_Δ(n, N) = max_{K=0,...,N} Σ_{k=max[0, K+n−N]}^{min[n, K]} [ C(K, k) C(N−K, n−k) / C(N, n) ] · e^{n·Δ(k/n, K/N)}.
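
A numerical sketch of T_Δ(n, N) (again with Δ = kl; an assumed implementation, not code from the talk), where the binomial of the inductive case is replaced by the hypergeometric distribution.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import hypergeom

def binary_kl(q, p, eps=1e-12):
    q, p = np.clip(q, eps, 1 - eps), np.clip(p, eps, 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def t_delta(n, N, delta_fn=binary_kl):
    """T_Delta(n, N) = max_K sum_k [C(K,k) C(N-K,n-k) / C(N,n)] exp(n Delta(k/n, K/N))."""
    best = -np.inf
    for K in range(N + 1):
        k = np.arange(max(0, K + n - N), min(n, K) + 1)
        logterms = hypergeom.logpmf(k, N, K, n) + n * delta_fn(k / n, K / N)
        best = max(best, logsumexp(logterms))
    return float(np.exp(best))

n, N = 50, 200
print(t_delta(n, N), 2 * np.sqrt(n))  # compare with the inductive constant 2*sqrt(n) for Delta = kl
```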

32 Theorem. Pr_{S~[Z]^n}( for all Q on H: Δ( R_S(G_Q), R_Z(G_Q) ) ≤ (1/n) [ KL(Q‖P) + ln( T_Δ(n, N)/δ ) ] ) ≥ 1−δ. Proof.
n · Δ( E_{h~Q} L_S^ℓ(h), E_{h~Q} L_Z^ℓ(h) )
≤ n · E_{h~Q} Δ( L_S^ℓ(h), L_Z^ℓ(h) )   [Jensen's inequality]
≤ KL(Q‖P) + ln( E_{h~P} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [change of measure]
≤ KL(Q‖P) + ln( (1/δ) E_{S~[Z]^n} E_{h~P} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [Markov's inequality, with probability ≥ 1−δ]
= KL(Q‖P) + ln( (1/δ) E_{h~P} E_{S~[Z]^n} e^{n·Δ(L_S^ℓ(h), L_Z^ℓ(h))} )   [expectation swap]
= KL(Q‖P) + ln( (1/δ) E_{h~P} Σ_k [ C( N·L_Z^ℓ(h), k ) C( N − N·L_Z^ℓ(h), n−k ) / C(N, n) ] e^{n·Δ(k/n, L_Z^ℓ(h))} )   [hypergeometric law]
≤ KL(Q‖P) + ln( (1/δ) max_{K=0,...,N} Σ_k [ C(K, k) C(N−K, n−k) / C(N, n) ] e^{n·Δ(k/n, K/N)} )   [maximum over risk]
= KL(Q‖P) + ln( T_Δ(n, N)/δ ).

33 A New Transductive Bound. A new Δ-function (Bégin et al. 2014): Δ_β(q, p) = kl(q, p) + ((1−β)/β) · kl( (p − β q)/(1 − β), p ), where β = n/N is the fraction of observed examples.

34 Bounds Values (AdaBoost iterates). [Plot over boosting iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test), majority vote risk (train), together with the bound values PAC 2×Gibbs bound, PAC C-bound 1, PAC C-bound 2, PAC C-bound semi-supervised, and PAC C-bound transductive.]


36 Domain Adaptation. Assumption: source and target examples are generated by different distributions. Labeled source sample: S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} ~ (D_S)^n. Unlabeled target sample: T = {(x_1, ·), (x_2, ·), ..., (x_{n'}, ·)} ~ (D_T)^{n'}.

37 Our Domain Adaptation Setting. Assumption: source and target examples are generated by different distributions. S = {(x_1, y_1), ..., (x_n, y_n)} ~ (D_S)^n; T = {(x_1, ·), ..., (x_{n'}, ·)} ~ (D_T)^{n'}. A domain adaptation learning algorithm is provided with a labeled source sample and an unlabeled target sample. The goal is to build a classifier with a low target risk R_{D_T}(h).

38 Generalization Bound. Observation: R_D(G_Q) = (1/2) d_Q^D + e_Q^D, where e_Q^D = E_{(x,y)~D} E_{h_i~Q} E_{h_j~Q} I[ h_i(x) ≠ y ] I[ h_j(x) ≠ y ] is the expected joint error. PAC-Bayesian DA bound (Germain, Habrard, et al. 2016): for any prior P over H, any δ ∈ (0, 1], any real numbers b > 1 and c > 1, with probability at least 1−δ over the choices of S ~ (D_S)^n and T ~ (D_T)^{n'}, we have, for all Q on H:
R_{D_T}(G_Q) ≤ c · (1/2) d̂_Q^T + b · β(D_T‖D_S) · ê_Q^S + O( KL(Q‖P) + ln(1/δ) ),
where (1/2) d̂_Q^T is the empirical target disagreement, ê_Q^S the empirical source joint error, β(D_T‖D_S) = sup_x D_T(x)/D_S(x) the domain divergence, and the last term the complexity term. Linear trade-off between d̂_Q^T and ê_Q^S: we consider β(D_T‖D_S) as a parameter to tune.
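
The observation above can be checked numerically on toy votes (a sketch with sample versions of the distribution quantities): on every example, the Gibbs error equals half the disagreement plus the joint error.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=300)
correct = rng.random((300, 9)) < 0.65
votes = np.where(correct, y[:, None], -y[:, None])          # votes[i, j] = h_j(x_i)
Q = np.full(9, 1.0 / 9)                                     # uniform posterior

errors = (votes != y[:, None]).astype(float)
per_example = errors @ Q                                    # Pr_{h~Q}(h(x_i) != y_i)
gibbs = np.mean(per_example)                                # R(G_Q)
joint = np.mean(per_example ** 2)                           # e_Q: two independent voters both err
disagree = np.mean(2 * per_example * (1 - per_example))     # d_Q: the two voters disagree
print(gibbs, 0.5 * disagree + joint)                        # equal up to floating point
```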

39 Learning Algorithm for Linear Classifiers. As in many PAC-Bayesian works (since Langford and Shawe-Taylor 2002), we consider the set H of all linear classifiers h_v on X = R^d: h_v(x) = sgn(v·x). Let Q_w on H be a Gaussian distribution centered on w (with Σ = I_d); the corresponding majority vote is h_w(x) = sgn( E_{v~Q_w} h_v(x) ). Given T = {x_i}_{i=1}^{n'} and S = {(x_j, y_j)}_{j=1}^{n}, find w ∈ R^d that minimizes
C Σ_i Φ_dis( w·x_i / ‖x_i‖ ) + B Σ_j Φ_err( y_j w·x_j / ‖x_j‖ ) + KL(Q_w‖P_0),
where KL(Q_w‖P_0) = (1/2)‖w‖² for P_0 = N(0, I_d), and Φ_err and Φ_dis are losses of the margin y·w·x/‖x‖ built from the Gaussian cumulative Φ: Φ_dis pushes target points toward a low-density region, Φ_err promotes classification accuracy on the source.
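
The sketch below is not the authors' implementation; it assumes probit-style losses derived from the isotropic Gaussian posterior (Φ_err(m) = Φ(−m)² for the joint error and Φ_dis(t) = 2Φ(t)Φ(−t) for the disagreement, with Φ the standard normal cdf), and it optimizes the resulting trade-off with a generic solver on toy data.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def da_objective(w, X_src, y_src, X_tgt, B=1.0, C=1.0):
    """C * sum_i Phi_dis(w.x_i/|x_i|) + B * sum_j Phi_err(y_j w.x_j/|x_j|) + ||w||^2 / 2."""
    m_src = y_src * (X_src @ w) / np.linalg.norm(X_src, axis=1)
    m_tgt = (X_tgt @ w) / np.linalg.norm(X_tgt, axis=1)
    phi_err = norm.cdf(-m_src) ** 2                      # expected joint error on the source
    phi_dis = 2.0 * norm.cdf(m_tgt) * norm.cdf(-m_tgt)   # expected disagreement on the target
    return C * phi_dis.sum() + B * phi_err.sum() + 0.5 * w @ w

# Toy usage: two labeled source blobs and a slightly shifted unlabeled target sample.
rng = np.random.default_rng(0)
X_src = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y_src = np.array([-1] * 50 + [1] * 50)
X_tgt = X_src + 0.3
w_opt = minimize(da_objective, x0=np.zeros(2), args=(X_src, y_src, X_tgt)).x
print(w_opt)  # weight vector of the learned linear classifier sgn(w.x)
```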

40 Toy Experiment. [Figure: results with an RBF kernel, B = 1, C fixed, for target rotations of 10, 30, and 50 degrees.]

41 Empirical Results on the Amazon Dataset. Linear kernel; hyperparameter selection by reverse cross-validation. [Table: results of svm, dasvm, coda, PBDA, and DALC on the twelve source→target pairs among books, DVDs, electronics, and kitchen, and their average.]


43 Short Summary. An original PAC-Bayesian approach: a general theorem from which we recover existing results; a modular proof, easy to adapt to various frameworks (e.g., transductive learning). The virtue of disagreement: a second-order statistic computable from an unlabeled sample (useful in the semi-supervised setting and its variants); allows one to reduce the value of PAC-Bayesian bounds (C-bound); also used in a domain adaptation learning algorithm.

44 Some (Self-)Pointers. Our 74-page journal paper (JMLR): Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm (Germain, Lacasse, et al. 2015). My PhD thesis (in French): Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine (Germain 2015). Recent papers: ICML: A New PAC-Bayesian Perspective on Domain Adaptation (Germain, Habrard, et al. 2016); AISTATS: PAC-Bayesian Bounds Based on the Rényi Divergence (Bégin et al. 2016); NIPS: PAC-Bayesian Theory Meets Bayesian Inference (Germain, Bach, et al. 2016). Even some code on my GitHub...

45 References I.
Alquier, Pierre, James Ridgway, and Nicolas Chopin (2015). On the Properties of Variational Approximations of Gibbs Posteriors. ArXiv e-prints.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2014). PAC-Bayesian Theory for Transductive Learning. In: AISTATS.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2016). PAC-Bayesian Bounds Based on the Rényi Divergence. In: AISTATS.
Catoni, Olivier (2007). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. Vol. 56. Institute of Mathematical Statistics.
Germain, Pascal (2015). Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine. PhD thesis. Université Laval.
Germain, Pascal, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien (2016). PAC-Bayesian Theory Meets Bayesian Inference. ArXiv e-prints.
Germain, Pascal, Amaury Habrard, François Laviolette, and Emilie Morvant (2016). A New PAC-Bayesian Perspective on Domain Adaptation. In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy (2015). Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. In: JMLR 16.

46 References II.
Langford, John and Matthias Seeger (2001). Bounds for Averaging Classifiers. Tech. rep. Carnegie Mellon University, Department of Computer Science.
Langford, John and John Shawe-Taylor (2002). PAC-Bayes & Margins. In: NIPS.
McAllester, David (1999). Some PAC-Bayesian Theorems. In: Machine Learning.
McAllester, David (2003). PAC-Bayesian Stochastic Model Selection. In: Machine Learning.
