Generalization of the PAC-Bayesian Theory
1 Generalization of the PAC-Bayesian Theory and Applications to Semi-Supervised Learning
Pascal Germain, INRIA Paris (SIERRA Team)
Modal Seminar, INRIA Lille, January 24, 2017
"In life, the essential thing is to pass a priori judgments on everything." (Boris Vian)
2 Plan
1. Introduction
2. PAC-Bayesian Theory: Majority Vote Classifiers; A General PAC-Bayesian Theorem; Bounding the Majority Vote Risk
3. Semi-Supervised Learning and Variations: Semi-Supervised Learning; Transductive Learning; Domain Adaptation
4. Conclusion
4 Definitions
Learning example: an example $(x, y) \in \mathcal{X} \times \mathcal{Y}$ is a description-label pair.
Data-generating distribution: each example is an observation from a distribution $D$ on $\mathcal{X} \times \mathcal{Y}$.
Learning sample: $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \sim D^n$
Predictors (or hypotheses): $h : \mathcal{X} \to \mathcal{Y}$, with $h \in \mathcal{H}$
Learning algorithm: $A(S) \to h$
Loss function: $\ell : \mathcal{H} \times \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$
Empirical loss: $\mathcal{L}_S^\ell(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h, x_i, y_i)$
Generalization loss: $\mathcal{L}_D^\ell(h) = \mathbb{E}_{(x,y)\sim D}\, \ell(h, x, y)$
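The two loss definitions translate directly into code. A minimal Python sketch (the predictor, sample, and names are all illustrative) of the empirical loss under the zero-one loss:

```python
def zero_one_loss(h, x, y):
    # l(h, x, y) = 1 if h misclassifies (x, y), 0 otherwise
    return 0.0 if h(x) == y else 1.0

def empirical_loss(h, sample, loss=zero_one_loss):
    # L_S(h) = (1/n) * sum_i l(h, x_i, y_i)
    return sum(loss(h, x, y) for x, y in sample) / len(sample)

# Example: a threshold predictor on 1-d inputs with labels in {-1, 1}.
h = lambda x: 1 if x >= 0.5 else -1
S = [(0.2, -1), (0.7, 1), (0.9, 1), (0.4, 1)]
print(empirical_loss(h, S))  # 0.25 (one mistake out of four)
```

The generalization loss is the same average taken under the unknown distribution $D$; it is the quantity the PAC-Bayesian bounds below control.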
5 PAC-Bayesian Theory
Initiated by McAllester (1999), the PAC-Bayesian theory gives PAC generalization guarantees to Bayesian-like algorithms.
PAC guarantees (Probably Approximately Correct): with probability at least $1 - \delta$, the loss of predictor $h$ is less than $\varepsilon$:
$\Pr_{S\sim D^n}\Big( \mathcal{L}_D^\ell(h) \le \varepsilon\big(\mathcal{L}_S^\ell(h), n, \delta, \ldots\big) \Big) \ge 1 - \delta$
Bayesian flavor: given a prior distribution $P$ on $\mathcal{H}$ and a posterior distribution $Q$ on $\mathcal{H}$,
$\Pr_{S\sim D^n}\Big( \mathbb{E}_{h\sim Q}\, \mathcal{L}_D^\ell(h) \le \varepsilon\big(\mathbb{E}_{h\sim Q}\, \mathcal{L}_S^\ell(h), n, \delta, P, \ldots\big) \Big) \ge 1 - \delta$
6 A Classical PAC-Bayesian Theorem
PAC-Bayesian theorem (adapted from McAllester 1999, 2003): for any distribution $D$ on $\mathcal{X} \times \mathcal{Y}$, for any set of predictors $\mathcal{H}$, for any loss $\ell : \mathcal{H} \times \mathcal{X} \times \mathcal{Y} \to [0, 1]$, for any distribution $P$ on $\mathcal{H}$, and for any $\delta \in (0, 1]$, we have
$\Pr_{S\sim D^n}\bigg( \forall Q \text{ on } \mathcal{H}:\ \mathbb{E}_{h\sim Q}\mathcal{L}_D^\ell(h) \le \mathbb{E}_{h\sim Q}\mathcal{L}_S^\ell(h) + \sqrt{\frac{1}{2n}\Big[\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}\Big]}\ \bigg) \ge 1 - \delta,$
where $\mathrm{KL}(Q\|P) = \mathbb{E}_{h\sim Q} \ln\frac{Q(h)}{P(h)}$ is the Kullback-Leibler divergence.
Training bound: gives generalization guarantees that are not based on a testing sample. Valid for all posteriors $Q$ on $\mathcal{H}$: an inspiration for conceiving new learning algorithms.
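The bound above is a one-liner to evaluate; a hedged numeric sketch with made-up values for the empirical Gibbs loss and the KL term:

```python
from math import log, sqrt

def mcallester_bound(emp_loss, kl, n, delta):
    # E_Q L_D <= E_Q L_S + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))
    return emp_loss + sqrt((kl + log(2 * sqrt(n) / delta)) / (2 * n))

print(mcallester_bound(emp_loss=0.10, kl=5.0, n=1000, delta=0.05))  # ~0.178
```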
9 Majority Vote Classifiers
Consider a binary classification problem, where $\mathcal{Y} = \{-1, 1\}$ and the set $\mathcal{H}$ contains binary voters $h : \mathcal{X} \to \{-1, 1\}$.
Weighted majority vote: to predict the label of $x \in \mathcal{X}$, the classifier asks for the prevailing opinion:
$B_Q(x) = \mathrm{sgn}\Big( \mathbb{E}_{h\sim Q}\, h(x) \Big)$
Many learning algorithms output majority vote classifiers: AdaBoost, Random Forests, Bagging, ...
10 A Surrogate Loss
Majority vote risk:
$R_D(B_Q) = \Pr_{(x,y)\sim D}\big( B_Q(x) \ne y \big) = \mathbb{E}_{(x,y)\sim D}\, I\Big[ \mathbb{E}_{h\sim Q}\, y\, h(x) \le 0 \Big],$
where $I[a] = 1$ if predicate $a$ is true; $I[a] = 0$ otherwise.
Gibbs risk / linear loss: the stochastic Gibbs classifier $G_Q(x)$ draws $h \in \mathcal{H}$ according to $Q$ and outputs $h(x)$. Since $I[h(x) \ne y] = \frac{1}{2}\big(1 - y\, h(x)\big)$ for binary voters,
$R_D(G_Q) = \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{h\sim Q}\, I\big[h(x) \ne y\big] = \mathbb{E}_{h\sim Q}\, \mathcal{L}_D^{\ell_{01}}(h) = \mathbb{E}_{(x,y)\sim D}\, \tfrac{1}{2}\Big(1 - \mathbb{E}_{h\sim Q}\, y\, h(x)\Big),$
where $\ell_{01}(h, x, y) = I[h(x) \ne y]$.
Factor two: it is well known that $R_D(B_Q) \le 2\, R_D(G_Q)$.
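A toy sketch (voters, weights, and data are invented) contrasting the two risks and checking the factor-2 relation:

```python
# Three toy decision stumps and a posterior Q over them.
voters = [lambda x: 1 if x > 0 else -1,
          lambda x: 1 if x > 1 else -1,
          lambda x: 1 if x > -1 else -1]
weights = [0.5, 0.25, 0.25]
sample = [(-2, -1), (-0.5, -1), (0.5, 1), (2, 1)]  # (x, y) pairs

def vote_margin(x):
    # E_{h~Q} h(x), the signed margin of the weighted vote on x
    return sum(w * h(x) for w, h in zip(weights, voters))

# R(B_Q): the deterministic vote errs when y * margin <= 0.
bayes_risk = sum(1 for x, y in sample if y * vote_margin(x) <= 0) / len(sample)
# R(G_Q): the Q-weighted average voter error.
gibbs_risk = sum(w for x, y in sample for w, h in zip(weights, voters)
                 if h(x) != y) / len(sample)
print(bayes_risk, gibbs_risk, bayes_risk <= 2 * gibbs_risk)  # 0.0 0.125 True
```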
11 From the Factor 2 to the C-bound
From Markov's inequality ($\Pr(X \ge a) \le \frac{\mathbb{E}X}{a}$), we obtain the factor-2 bound:
$R_D(B_Q) = \Pr_{(x,y)\sim D}\Big( 1 - y\, \mathbb{E}_{h\sim Q} h(x) \ge 1 \Big) \le \mathbb{E}_{(x,y)\sim D}\Big( 1 - y\, \mathbb{E}_{h\sim Q} h(x) \Big) = 2\, R_D(G_Q).$
From Chebyshev's inequality in its one-sided (Cantelli) form ($\Pr(X - \mathbb{E}X \ge a) \le \frac{\mathrm{Var}\,X}{a^2 + \mathrm{Var}\,X}$), we obtain the C-bound (Lacasse et al., 2006):
$R_D(B_Q) \le \mathcal{C}_Q^D = 1 - \frac{\big(1 - 2\, R_D(G_Q)\big)^2}{1 - 2\, d_Q^D},$
where $d_Q^D$ is the expected disagreement:
$d_Q^D = \mathbb{E}_{(x,\cdot)\sim D}\, \mathbb{E}_{h_i\sim Q}\, \mathbb{E}_{h_j\sim Q}\, I\big[ h_i(x) \ne h_j(x) \big] = \frac{1}{2}\bigg( 1 - \mathbb{E}_{(x,\cdot)\sim D}\Big[ \mathbb{E}_{h\sim Q}\, h(x) \Big]^2 \bigg).$
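A minimal sketch of evaluating the C-bound; note how a large disagreement tightens it below the factor-2 bound:

```python
def c_bound(gibbs_risk, disagreement):
    # C_Q = 1 - (1 - 2 R(G_Q))^2 / (1 - 2 d_Q), for R(G_Q) < 1/2 and d_Q < 1/2
    assert gibbs_risk < 0.5 and disagreement < 0.5
    return 1.0 - (1.0 - 2.0 * gibbs_risk) ** 2 / (1.0 - 2.0 * disagreement)

print(c_bound(0.30, 0.40))  # 0.2: far below the factor-2 bound 2 * 0.30 = 0.6
print(c_bound(0.30, 0.10))  # 0.8: with little disagreement, factor 2 is tighter
```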
13 I.I.D. Assumption
Assumption: examples are generated i.i.d. by a distribution $D$ on $\mathcal{X} \times \mathcal{Y}$.
$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \sim D^n$
14 A General PAC-Bayesian Theorem
The Δ-function: a «distance» between $R_S(G_Q)$ and $R_D(G_Q)$; a convex function $\Delta : [0, 1] \times [0, 1] \to \mathbb{R}$.
General theorem (Bégin et al. 2014, 2016; Germain 2015): for any distribution $D$ on $\mathcal{X} \times \mathcal{Y}$, for any set $\mathcal{H}$ of voters, for any distribution $P$ on $\mathcal{H}$, for any $\delta \in (0, 1]$, and for any Δ-function, we have, with probability at least $1 - \delta$ over the choice of $S \sim D^n$,
$\forall Q \text{ on } \mathcal{H}: \quad \Delta\big( R_S(G_Q), R_D(G_Q) \big) \le \frac{1}{n}\Big[ \mathrm{KL}(Q\|P) + \ln\frac{\mathcal{I}_\Delta(n)}{\delta} \Big],$
where $\mathcal{I}_\Delta(n) = \sup_{r\in[0,1]} \Big[ \sum_{k=0}^{n} \underbrace{\tbinom{n}{k}\, r^k (1-r)^{n-k}}_{\mathrm{Bin}(k;\, n,\, r)}\; e^{n\, \Delta(k/n,\; r)} \Big].$
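The complexity term $\mathcal{I}_\Delta(n)$ can be evaluated numerically; a sketch for Δ = kl, scanning r over a grid (for kl the inner sum does not actually depend on r, but the grid keeps the code generic for any Δ):

```python
from math import comb, exp, log

def kl_bernoulli(q, p):
    # kl(q||p) between Bernoulli(q) and Bernoulli(p); 0*log(0) taken as 0
    p = min(max(p, 1e-12), 1 - 1e-12)
    return ((q * log(q / p) if q > 0 else 0.0)
            + ((1 - q) * log((1 - q) / (1 - p)) if q < 1 else 0.0))

def I_delta(n, delta_fn, grid=200):
    # I_Delta(n) = sup_r sum_k Bin(k; n, r) * exp(n * Delta(k/n, r))
    best = 0.0
    for i in range(1, grid):
        r = i / grid
        best = max(best, sum(comb(n, k) * r**k * (1 - r)**(n - k)
                             * exp(n * delta_fn(k / n, r)) for k in range(n + 1)))
    return best

print(I_delta(50, kl_bernoulli))  # grows like sqrt(n); for Delta = kl it is at
                                  # most 2*sqrt(n), hence the ln(2*sqrt(n)/delta)
                                  # term in the corollaries below
```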
15 General Theorem: Interpretation
$\Pr_{S\sim D^n}\Big( \forall Q \text{ on } \mathcal{H}:\ \Delta\big( R_S(G_Q), R_D(G_Q) \big) \le \frac{1}{n}\Big[ \mathrm{KL}(Q\|P) + \ln\frac{\mathcal{I}_\Delta(n)}{\delta} \Big] \Big) \ge 1 - \delta.$
Interpretation: having observed $R_S(G_Q)$, the theorem confines $R_D(G_Q)$ to the set of values $r$ satisfying $\Delta\big(R_S(G_Q), r\big) \le \frac{1}{n}\big[ \mathrm{KL}(Q\|P) + \ln\frac{\mathcal{I}_\Delta(n)}{\delta} \big]$; the risk bound is the largest such $r$. [Figure: level curve of the Δ-function as a function of r.]
16 Proof Ideas
Change of measure inequality: for any $P$ and $Q$ on $\mathcal{H}$, and for any measurable function $\varphi : \mathcal{H} \to \mathbb{R}$, we have
$\mathbb{E}_{h\sim Q}\, \varphi(h) \le \mathrm{KL}(Q\|P) + \ln\Big( \mathbb{E}_{h\sim P}\, e^{\varphi(h)} \Big).$
Markov's inequality: $\Pr(X \ge a) \le \frac{\mathbb{E}X}{a}$, hence $\Pr\big( X \le \frac{\mathbb{E}X}{\delta} \big) \ge 1 - \delta.$
Probability of observing $k$ misclassifications among $n$ examples: given a voter $h$, the number of errors on $S$ is a binomial variable of $n$ trials with success probability $\mathcal{L}_D^{\ell_{01}}(h)$:
$\Pr_{S\sim D^n}\Big( \mathcal{L}_S^{\ell_{01}}(h) = \tfrac{k}{n} \Big) = \tbinom{n}{k} \Big( \mathcal{L}_D^{\ell_{01}}(h) \Big)^k \Big( 1 - \mathcal{L}_D^{\ell_{01}}(h) \Big)^{n-k} = \mathrm{Bin}\big( k;\, n,\, \mathcal{L}_D^{\ell_{01}}(h) \big).$
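The change of measure inequality is the workhorse of the proof; a quick numeric sanity check on toy discrete distributions (any P, Q, and φ work):

```python
import random
from math import exp, log

random.seed(0)
P = [0.2] * 5                             # uniform prior over five voters
Q = [0.5, 0.2, 0.1, 0.1, 0.1]             # an arbitrary posterior
phi = [random.uniform(-2, 2) for _ in P]  # an arbitrary function of the voter

lhs = sum(q * f for q, f in zip(Q, phi))                    # E_{h~Q} phi(h)
kl_qp = sum(q * log(q / p) for q, p in zip(Q, P) if q > 0)  # KL(Q||P)
rhs = kl_qp + log(sum(p * exp(f) for p, f in zip(P, phi)))  # + ln E_{h~P} e^phi
print(lhs <= rhs)  # True, whatever P, Q and phi are
```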
17 Proof of the General Theorem
Proof.
$n\, \Delta\big( \mathbb{E}_{h\sim Q}\mathcal{L}_S^\ell(h),\ \mathbb{E}_{h\sim Q}\mathcal{L}_D^\ell(h) \big)$
$\le \mathbb{E}_{h\sim Q}\, n\, \Delta\big( \mathcal{L}_S^\ell(h), \mathcal{L}_D^\ell(h) \big)$  (Jensen's inequality)
$\le \mathrm{KL}(Q\|P) + \ln \mathbb{E}_{h\sim P}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_D^\ell(h))}$  (change of measure)
$\le \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{S\sim D^n}\, \mathbb{E}_{h\sim P}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_D^\ell(h))}$  (Markov's inequality, with probability at least $1 - \delta$)
$= \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{h\sim P}\, \mathbb{E}_{S\sim D^n}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_D^\ell(h))}$  (expectation swap)
$= \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{h\sim P} \sum_{k=0}^{n} \mathrm{Bin}\big( k; n, \mathcal{L}_D^\ell(h) \big)\, e^{n\, \Delta(k/n,\, \mathcal{L}_D^\ell(h))}$  (binomial law)
$\le \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta} \sup_{r\in[0,1]} \sum_{k=0}^{n} \mathrm{Bin}(k; n, r)\, e^{n\, \Delta(k/n,\, r)}$  (supremum over risk)
$= \mathrm{KL}(Q\|P) + \ln \frac{\mathcal{I}_\Delta(n)}{\delta}.$ ∎
18 Corollary
[...] with probability at least $1 - \delta$ over the choice of $S \sim D^n$, for all $Q$ on $\mathcal{H}$:
(a) $\mathrm{kl}\big( R_S(G_Q), R_D(G_Q) \big) \le \frac{1}{n}\big[ \mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta} \big]$  (Langford and Seeger 2001)
(b) $R_D(G_Q) \le R_S(G_Q) + \sqrt{\frac{1}{2n}\big[ \mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta} \big]}$  (McAllester 1999, 2003)
(c) $R_D(G_Q) \le \frac{1}{1 - e^{-c}}\Big( 1 - \exp\Big( -\Big[ c\, R_S(G_Q) + \frac{1}{n}\big( \mathrm{KL}(Q\|P) + \ln\frac{1}{\delta} \big) \Big] \Big) \Big)$  (Catoni 2007)
(d) $R_D(G_Q) \le R_S(G_Q) + \frac{1}{\lambda}\big[ \mathrm{KL}(Q\|P) + \ln\frac{1}{\delta} + f(\lambda, n) \big]$  (Alquier et al. 2015)
obtained with the Δ-functions
$\mathrm{kl}(q, p) = q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p} \ge 2(q-p)^2, \quad \Delta_c(q, p) = -\ln\big[ 1 - (1 - e^{-c})\, p \big] - c\, q, \quad \Delta_\lambda(q, p) = \frac{\lambda}{n}(p - q).$
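Corollary (a) gives the tightest numeric bound but requires inverting kl; a minimal bisection sketch, with made-up values matching the McAllester example above:

```python
from math import log, sqrt

def kl(q, p):
    p = min(max(p, 1e-12), 1 - 1e-12)
    return ((q * log(q / p) if q > 0 else 0.0)
            + ((1 - q) * log((1 - q) / (1 - p)) if q < 1 else 0.0))

def seeger_bound(emp_risk, kl_qp, n, delta):
    # Largest p with kl(emp_risk, p) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n,
    # found by bisection since kl(q, .) increases on [q, 1].
    budget = (kl_qp + log(2 * sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if kl(emp_risk, mid) <= budget else (lo, mid)
    return lo

print(seeger_bound(0.10, kl_qp=5.0, n=1000, delta=0.05))  # ~0.153, tighter than
                                                          # McAllester's ~0.178
```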
20 Bounding the Expected Disagreement
Expected disagreement:
$d_Q^D = \mathbb{E}_{(x,\cdot)\sim D}\, \mathbb{E}_{h_i\sim Q}\, \mathbb{E}_{h_j\sim Q}\, I\big[ h_i(x) \ne h_j(x) \big] = \mathbb{E}_{(x,\cdot)\sim D}\, \mathbb{E}_{h_{ij}\sim Q^2}\, \ell_d(h_{ij}, x, \cdot),$
where $Q^2(h_{ij}) = Q(h_i)\, Q(h_j)$, which gives $\mathrm{KL}(Q^2\|P^2) = 2\, \mathrm{KL}(Q\|P)$.
General theorem applied to the disagreement: [...] with probability at least $1 - \delta$ over the choice of $S \sim D^n$,
$\forall Q \text{ on } \mathcal{H}: \quad \Delta\big( d_Q^S, d_Q^D \big) \le \frac{1}{n}\Big[ 2\, \mathrm{KL}(Q\|P) + \ln\frac{\mathcal{I}_\Delta(n)}{\delta} \Big].$
Corollary: (a) $\mathrm{kl}\big( d_Q^S, d_Q^D \big) \le \frac{1}{n}\Big[ 2\, \mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta} \Big].$
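The empirical disagreement needs no labels, which is what the semi-supervised variants below exploit; a self-contained sketch using the margins of the toy three-voter vote from the earlier example:

```python
def expected_disagreement(margins):
    # d_Q = E_x Pr_{hi,hj~Q}[hi(x) != hj(x)] = E_x (1 - m(x)^2) / 2,
    # where m(x) = E_{h~Q} h(x); note that no label is used.
    return sum((1.0 - m * m) / 2.0 for m in margins) / len(margins)

print(expected_disagreement([-1.0, -0.5, 0.5, 1.0]))  # 0.1875
```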
21 PAC-Bayesian C-bound, Version 1
The C-bound (Lacasse et al., 2006): $R_D(B_Q) \le \mathcal{C}_Q^D = 1 - \frac{(1 - 2 R_D(G_Q))^2}{1 - 2 d_Q^D}.$
PAC-Bayes C-bound 1: [...] with probability at least $1 - \delta$ over the choice of $S \sim D^n$,
$\forall Q \text{ on } \mathcal{H}: \quad R_D(B_Q) \le 1 - \frac{(1 - 2\,\overline{r}\,)^2}{1 - 2\,\overline{d}},$ with
$\overline{r} = \max\Big\{ r \in [0, \tfrac{1}{2}] \ :\ \Delta\big( R_S(G_Q), r \big) \le \tfrac{1}{n}\big[ \mathrm{KL}(Q\|P) + \ln\tfrac{\mathcal{I}_\Delta(n)}{\delta/2} \big] \Big\},$
$\overline{d} = \min\Big\{ d \in [0, \tfrac{1}{2}] \ :\ \Delta\big( d_Q^S, d \big) \le \tfrac{1}{n}\big[ 2\,\mathrm{KL}(Q\|P) + \ln\tfrac{\mathcal{I}_\Delta(n)}{\delta/2} \big] \Big\}.$
22 PAC-Bayesian C-bound, Version 2
The C-bound (Lacasse et al., 2006): $R_D(B_Q) \le \mathcal{C}_Q^D = 1 - \frac{(1 - 2 R_D(G_Q))^2}{1 - 2 d_Q^D}.$
PAC-Bayes C-bound 2: [...] with probability at least $1 - \delta$ over the choice of $S \sim D^n$,
$\forall Q \text{ on } \mathcal{H}: \quad R_D(B_Q) \le \sup_{(r, d) \in \mathcal{A}} \Big( 1 - \frac{(1 - 2r)^2}{1 - 2d} \Big),$ with
$\mathcal{A} = \Big\{ (r, d) \in [0, \tfrac{1}{2}]^2 \ :\ \Delta\Big( \big( R_S(G_Q), r \big), \big( d_Q^S, d \big) \Big) \le \tfrac{1}{n}\Big[ 2\, \mathrm{KL}(Q\|P) + \ln\tfrac{\mathcal{I}_\Delta^2(n)}{\delta} \Big] \Big\}.$
23 Bound Values (AdaBoost iterates)
[Figure: bound values across AdaBoost iterations: disagreement (test), Gibbs risk (test), C-bound (test), majority vote risk (test and train), and the PAC-Bayesian bounds: 2*Gibbs, C-bound 1, C-bound 2.]
26 Semi-Supervised Learning
Assumption: examples are generated i.i.d. by a distribution $D$ on $\mathcal{X} \times \mathcal{Y}$; the learner also receives a sample whose labels are hidden.
$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \sim D^n$
$U = \{(x_{n+1}, \cdot), (x_{n+2}, \cdot), \ldots, (x_{n+n'}, \cdot)\} \sim D^{n'}$
27 A Semi-Supervised PAC-Bayesian C-bound
The C-bound (Lacasse et al., 2006): $R_D(B_Q) \le \mathcal{C}_Q^D = 1 - \frac{(1 - 2 R_D(G_Q))^2}{1 - 2 d_Q^D}.$
PAC C-bound, semi-supervised: [...] with probability at least $1 - \delta$ over the choice of $S \sim D^n$ and $U \sim D^{n'}$,
$\forall Q \text{ on } \mathcal{H}: \quad R_D(B_Q) \le 1 - \frac{(1 - 2\,\overline{r}\,)^2}{1 - 2\,\overline{d}},$ with
$\overline{r} = \max\Big\{ r \in [0, \tfrac{1}{2}] \ :\ \Delta\big( R_S(G_Q), r \big) \le \tfrac{1}{n}\big[ \mathrm{KL}(Q\|P) + \ln\tfrac{\mathcal{I}_\Delta(n)}{\delta/2} \big] \Big\},$
$\overline{d} = \min\Big\{ d \in [0, \tfrac{1}{2}] \ :\ \Delta\big( d_Q^{S\cup U}, d \big) \le \tfrac{1}{n+n'}\big[ 2\,\mathrm{KL}(Q\|P) + \ln\tfrac{\mathcal{I}_\Delta(n+n')}{\delta/2} \big] \Big\}.$
The disagreement is estimated on the labeled and unlabeled examples together: the unlabeled sample tightens the disagreement term.
28 Bound Values (AdaBoost iterates)
[Figure: same curves as on slide 23, with the semi-supervised PAC C-bound added.]
30 Transductive Learning
Assumption: examples are drawn without replacement from a finite set $Z$ of size $N$.
$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \subseteq Z$
$U = \{(x_{n+1}, \cdot), (x_{n+2}, \cdot), \ldots, (x_N, \cdot)\} = Z \setminus S$
31 Transductive Learning: a PAC-Bayesian Theorem
Inductive learning: $n$ draws with replacement according to $D$, hence a binomial law.
Transductive learning: $n$ draws without replacement in $Z$, hence a hypergeometric law.
Theorem (Bégin et al. 2014): for any set $Z$ of $N$ examples, [...] with probability at least $1 - \delta$ over the choice of $n$ examples among $Z$,
$\forall Q \text{ on } \mathcal{H}: \quad \Delta\big( R_S(G_Q), R_Z(G_Q) \big) \le \frac{1}{n}\Big[ \mathrm{KL}(Q\|P) + \ln\frac{\mathcal{T}_\Delta(n, N)}{\delta} \Big],$
where $\mathcal{T}_\Delta(n, N) = \max_{K=0,\ldots,N}\ \sum_{k=\max[0,\, K+n-N]}^{\min[n,\, K]} \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\; e^{n\, \Delta(k/n,\; K/N)}.$
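A sketch of the transductive complexity term, where the hypergeometric law replaces the binomial:

```python
from math import comb, exp, log

def kl(q, p):
    p = min(max(p, 1e-12), 1 - 1e-12)
    return ((q * log(q / p) if q > 0 else 0.0)
            + ((1 - q) * log((1 - q) / (1 - p)) if q < 1 else 0.0))

def T_delta(n, N, delta_fn=kl):
    # T_Delta(n,N) = max_K sum_k [C(K,k) C(N-K,n-k) / C(N,n)] exp(n * Delta(k/n, K/N))
    best = 0.0
    for K in range(N + 1):  # K = number of errors among the N examples of Z
        total = sum(comb(K, k) * comb(N - K, n - k) / comb(N, n)
                    * exp(n * delta_fn(k / n, K / N))
                    for k in range(max(0, K + n - N), min(n, K) + 1))
        best = max(best, total)
    return best

print(T_delta(20, 50))  # typically below the inductive I_kl(20): sampling
                        # without replacement concentrates more
```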
32 Proof of the Transductive Theorem
Proof.
$n\, \Delta\big( \mathbb{E}_{h\sim Q}\mathcal{L}_S^\ell(h),\ \mathbb{E}_{h\sim Q}\mathcal{L}_Z^\ell(h) \big)$
$\le \mathbb{E}_{h\sim Q}\, n\, \Delta\big( \mathcal{L}_S^\ell(h), \mathcal{L}_Z^\ell(h) \big)$  (Jensen's inequality)
$\le \mathrm{KL}(Q\|P) + \ln \mathbb{E}_{h\sim P}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_Z^\ell(h))}$  (change of measure)
$\le \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{S\sim [Z]^n}\, \mathbb{E}_{h\sim P}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_Z^\ell(h))}$  (Markov's inequality, with probability at least $1 - \delta$)
$= \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{h\sim P}\, \mathbb{E}_{S\sim [Z]^n}\, e^{n\, \Delta(\mathcal{L}_S^\ell(h),\, \mathcal{L}_Z^\ell(h))}$  (expectation swap)
$= \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta}\, \mathbb{E}_{h\sim P} \sum_{k} \frac{\binom{N \mathcal{L}_Z^\ell(h)}{k} \binom{N - N \mathcal{L}_Z^\ell(h)}{n-k}}{\binom{N}{n}}\, e^{n\, \Delta(k/n,\, \mathcal{L}_Z^\ell(h))}$  (hypergeometric law)
$\le \mathrm{KL}(Q\|P) + \ln \frac{1}{\delta} \max_{K=0,\ldots,N} \sum_{k} \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\, e^{n\, \Delta(k/n,\, K/N)}$  (maximum over risk)
$= \mathrm{KL}(Q\|P) + \ln \frac{\mathcal{T}_\Delta(n, N)}{\delta}.$ ∎
33 A New Transductive Bound
A new Δ-function (Bégin et al. 2014):
$\Delta_\beta(q, p) = \mathrm{kl}(q, p) + \frac{1-\beta}{\beta}\, \mathrm{kl}\Big( \frac{p - \beta q}{1 - \beta},\ p \Big).$
34 Bound Values (AdaBoost iterates)
[Figure: same curves as on slide 28, with the transductive PAC C-bound added.]
36 Domain Adaptation
Assumption: source and target examples are generated by different distributions.
$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \sim (D_S)^n$
$T = \{(x_1, \cdot), (x_2, \cdot), \ldots, (x_{n'}, \cdot)\} \sim (D_T)^{n'}$
37 Our Domain Adaptation Setting
A domain adaptation learning algorithm is provided with: a labeled source sample $S$ and an unlabeled target sample $T$.
The goal is to build a classifier with a low target risk $R_{D_T}(h)$.
38 Generalization Bound
Observation: the Gibbs risk decomposes as
$R_D(G_Q) = \tfrac{1}{2}\, d_Q^D + e_Q^D, \quad \text{where } e_Q^D = \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{h_i\sim Q}\, \mathbb{E}_{h_j\sim Q}\, I\big[ h_i(x) \ne y \big]\, I\big[ h_j(x) \ne y \big]$
is the expected joint error.
PAC-Bayesian DA bound (Germain, Habrard, et al. 2016): for any prior $P$ over $\mathcal{H}$, any $\delta \in (0, 1]$, any real numbers $b > 1$ and $c > 1$, with probability at least $1 - \delta$ over the choices of $S \sim (D_S)^n$ and $T \sim (D_T)^{n'}$, we have, for all $Q$ on $\mathcal{H}$,
$R_{D_T}(G_Q) \le c\, \underbrace{\tfrac{1}{2}\, \widehat{d}_Q^{\,T}}_{\text{empirical target disagreement}} + b\, \underbrace{\beta(D_T\|D_S)}_{\text{domain divergence}}\, \underbrace{\widehat{e}_Q^{\,S}}_{\text{empirical source joint error}} + \underbrace{O\Big( \mathrm{KL}(Q\|P) + \ln\tfrac{1}{\delta} \Big)}_{\text{complexity term}}.$
Linear trade-off between $\widehat{d}_Q^{\,T}$ and $\widehat{e}_Q^{\,S}$. We consider $\beta(D_T\|D_S) = \sup_x \frac{D_T(x)}{D_S(x)}$ as a parameter to tune.
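A toy sketch of the domain divergence term on discrete marginals (both distributions are invented for illustration):

```python
# beta(D_T || D_S) = sup_x D_T(x) / D_S(x), on a three-point input space.
D_S = {"a": 0.5, "b": 0.3, "c": 0.2}  # source marginal
D_T = {"a": 0.2, "b": 0.3, "c": 0.5}  # target marginal

beta = max(D_T[x] / D_S[x] for x in D_T)
print(beta)  # 2.5: the target puts up to 2.5x more mass than the source on some x
```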
39 Learning Algorithm for Linear Classifiers
As in many PAC-Bayesian works (since Langford and Shawe-Taylor 2002), we consider the set $\mathcal{H}$ of all linear classifiers $h_v$ in $\mathcal{X} = \mathbb{R}^d$: $h_v(x) = \mathrm{sgn}(v \cdot x)$.
Let $Q_w$ on $\mathcal{H}$ be a Gaussian distribution centered on $w$ (with $\Sigma = I_d$); the majority vote is then the linear classifier itself:
$h_w(x) = \mathrm{sgn}\Big( \mathbb{E}_{v\sim Q_w} h_v(x) \Big) = \mathrm{sgn}(w \cdot x).$
Given $T = \{x_i\}_{i=1}^{n'}$ and $S = \{(x_j, y_j)\}_{j=1}^{n}$, find $w \in \mathbb{R}^d$ that minimizes
$C \sum_{i} \Phi_{\mathrm{dis}}\Big( \frac{w \cdot x_i}{\|x_i\|} \Big) + B \sum_{j} \Phi_{\mathrm{err}}\Big( \frac{y_j\, w \cdot x_j}{\|x_j\|} \Big) + \mathrm{KL}(Q_w\|P_0), \quad \text{with } \mathrm{KL}(Q_w\|P_0) = \tfrac{1}{2}\|w\|^2.$
Here $\Phi_{\mathrm{err}}$ and $\Phi_{\mathrm{dis}}$ are losses of the margin $\frac{y\, w \cdot x}{\|x\|}$: the disagreement term pushes the target examples toward a low-density region, while the error term preserves classification accuracy on the source.
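A rough sketch of this objective, assuming (not spelled out on the slide) the standard closed forms for a Gaussian posterior $Q_w$: the Gibbs error on $(x, y)$ is the probit loss $\Phi(-m)$ of the margin $m = y\, w \cdot x / \|x\|$, and the expected disagreement on $x$ is $2\Phi(m)\Phi(-m)$. The data and the trade-off constants B, C are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import ndtr  # standard normal CDF, Phi

def objective(w, X_src, y_src, X_tgt, B=1.0, C=1.0):
    m_src = y_src * (X_src @ w) / np.linalg.norm(X_src, axis=1)  # source margins
    m_tgt = (X_tgt @ w) / np.linalg.norm(X_tgt, axis=1)          # unsigned target margins
    err = ndtr(-m_src).sum()                      # Gibbs error of Q_w on the source
    dis = (2 * ndtr(m_tgt) * ndtr(-m_tgt)).sum()  # expected disagreement on the target
    return C * dis + B * err + 0.5 * w @ w        # last term = KL(Q_w || P_0)

rng = np.random.default_rng(0)
X_src = rng.normal(size=(40, 2)) + [2.0, 0.0]
y_src = np.where(X_src[:, 0] > 2.0, 1.0, -1.0)  # a toy labeling rule
X_tgt = rng.normal(size=(30, 2)) - [2.0, 0.0]
w = minimize(objective, np.zeros(2), args=(X_src, y_src, X_tgt)).x
print(w)  # learned weight vector
```

Minimizing the disagreement term pushes $|w \cdot x|$ away from zero on target points (a low-density separator), while the probit term keeps the source classification accurate, matching the two roles highlighted on the slide.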
40 Toy Experiment
[Figure: toy experiment with an RBF kernel, B = 1, C = ; decision boundaries for target rotations of 10, 30, and 50 degrees.]
41 Empirical Results on the Amazon Dataset
Linear kernel; hyperparameter selection by reverse cross-validation.
[Table: error rates of svm, dasvm, coda, PBDA, and DALC on the twelve source-to-target tasks among the books, DVDs, electronics, and kitchen domains, plus their average; the numeric entries are not recoverable here.]
43 Short Summary
An original PAC-Bayesian approach: a general theorem from which we recover existing results; a modular proof, easy to adapt to various frameworks (e.g., transductive learning).
The virtue of disagreement: a second-order statistic computable from an unlabeled sample (useful in the semi-supervised setting and its variants); it reduces the value of PAC-Bayesian bounds (the C-bound); it is also used in a domain adaptation learning algorithm.
44 Some (Self-)Pointers
Our 74-page journal paper (JMLR): Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm (Germain, Lacasse, et al. 2015).
My PhD thesis (in French): Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine (Germain 2015).
Recent papers:
ICML: A New PAC-Bayesian Perspective on Domain Adaptation (Germain, Habrard, et al. 2016)
AISTATS: PAC-Bayesian Bounds Based on the Rényi Divergence (Bégin et al. 2016)
NIPS: PAC-Bayesian Theory Meets Bayesian Inference (Germain, Bach, et al. 2016)
Even some code on my GitHub...
45 References I
Alquier, Pierre, James Ridgway, and Nicolas Chopin (2015). "On the properties of variational approximations of Gibbs posteriors". In: arXiv e-prints.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2014). "PAC-Bayesian Theory for Transductive Learning". In: AISTATS.
Bégin, Luc, Pascal Germain, François Laviolette, and Jean-Francis Roy (2016). "PAC-Bayesian Bounds based on the Rényi Divergence". In: AISTATS.
Catoni, Olivier (2007). PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Vol. 56. Institute of Mathematical Statistics.
Germain, Pascal (2015). "Généralisations de la théorie PAC-bayésienne pour l'apprentissage inductif, l'apprentissage transductif et l'adaptation de domaine". PhD thesis. Université Laval.
Germain, Pascal, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien (2016). "PAC-Bayesian Theory Meets Bayesian Inference". In: arXiv e-prints.
Germain, Pascal, Amaury Habrard, François Laviolette, and Emilie Morvant (2016). "A New PAC-Bayesian Perspective on Domain Adaptation". In: ICML.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy (2015). "Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm". In: JMLR 16.
46 References II
Langford, John and Matthias Seeger (2001). "Bounds for averaging classifiers". Tech. rep. Carnegie Mellon University, Department of Computer Science.
Langford, John and John Shawe-Taylor (2002). "PAC-Bayes & Margins". In: NIPS.
McAllester, David (1999). "Some PAC-Bayesian Theorems". In: Machine Learning.
McAllester, David (2003). "PAC-Bayesian Stochastic Model Selection". In: Machine Learning.