A Strongly Quasiconvex PAC-Bayesian Bound


1 A Strongly Quasiconvex PAC-Bayesian Bound Yevgeny Seldin NIPS-2017 Workshop on (Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights Based on joint work with Niklas Thiemann, Christian Igel, and Olivier Wintenberger, ALT 2017

2 Quick Summary Two major ways to convexify classification with the 0-1 loss: convexify the loss, or work in the space of distributions over H (PAC-Bayes).

3 Quick Summary Two major ways to convexify classification with the 0-1 loss: convexify the loss, or work in the space of distributions over H (PAC-Bayes). We propose: a relaxation of the PAC-Bayes-kl bound (Seeger, 2002) and an alternating minimization procedure; sufficient conditions for strong quasiconvexity of the bound, which guarantee convergence to the global minimum; and a construction of a hypothesis space tailored for the bound. In our experiments, rigorous minimization of the bound was competitive with cross-validation in tuning the trade-off between complexity and empirical performance.

4 Outline A Very Quick Recap of PAC-Bayesian Analysis A Strongly Quasiconvex PAC-Bayesian Bound Construction of a Hypothesis Set Experiments

5 Randomized Classifiers Let ρ be a distribution over H. At each round of the game: 1. Pick h ∈ H according to ρ(h). 2. Observe x. 3. Return h(x).

6 Randomized Classifiers Let ρ be a distribution over H. At each round of the game: 1. Pick h ∈ H according to ρ(h). 2. Observe x. 3. Return h(x). Expected loss of ρ: $\mathbb{E}_{h\sim\rho}[L(h)] = \mathbb{E}_{\rho}[L(h)]$. Empirical loss of ρ on a sample S: $\mathbb{E}_{h\sim\rho}[\hat{L}(h,S)] = \mathbb{E}_{\rho}[\hat{L}(h,S)]$.
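
For a finite H these ρ-averages are just weighted sums; a minimal sketch (the distribution, losses, and sizes below are made-up illustrative values, not from the slides):

```python
import numpy as np

rho   = np.array([0.5, 0.3, 0.2])     # hypothetical distribution over three hypotheses
L_hat = np.array([0.10, 0.12, 0.30])  # hypothetical empirical losses \hat{L}(h, S)

gibbs_empirical = rho @ L_hat          # E_{h~rho}[\hat{L}(h, S)]
rng = np.random.default_rng(0)
h_index = rng.choice(len(rho), p=rho)  # the randomized classifier draws h ~ rho before predicting
print(gibbs_empirical, h_index)
```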

7 Approximation-Estimation Perspective (Bias-Variance)

8 Approximation-Estimation Perspective (Bias-Variance) Randomized Classification: avoid selection when not necessary. If $\hat{L}(h,S) \approx \hat{L}(h',S)$ and $\pi(h) \approx \pi(h')$, take $\rho(h) \approx \rho(h')$. Reduced variance at the same bias level.

9 Kullback-Leibler (KL) Divergence = Relative Entropy KL divergence: let ρ and π be two distributions over H, then $\mathrm{KL}(\rho\|\pi) = \mathbb{E}_{\rho}\!\left[\ln\frac{\rho(h)}{\pi(h)}\right]$. Binary kl divergence: for two Bernoulli random variables with biases p and q, $\mathrm{kl}(p\|q) = \mathrm{KL}([p, 1-p]\,\|\,[q, 1-q])$.
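
A minimal sketch of the two definitions for finite H (assuming π(h) > 0 wherever ρ(h) > 0):

```python
import numpy as np

def kl_divergence(rho, pi):
    """KL(rho || pi) for distributions over a finite H, with the convention 0 * ln 0 = 0."""
    rho, pi = np.asarray(rho, dtype=float), np.asarray(pi, dtype=float)
    support = rho > 0
    return float(np.sum(rho[support] * np.log(rho[support] / pi[support])))

def binary_kl(p, q):
    """kl(p || q) = KL([p, 1-p] || [q, 1-q]) for Bernoulli biases p and q."""
    return kl_divergence([p, 1.0 - p], [q, 1.0 - q])
```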

10 PAC-Bayes-kl Inequality Theorem (Seeger, 2002). For any prior π over H and any δ ∈ (0, 1), with probability greater than 1 − δ over a random draw of a sample S, for all distributions ρ over H simultaneously:
$$\mathrm{kl}\!\left(\mathbb{E}_{\rho}[\hat{L}(h,S)] \,\middle\|\, \mathbb{E}_{\rho}[L(h)]\right) \le \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n}.$$

11 PAC-Bayes-kl Inequality Theorem (Seeger, 2002). For any prior π over H and any δ ∈ (0, 1), with probability greater than 1 − δ over a random draw of a sample S, for all distributions ρ over H simultaneously:
$$\mathrm{kl}\!\left(\mathbb{E}_{\rho}[\hat{L}(h,S)] \,\middle\|\, \mathbb{E}_{\rho}[L(h)]\right) \le \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n}.$$
Challenge: the bound is not convex in ρ. Common heuristic: replace it with a parametrized trade-off $\beta n\,\mathbb{E}_{\rho}[\hat{L}(h,S)] + \mathrm{KL}(\rho\|\pi)$ and tune β by cross-validation.
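
The bound is used by inverting the binary kl: the implied upper bound on $\mathbb{E}_{\rho}[L(h)]$ is the largest q with $\mathrm{kl}(\hat{p}\|q) \le \varepsilon$. A sketch of this inversion via bisection (the numbers in the example at the bottom are made-up):

```python
import numpy as np

def kl_inverse_upper(p_hat, eps, tol=1e-9):
    """Largest q in [p_hat, 1) with kl(p_hat || q) <= eps, found by bisection."""
    def kl_bin(p, q):
        val = 0.0
        if p > 0: val += p * np.log(p / q)
        if p < 1: val += (1 - p) * np.log((1 - p) / (1 - q))
        return val
    lo, hi = p_hat, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bin(p_hat, mid) <= eps:
            lo = mid            # mid still satisfies the bound, move up
        else:
            hi = mid
    return lo

# Hypothetical numbers: Gibbs empirical loss 0.10, KL(rho||pi) = 3, n = 1000, delta = 0.05
n, delta, kl_rho_pi, gibbs_emp = 1000, 0.05, 3.0, 0.10
eps = (kl_rho_pi + np.log(2 * np.sqrt(n) / delta)) / n
print(kl_inverse_upper(gibbs_emp, eps))   # upper bound on E_rho[L(h)]
```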

12 Outline A Very Quick Recap of PAC-Bayesian Analysis A Strongly Quasiconvex PAC-Bayesian Bound Construction of a Hypothesis Set Experiments

13 Relaxation of PAC-Bayes-kl Based on refined Pinsker's inequality. Theorem (PAC-Bayes-λ Inequality). For any prior π and any δ ∈ (0, 1), with probability greater than 1 − δ, for all ρ and λ ∈ (0, 2) simultaneously:
$$\mathbb{E}_{\rho}[L(h)] \le \frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}.$$

14 Relaxation of PAC-Bayes-kl Based on refined Pinsker's inequality. Theorem (PAC-Bayes-λ Inequality). For any prior π and any δ ∈ (0, 1), with probability greater than 1 − δ, for all ρ and λ ∈ (0, 2) simultaneously:
$$\mathbb{E}_{\rho}[L(h)] \le \frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}.$$
For the optimal λ this leads to
$$\mathbb{E}_{\rho}[L(h)] \le \mathbb{E}_{\rho}[\hat{L}(h,S)] + \sqrt{\frac{2\,\mathbb{E}_{\rho}[\hat{L}(h,S)]\left(\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}\right)}{n}} + \frac{2\left(\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}\right)}{n}.$$
Fast convergence rate.
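
A sketch of the PAC-Bayes-λ right-hand side as a function of λ, useful for sanity-checking the trade-off; the inputs in the example call are made-up:

```python
import numpy as np

def pac_bayes_lambda_bound(gibbs_emp, kl_rho_pi, n, delta, lam):
    """Right-hand side of the PAC-Bayes-lambda inequality for a given lambda in (0, 2)."""
    complexity = kl_rho_pi + np.log(2.0 * np.sqrt(n) / delta)
    return gibbs_emp / (1.0 - lam / 2.0) + complexity / (n * lam * (1.0 - lam / 2.0))

# Since the theorem holds for all lambda simultaneously, a grid search over lambda is legitimate.
lams = np.linspace(0.01, 1.99, 500)
vals = [pac_bayes_lambda_bound(0.10, 3.0, 1000, 0.05, l) for l in lams]
print(lams[int(np.argmin(vals))], min(vals))
```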

15 Alternating Minimization of PAC-Bayes-λ
$$\mathbb{E}_{\rho}[L(h)] \le \underbrace{\frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}}_{F(\rho,\lambda)}$$

16 Alternating Minimization of PAC-Bayes-λ
$$\mathbb{E}_{\rho}[L(h)] \le \underbrace{\frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}}_{F(\rho,\lambda)}$$
For a fixed λ the bound is convex in ρ and minimized by
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}.$$

17 Alternating Minimization of PAC-Bayes-λ
$$\mathbb{E}_{\rho}[L(h)] \le \underbrace{\frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}}_{F(\rho,\lambda)}$$
For a fixed λ the bound is convex in ρ and minimized by
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}.$$
For a fixed ρ the bound is convex in λ and minimized by
$$\lambda = \frac{2}{\sqrt{\frac{2n\,\mathbb{E}_{\rho}[\hat{L}(h,S)]}{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}} + 1} + 1}.$$

18 Alternating Minimization of PAC-Bayes-λ
$$\mathbb{E}_{\rho}[L(h)] \le \underbrace{\frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}}_{F(\rho,\lambda)}$$
For a fixed λ the bound is convex in ρ and minimized by
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}.$$
For a fixed ρ the bound is convex in λ and minimized by
$$\lambda = \frac{2}{\sqrt{\frac{2n\,\mathbb{E}_{\rho}[\hat{L}(h,S)]}{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}} + 1} + 1}.$$
F(ρ, λ) is not necessarily jointly convex in ρ and λ.
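
A minimal sketch of the alternating minimization over a finite hypothesis set, iterating the two closed-form updates above until the bound stops decreasing (the stopping rule and the example inputs are illustrative choices, not the authors' implementation):

```python
import numpy as np

def alternating_minimization(pi, L_hat, n, delta, tol=1e-9, max_iter=1000):
    """Alternate the closed-form rho- and lambda-updates of the PAC-Bayes-lambda bound."""
    ln_term = np.log(2.0 * np.sqrt(n) / delta)
    lam, prev = 1.0, np.inf
    for _ in range(max_iter):
        # rho-update: rho_lambda(h) propto pi(h) * exp(-lambda * n * L_hat(h)), via log-sum-exp
        log_w = np.log(pi) - lam * n * L_hat
        log_Z = np.log(np.sum(np.exp(log_w - log_w.max()))) + log_w.max()
        rho = np.exp(log_w - log_Z)
        gibbs = rho @ L_hat
        kl = -lam * n * gibbs - log_Z      # KL(rho_lambda || pi); identity used in Simplification 2 below
        # lambda-update: closed-form minimizer of F(rho, .) for the current rho
        C = kl + ln_term
        lam = 2.0 / (np.sqrt(2.0 * n * gibbs / C + 1.0) + 1.0)
        bound = gibbs / (1.0 - lam / 2.0) + C / (n * lam * (1.0 - lam / 2.0))
        if prev - bound < tol:             # F never increases along the iterations
            break
        prev = bound
    return rho, lam, bound

# Example with made-up losses: 50 hypotheses, uniform prior, n = 1000 sample points.
rng = np.random.default_rng(0)
rho, lam, bound = alternating_minimization(np.full(50, 0.02), rng.uniform(0.05, 0.5, 50), 1000, 0.05)
print(lam, bound)
```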

19 Simplification 1
$$F(\rho,\lambda) = \frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$

20 Simplification 1
$$F(\rho,\lambda) = \frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$
$$F(\lambda) = F(\rho_\lambda, \lambda) = \frac{\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}$$

21 Simplification 1
$$F(\rho,\lambda) = \frac{\mathbb{E}_{\rho}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$
$$F(\lambda) = F(\rho_\lambda, \lambda) = \frac{\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}$$
One-dimensional function.

22 Simplification 2
$$F(\lambda) = F(\rho_\lambda, \lambda) = \frac{\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$

23 Simplification 2
$$F(\lambda) = F(\rho_\lambda, \lambda) = \frac{\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$
$$\mathrm{KL}(\rho_\lambda\|\pi) = \mathbb{E}_{\rho_\lambda}\!\left[\ln\frac{\rho_\lambda(h)}{\pi(h)}\right] = \mathbb{E}_{\rho_\lambda}\!\left[\ln\frac{e^{-n\lambda \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right]}\right] = -n\lambda\,\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)] - \ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right]$$

24 Simplification 2
$$F(\lambda) = F(\rho_\lambda, \lambda) = \frac{\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}, \qquad \rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$
$$\mathrm{KL}(\rho_\lambda\|\pi) = \mathbb{E}_{\rho_\lambda}\!\left[\ln\frac{\rho_\lambda(h)}{\pi(h)}\right] = \mathbb{E}_{\rho_\lambda}\!\left[\ln\frac{e^{-n\lambda \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right]}\right] = -n\lambda\,\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)] - \ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right]$$
$$F(\lambda) = \frac{-\ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right] + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}$$
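
A sketch that evaluates this one-dimensional objective directly from the prior and the empirical losses; scanning it over a grid of λ is a cheap empirical check of (quasi)convexity (the grid and inputs below are illustrative):

```python
import numpy as np

def F_lambda(lam, pi, L_hat, n, delta):
    """F(lambda) = (-ln E_pi[exp(-n lam L_hat)] + ln(2 sqrt(n)/delta)) / (n lam (1 - lam/2))."""
    log_w = np.log(pi) - n * lam * L_hat
    log_Z = np.log(np.sum(np.exp(log_w - log_w.max()))) + log_w.max()  # ln E_pi[e^{-n lam L_hat}]
    return (-log_Z + np.log(2.0 * np.sqrt(n) / delta)) / (n * lam * (1.0 - lam / 2.0))

rng = np.random.default_rng(1)
pi, L_hat, n, delta = np.full(100, 0.01), rng.uniform(0.05, 0.5, 100), 2000, 0.05
grid = np.linspace(0.01, 1.0, 200)
print(grid[int(np.argmin([F_lambda(l, pi, L_hat, n, delta) for l in grid]))])
```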

25 Strong Quasiconvexity - Sufficient Condition Theorem (Strong Quasiconvexity). If for all $\lambda \in \left[\sqrt{\ln\frac{2\sqrt{n}}{\delta}/n},\, 1\right]$ at least one of the two conditions
$$2\,\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{4n}{\delta^2} > \lambda^2 n^2\, \mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)]$$
or
$$\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)] > (1-\lambda)\, n\, \mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)]$$
is satisfied, then $F(\lambda)$ is strongly quasiconvex for $\lambda \in (0, 1]$ and alternating minimization converges to the global minimum of F.

26 Proof Highlights
$$F(\lambda) = \frac{-\ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right] + \ln\frac{2\sqrt{n}}{\delta}}{n\lambda(1 - \lambda/2)}$$
Show that the second derivative of F(λ) is positive at all stationary points:
$$-\frac{1}{n}\frac{d}{d\lambda}\ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right] = \mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)], \qquad \frac{1}{n}\frac{d^2}{d\lambda^2}\ln \mathbb{E}_{\pi}\!\left[e^{-n\lambda \hat{L}(h,S)}\right] = n\,\mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)].$$
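
The two derivative identities are easy to verify numerically with finite differences; a small sketch with made-up losses:

```python
import numpy as np

rng = np.random.default_rng(0)
pi, L_hat, n, lam, h = np.full(50, 0.02), rng.uniform(0.0, 0.5, 50), 500, 0.3, 1e-4

def log_Z(l):
    """ln E_pi[exp(-n * l * L_hat)], computed stably."""
    w = np.log(pi) - n * l * L_hat
    return np.log(np.sum(np.exp(w - w.max()))) + w.max()

rho = np.exp(np.log(pi) - n * lam * L_hat - log_Z(lam))  # rho_lambda
gibbs = rho @ L_hat
var = rho @ L_hat**2 - gibbs**2

# -(1/n) d/dlam ln Z  should match  E_{rho_lam}[L_hat]
print(-(log_Z(lam + h) - log_Z(lam - h)) / (2 * h * n), gibbs)
# (1/n) d^2/dlam^2 ln Z  should match  n * Var_{rho_lam}[L_hat]
print((log_Z(lam + h) - 2 * log_Z(lam) + log_Z(lam - h)) / (h**2 * n), n * var)
```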

27 Weak Separation Sufficient Condition for Strong Quasiconvexity

28 Weak Separation Sufficient Condition for Strong Quasiconvexity Theorem (Weak Separation). Let H be finite with |H| = m and π(h) uniform, and let $\hat{L}(h^*,S) = \min_h \hat{L}(h,S)$. Let $a = \frac{\ln(3mn)}{n}$ and $b = \sqrt{\ln\frac{2\sqrt{n}}{\delta}/n}$. If the number of hypotheses for which $\hat{L}(h,S) \in \left(\hat{L}(h^*,S) + a,\ \hat{L}(h^*,S) + b\right)$ is at most $\frac{e^2}{12}\sqrt{4n/\ln\frac{4n}{\delta^2}} - 3$, then $F(\lambda)$ is strongly quasiconvex and alternating minimization converges to the global minimum.

29 Proof Highlights By the Strong Quasiconvexity Theorem, if $\mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)]$ is small then $F(\lambda)$ is strongly quasiconvex. Let $\Delta_h = \hat{L}(h,S) - \hat{L}(h^*,S)$. Then, for uniform π,
$$\mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)] \le \mathbb{E}_{\rho_\lambda}\!\left[\Delta_h^2\right] = \sum_h \rho_\lambda(h)\, \Delta_h^2 = \frac{\sum_h \Delta_h^2\, e^{-n\lambda \Delta_h}}{\sum_h e^{-n\lambda \Delta_h}}.$$

30 Breaking the Quasiconvexity It is possible to break the quasiconvexity, but one has to work hard for it.

31 Breaking the Quasiconvexity It is possible to break the quasiconvexity, but one has to work hard for it. For example, taking n = 200, δ = 0.25, m = , Δ_h = 0.1, and uniform π breaks it.

32 Breaking the Quasiconvexity It is possible to break the quasiconvexity, but one has to work hard for it. For example, taking n = 200, δ = 0.25, m = , Δ_h = 0.1, and uniform π breaks it. In all our experiments F(λ) was convex even when the weak separation sufficient condition was violated, so it might be possible to relax the sufficient condition further.

33 Outline A Very Quick Recap of PAC-Bayesian Analysis A Strongly Quasiconvex PAC-Bayesian Bound Construction of a Hypothesis Set Experiments

34 Challenge Computation of the normalization of ρ_λ can be prohibitively expensive:
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$

35 Challenge Computation of the normalization of ρ_λ can be prohibitively expensive:
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]}$$
Parametrization of ρ may break the convexity.

36 Challenge Computation of the normalization of ρ_λ can be prohibitively expensive:
$$\rho_\lambda(h) = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\mathbb{E}_{\pi}\!\left[e^{-\lambda n \hat{L}(h,S)}\right]} = \frac{\pi(h)\, e^{-\lambda n \hat{L}(h,S)}}{\sum_{h'} \pi(h')\, e^{-\lambda n \hat{L}(h',S)}}$$
Parametrization of ρ may break the convexity. Solution: work with a finite H. We need a powerful finite H.

37 Construction of a finite sample-dependent H Select m = |H| subsamples of r points each. Train a model h on the r points and validate on the remaining n − r points. Validation loss: $\hat{L}^{\mathrm{val}}(h)$.

38 Construction of a finite sample-dependent H Select m = |H| subsamples of r points each. Train a model h on the r points and validate on the remaining n − r points. Validation loss: $\hat{L}^{\mathrm{val}}(h)$. Adapted Bound:
$$\mathbb{E}_{\rho}[L(h)] \le \frac{\mathbb{E}_{\rho}[\hat{L}^{\mathrm{val}}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n-r+1}}{\delta}}{(n-r)\,\lambda(1 - \lambda/2)}$$

39 Construction of a finite sample-dependent H Select m = |H| subsamples of r points each. Train a model h on the r points and validate on the remaining n − r points. Validation loss: $\hat{L}^{\mathrm{val}}(h)$. Adapted Bound:
$$\mathbb{E}_{\rho}[L(h)] \le \frac{\mathbb{E}_{\rho}[\hat{L}^{\mathrm{val}}(h,S)]}{1 - \lambda/2} + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n-r+1}}{\delta}}{(n-r)\,\lambda(1 - \lambda/2)}$$
Special Case: k-fold cross-validation. Most computational advantage is achieved by inverse CV.
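
A sketch of the construction with a kernel SVM as the base learner (the experiments later in the deck use weak SVMs); sklearn's SVC is a stand-in for whatever weak learner is used, the subsampling details are illustrative, and each subsample is assumed to contain both classes:

```python
import numpy as np
from sklearn.svm import SVC  # stand-in weak learner; any classifier with fit/predict works

def build_hypothesis_set(X, y, m, r, seed=0):
    """Train m weak classifiers on random subsamples of size r and record their
    validation losses (hat L^val) on the n - r held-out points."""
    rng = np.random.default_rng(seed)
    n = len(y)
    hypotheses, val_losses = [], []
    for _ in range(m):
        train = np.zeros(n, dtype=bool)
        train[rng.choice(n, size=r, replace=False)] = True
        h = SVC(kernel="rbf").fit(X[train], y[train])
        val_losses.append(np.mean(h.predict(X[~train]) != y[~train]))
        hypotheses.append(h)
    return hypotheses, np.array(val_losses)
# The resulting val_losses feed the alternating minimization, with n replaced by n - r in the bound.
```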

40 Outline A Very Quick Recap of PAC-Bayesian Analysis A Strongly Quasiconvex PAC-Bayesian Bound Construction of a Hypothesis Set Experiments

41 Experiments We compare: a kernel SVM trained by cross-validation vs. ρ-weighting of multiple weak SVMs, each trained on d + 1 samples.

42 Experiments We compare: a kernel SVM trained by cross-validation vs. ρ-weighting of multiple weak SVMs, each trained on d + 1 samples. * More precisely, we apply ρ-weighted aggregation
$$\mathrm{MV}_\rho(x) = \mathrm{sign}\!\left(\sum_h \rho(h)\, h(x)\right),$$
but in our case there was no significant difference between $L(\mathrm{MV}_\rho)$ and $\mathbb{E}_{\rho}[L(h)]$.
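
A sketch of the ρ-weighted majority vote, assuming the weak learners output labels in {−1, +1} (the function and variable names are made up):

```python
import numpy as np

def rho_majority_vote(hypotheses, rho, X):
    """MV_rho(x) = sign(sum_h rho(h) * h(x)) for binary labels in {-1, +1}."""
    votes = np.stack([h.predict(X) for h in hypotheses])  # shape (m, n_points)
    return np.sign(rho @ votes)
```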

43 Rough Runtime Comparison k-fold cross-validation of kernel SVMs:
$$\underbrace{k\, n^{2+}}_{\text{training}} + \underbrace{V\, k\, n^{2+}}_{\text{validation}}$$

44 Rough Runtime Comparison k-fold cross-validation of kernel SVMs:
$$\underbrace{k\, n^{2+}}_{\text{training}} + \underbrace{V\, k\, n^{2+}}_{\text{validation}}$$
PAC-Bayesian aggregation of kernel SVMs, for r = d + 1 and m = n:
$$\underbrace{m\, r^{2+}}_{\text{training}} + \underbrace{m\, r\, n}_{\text{validation}} + \underbrace{A}_{\text{aggregation}} \approx m\, r\, n \approx d\, n^2$$

45 Rough Runtime Comparison k-fold cross-validation of kernel SVMs:
$$\underbrace{k\, n^{2+}}_{\text{training}} + \underbrace{V\, k\, n^{2+}}_{\text{validation}}$$
PAC-Bayesian aggregation of kernel SVMs, for r = d + 1 and m = n:
$$\underbrace{m\, r^{2+}}_{\text{training}} + \underbrace{m\, r\, n}_{\text{validation}} + \underbrace{A}_{\text{aggregation}} \approx m\, r\, n \approx d\, n^2$$
Computational Speed-up!

46 Experiments [Figure: results on four datasets] (a) Ionosphere: n = 200, r = d + 1 = 35. (b) Waveform: n = 2000, r = d + 1 = 41. (c) Breast cancer: n = 340, r = d + 1 = 11. (d) AvsB: n = 1000, r = d + 1 = 17.

47 Summary We proposed: a relaxation of the PAC-Bayes-kl bound (Seeger, 2002); an alternating minimization procedure; sufficient conditions for strong quasiconvexity, which guarantee convergence to the global minimum; and a construction of H. In our experiments, rigorous minimization of the bound was competitive with cross-validation in tuning the trade-off between complexity and empirical performance.

48 Summary We proposed: a relaxation of the PAC-Bayes-kl bound (Seeger, 2002); an alternating minimization procedure; sufficient conditions for strong quasiconvexity, which guarantee convergence to the global minimum; and a construction of H. In our experiments, rigorous minimization of the bound was competitive with cross-validation in tuning the trade-off between complexity and empirical performance. Rigorous minimization of a theoretical bound competitive with cross-validation!

49 What's next? Improved Sufficient Conditions In practice the bound was strongly convex even when the weak separation sufficient condition was violated. Relax the sufficient condition: we have dropped some terms when going from the Strong Quasiconvexity Theorem to the Weak Separation Condition.

50 Strong Quasiconvexity - Sufficient Condition Theorem (Strong Quasiconvexity). If for all $\lambda \in \left[\sqrt{\ln\frac{2\sqrt{n}}{\delta}/n},\, 1\right]$ at least one of the two conditions
$$2\,\mathrm{KL}(\rho_\lambda\|\pi) + \ln\frac{4n}{\delta^2} > \lambda^2 n^2\, \mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)]$$
or
$$\mathbb{E}_{\rho_\lambda}[\hat{L}(h,S)] > (1-\lambda)\, n\, \mathrm{Var}_{\rho_\lambda}[\hat{L}(h,S)]$$
is satisfied, then $F(\lambda)$ is strongly quasiconvex for $\lambda \in (0, 1]$ and alternating minimization converges to the global minimum of F.

51 What's next? Improved Sufficient Conditions In practice the bound was strongly convex even when the weak separation sufficient condition was violated. Relax the sufficient condition: we have dropped some terms when going from the Strong Quasiconvexity Theorem to the Weak Separation Condition.

52 What's next? Improved Sufficient Conditions In practice the bound was strongly convex even when the weak separation sufficient condition was violated. Relax the sufficient condition: we have dropped some terms when going from the Strong Quasiconvexity Theorem to the Weak Separation Condition. Improved Analysis of the Weighted Majority Vote Combine the results with improved analysis of the weighted majority vote (the C-bound): Lacasse, Laviolette, Marchand, Germain, and Usunier, NIPS 2007; Laviolette, Marchand, and Roy, ICML 2011; Germain, Lacasse, Laviolette, Marchand, and Roy, JMLR 2015.
