PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification


Emilie Morvant, Sokol Koço, Liva Ralaivola
Aix-Marseille Univ., QARMA, LIF, CNRS, UMR 7279, F-13013, Marseille, France

(Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).)

Abstract

In this paper, we propose a PAC-Bayes bound for the generalization risk of the Gibbs classifier in the multi-class classification framework. The novelty of our work is the critical use of the confusion matrix of a classifier as an error measure; this puts our contribution in the line of work aiming at dealing with performance measures richer than the mere scalar criterion that is the misclassification rate. Thanks to very recent and beautiful results on matrix concentration inequalities, we derive two bounds showing that the true confusion risk of the Gibbs classifier is upper-bounded by its empirical risk plus a term depending on the number of training examples in each class. To the best of our knowledge, these are the first PAC-Bayes bounds based on confusion matrices.

1. Introduction

The PAC-Bayesian framework, first introduced in (McAllester, 1999a), is an important field of research in learning theory. It borrows ideas from the philosophy of Bayesian inference and mixes them with techniques used in statistical approaches of learning. Given a family of classifiers F, the ingredients of a PAC-Bayesian bound are a prior distribution P over F, a learning sample S and a posterior distribution Q over F. Distribution P conveys some prior belief on what the best classifiers from F are, prior to any access to S; the classifiers expected to be the most performant for the classification task at hand therefore have the largest weights under P. The posterior distribution Q is learned/adjusted using the information provided by the training set S. The essence of PAC-Bayesian results is to bound the risk of the stochastic Gibbs classifier associated with Q (Catoni, 2004). When specialized to appropriate function spaces F and relevant families of prior and posterior distributions, PAC-Bayes bounds can be used to characterize the error of different existing classification methods, such as support vector machines (Langford & Shawe-Taylor, 2002; Langford, 2005). PAC-Bayes bounds can also be used to derive new supervised learning algorithms. For example, Lacasse et al. (2007) have introduced an elegant bound on the risk of the majority vote which holds for any space F. This bound is used to derive an algorithm, MinCq (Laviolette et al., 2011), which achieves empirical results on par with state-of-the-art methods. Some other important results are given in (Catoni, 2007; Seeger, 2002; McAllester, 1999b; Langford et al., 2001).

In this paper, we address the multi-class classification problem. Given the ability of PAC-Bayesian bounds to explain the performance of learning methods, related contributions to our work are multi-class formulations of SVMs, such as those of Weston & Watkins (1998), Lee et al. (2004) and Crammer & Singer (2001). As majority vote methods, we may relate our work to boosting multi-class extensions of AdaBoost (Freund & Schapire, 1996), such as the framework of Mukherjee & Schapire (2011), and to the algorithms AdaBoost.MH/AdaBoost.MR (Schapire & Singer, 1999) and SAMME (Zhu et al., 2009).

The originality of our work is that we consider the confusion matrix of the Gibbs classifier as an error measure. We believe that in the multi-class framework, it is more relevant to consider the confusion matrix as the error measure than the mere misclassification error, which corresponds to the probability for some classifier h to err in its prediction on x.

The information as to what is the probability for an instance of class p to be classified into class q, with p ≠ q, by some predictor is indeed crucial in some applications (think of the difference between false-negative and false-positive predictions in an automated diagnosis system). To the best of our knowledge, we are the first to propose a generalization bound on the confusion matrix in the PAC-Bayesian framework. The result we propose heavily relies on a matrix concentration inequality for sums of random matrices of Tropp (2011).

The rest of this paper is organized as follows. Sec. 2 introduces the setting of multi-class learning and some of the basic notation used throughout the paper. Sec. 3 briefly recalls the folk PAC-Bayes bound as introduced in (McAllester, 2003). In Sec. 4, we present the main contribution of this paper, our PAC-Bayes bound on the confusion matrix, followed by its proof in Sec. 5. We discuss some future work in Sec. 6.

2. Setting and Notation

This section presents the general setting that we consider and the different tools that we will make use of.

2.1. General Problem Setting

We consider classification tasks over some input space X. The output space is Y = {1, ..., Q}, where Q is the number of classes. The learning sample is denoted by S = {(x_i, y_i)}_{i=1}^m, where each example is drawn i.i.d. from a fixed but unknown probability distribution D over X × Y; D^m denotes the distribution of an m-sample. We consider the family F ⊆ Y^X of classifiers, and P and Q will respectively refer to prior and posterior distributions over F. Given the prior distribution P and the training set S, learning aims at finding a posterior Q leading to good generalization. The Kullback-Leibler divergence KL(Q‖P) between Q and P is

  KL(Q‖P) = E_{f∼Q} ln [ Q(f) / P(f) ].   (1)

sign(x) = 1 if x ≥ 0 and −1 otherwise; the indicator function I(x) is equal to 1 if x is true and 0 otherwise.

2.2. Conventions and Basics on Matrices

Throughout the paper we consider only real-valued square matrices C of order Q (the number of classes). ᵗC is the transpose of the matrix C, Id_Q denotes the identity matrix of size Q, and 0 is the zero matrix. The results given in this paper are based on a concentration inequality of Tropp (2011) for sums of random self-adjoint matrices, which extends to general real-valued matrices thanks to the dilation technique (Paulsen, 2002): the dilation S(C) of a matrix C is

  S(C) = [ 0   C ]
         [ ᵗC  0 ].   (2)

The notation ‖·‖ refers to the operator (or spectral) norm. It returns the maximum singular value of its argument:

  ‖C‖ = max{ λ_max(C), −λ_min(C) },   (3)

where λ_max and λ_min are respectively the algebraic maximum and minimum eigenvalues of C. Note that dilation preserves spectral information, so

  λ_max(S(C)) = ‖S(C)‖ = ‖C‖.   (4)

3. The Usual PAC-Bayes Theorem

Here, we recall the main PAC-Bayesian bound in the binary classification case as presented in (McAllester, 2003; Seeger, 2002; Langford, 2005), where the set of labels is Y = {−1, +1}. The true risk R(f) and the empirical error R_S(f) of f are defined as:

  R(f) = E_{(x,y)∼D} I(f(x) ≠ y),   R_S(f) = (1/m) Σ_{i=1}^m I(f(x_i) ≠ y_i).

The learner's aim is to choose a posterior distribution Q on F such that the risk of the Q-weighted majority vote (also called the Bayes classifier) B_Q is as small as possible. B_Q makes a prediction according to

  B_Q(x) = sign [ E_{f∼Q} f(x) ].

The true risk R(B_Q) and the empirical error R_S(B_Q) of the Bayes classifier are defined as the probability that it commits an error on an example:

  R(B_Q) = P_{(x,y)∼D} ( B_Q(x) ≠ y ).   (5)

However, the PAC-Bayes approach does not directly bound the risk of B_Q. Instead, it bounds the risk of the stochastic Gibbs classifier G_Q, which predicts the label of x ∈ X by first drawing f according to Q and then returning f(x). The true risk R(G_Q) and the empirical error R_S(G_Q) of G_Q are therefore:

  R(G_Q) = E_{f∼Q} R(f);   R_S(G_Q) = E_{f∼Q} R_S(f).   (6)

Note that in this setting, we have R(B_Q) ≤ 2 R(G_Q).
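For concreteness, here is a minimal numpy sketch (toy data, illustrative names; a sketch rather than anything prescribed by the paper) of the quantities just defined: each base classifier is represented by its vector of predictions, Q by a weight vector, and the empirical risks R_S(G_Q) and R_S(B_Q) are computed directly, illustrating the relation R_S(B_Q) ≤ 2 R_S(G_Q).

    import numpy as np

    rng = np.random.default_rng(0)
    m, n_voters = 500, 7
    y = rng.choice([-1, 1], size=m)                        # labels of an m-sample
    # Each "classifier" is simulated by its predictions: correct with probability 0.65.
    preds = np.where(rng.random((n_voters, m)) < 0.65, y, -y)
    Qw = rng.random(n_voters); Qw /= Qw.sum()              # posterior weights Q over the voters

    gibbs_risk = np.dot(Qw, np.mean(preds != y, axis=1))   # R_S(G_Q) = E_{f~Q} R_S(f)
    bayes_pred = np.sign(np.dot(Qw, preds))                # B_Q(x) = sign(E_{f~Q} f(x))
    bayes_risk = np.mean(bayes_pred != y)                  # R_S(B_Q)
    print(gibbs_risk, bayes_risk)                          # the factor-2 relation R_S(B_Q) <= 2 R_S(G_Q) holds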

We now present the PAC-Bayes theorem which gives a bound on the error of the stochastic Gibbs classifier.

Theorem 1. For any D, any F, any P of support F, any δ ∈ (0,1], we have

  P_{S∼D^m} ( ∀Q on F,  kl( R_S(G_Q), R(G_Q) ) ≤ [ KL(Q‖P) + ln( ξ(m)/δ ) ] / m ) ≥ 1 − δ,

where kl(a,b) = a ln(a/b) + (1−a) ln((1−a)/(1−b)), and ξ(m) = Σ_{i=0}^{m} (m choose i) (i/m)^i (1 − i/m)^{m−i}.

We now provide a novel PAC-Bayes bound in the context of multi-class classification by considering the confusion matrix as an error measure.

4. Multiclass PAC-Bayes Bound

4.1. Definitions and Setting

As said earlier, we focus on multi-class classification. The output space is Y = {1, ..., Q}, with Q > 2. We only consider learning algorithms acting on a learning sample S = {(x_i, y_i)}_{i=1}^m where each example is drawn i.i.d. according to D, such that m ≥ Q and m_{y_j} ≥ 1 for every class y_j ∈ Y, where m_{y_j} is the number of examples of true class y_j.

In the context of multi-class classification, an error measure can be a performance tool called the confusion matrix. We consider the classical definition of the confusion matrix based on conditional probabilities, which inherently and desirably minimizes the effects of unbalanced classes. Concretely, for a given classifier f ∈ F and a sample S = {(x_i, y_i)}_{i=1}^m ∼ D^m, the empirical confusion matrix D̂_f = (d̂_{pq})_{1≤p,q≤Q} of f is defined as follows:

  ∀p, q,  d̂_{pq} = Σ_{i=1}^m (1/m_{y_i}) I(f(x_i) = q) I(y_i = p).

The true confusion matrix D_f = (d_{pq})_{1≤p,q≤Q} of f over D corresponds to:

  ∀p, q,  d_{pq} = E_{x|y=p} I(f(x) = q) = P_{(x,y)∼D} ( f(x) = q | y = p ).

If f correctly classifies every example of the sample, then all the elements of the confusion matrix are 0, except for the diagonal ones, which correspond to the correctly classified examples. Hence the more non-zero elements a confusion matrix has outside the diagonal, the more the classifier is prone to err. Recall that in a learning process the objective is to learn a classifier f ∈ F with a low true error, i.e. with good generalization guarantees; we are thus only interested in the errors of f. Our objective is then to find f leading to a confusion matrix with as many zero elements outside the diagonal as possible. Since the diagonal gives the conditional probabilities of correct predictions, we propose to consider a different kind of confusion matrix obtained by discarding the diagonal values. Then the only non-zero elements of the new confusion matrix correspond to the examples that are misclassified by f. For all f ∈ F we define the empirical and true confusion matrices of f by respectively Ĉ_f = (ĉ_{pq})_{1≤p,q≤Q} and C_f = (c_{pq})_{1≤p,q≤Q}, such that for all p, q:

  ĉ_{pq} = 0 if q = p,  and  ĉ_{pq} = d̂_{pq} otherwise;   (7)
  c_{pq} = 0 if q = p,  and  c_{pq} = d_{pq} = P_{(x,y)∼D}( f(x) = q | y = p ) otherwise.   (8)

Note that if f correctly classifies every example of a given sample S, then the empirical confusion matrix Ĉ_f is equal to 0. Similarly, if f is a perfect classifier over the distribution D, then the true confusion matrix is equal to 0. Therefore a relevant task is to minimize the size of the confusion matrix, thus having a confusion matrix as close to 0 as possible.

4.2. Main Result: Confusion PAC-Bayes Bound for the Gibbs Classifier

Our main result is a PAC-Bayes generalization bound over the Gibbs classifier G_Q in this particular context, where the empirical and true error measures are respectively given by the confusion matrices from (7) and (8). In this case, we can define the true and the empirical confusion matrices of G_Q respectively by:

  C_{G_Q} = E_{f∼Q} E_{S∼D^m} Ĉ_f ;   Ĉ_{G_Q} = E_{f∼Q} Ĉ_f.

Given f ∼ Q and a sample S ∼ D^m, our objective is to bound the difference between C_{G_Q} and Ĉ_{G_Q}, the true and empirical errors of the Gibbs classifier. Remark that the error rate P(f(x) ≠ y) of a classifier f might be directly computed as the 1-norm of ᵗC_f p, where p is the vector of prior class probabilities. However, in our case, concentration inequalities are only available for the operator norm. Since ‖u‖_1 ≤ √Q ‖u‖_2 for any Q-dimensional vector u, and ‖p‖_2 ≤ 1, we have that P(f(x) ≠ y) ≤ √Q ‖C_f‖_op. Thus trying to minimize the operator norm of C_f is a relevant strategy to control the risk.
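To make the quantities just defined concrete, the following minimal numpy sketch (the function name and data are illustrative, not taken from the paper) builds the off-diagonal conditional confusion matrix of Eq. (7) from predictions, computes its operator (spectral) norm, and checks that the empirical error rate is indeed bounded by √Q times that norm.

    import numpy as np

    def empirical_confusion(y_true, y_pred, Q):
        """Off-diagonal conditional confusion matrix (Eq. 7): C[p, q] = P_hat(f(x)=q | y=p), q != p."""
        C = np.zeros((Q, Q))
        for p in range(Q):
            idx = (y_true == p)
            m_p = idx.sum()
            for q in range(Q):
                if q != p and m_p > 0:
                    C[p, q] = np.sum(y_pred[idx] == q) / m_p
        return C

    rng = np.random.default_rng(0)
    Q, m = 4, 200
    y_true = rng.integers(0, Q, size=m)
    y_pred = np.where(rng.random(m) < 0.8, y_true, rng.integers(0, Q, size=m))  # noisy predictor

    C = empirical_confusion(y_true, y_pred, Q)
    op_norm = np.linalg.norm(C, 2)           # operator (spectral) norm ||C||
    err_rate = np.mean(y_pred != y_true)     # empirical misclassification rate
    print(op_norm, err_rate, np.sqrt(Q) * op_norm)   # err_rate <= sqrt(Q) * ||C|| on the sample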

Here is our main result, a bound on the operator norm of the difference between C_{G_Q} and Ĉ_{G_Q}.

Theorem 2. Let X ⊆ R^d be the input space, Y = {1, ..., Q} the output space, D a distribution over X × Y with D^m the distribution of an m-sample, and F a family of classifiers from X to Y. Then for every prior distribution P over F and any δ ∈ (0,1], we have:

  P_{S∼D^m} ( ∀Q on F,  ‖ C_{G_Q} − Ĉ_{G_Q} ‖ ≤ sqrt( [ 8Q / (m̌ − 8Q) ] [ KL(Q‖P) + ln( m̌ / (4δ) ) ] ) ) ≥ 1 − δ,

where m̌ = min_{y∈{1,...,Q}} m_y is the minimal number of examples from S which belong to the same class.

Proof. Deferred to Section 5.

Note that, for all y ∈ Y, we need the hypothesis m_y > 8Q (so that m̌ > 8Q), which is not too strong a limitation. Finally, we rewrite Theorem 2 in order to provide a bound on the size ‖C_{G_Q}‖.

Corollary 1. Under the hypotheses of Theorem 2, we have:

  P_{S∼D^m} ( ∀Q on F,  ‖ C_{G_Q} ‖ ≤ ‖ Ĉ_{G_Q} ‖ + sqrt( [ 8Q / (m̌ − 8Q) ] [ KL(Q‖P) + ln( m̌ / (4δ) ) ] ) ) ≥ 1 − δ.

Proof. By application of the reverse triangle inequality, | ‖A‖ − ‖B‖ | ≤ ‖A − B‖, to Theorem 2.

Both Theorem 2 and Corollary 1 yield a bound on the estimation, through the operator norm, of the true confusion matrix of the Gibbs classifier over the posterior distribution Q, though this is more explicit in the corollary. Let the number of classes Q be a constant; then the true risk is upper-bounded by the empirical risk of the Gibbs classifier plus a term depending on the number of training examples, especially on the value m̌, which corresponds to the minimal quantity of examples that belong to the same class. This means that the larger m̌, the closer the empirical confusion matrix of the Gibbs classifier is to its true matrix.

4.3. Upper Bound on the Risk of the Majority Vote Classifier

We recall that the Bayes classifier B_Q is well known as the majority vote classifier under a given posterior distribution Q. In the multiclass setting, B_Q is such that for any example it returns the majority class under the measure Q, and we define it as:

  B_Q(x) = argmax_{c∈Y} E_{f∼Q} I( f(x) = c ).   (9)

We define the conditional Gibbs risk R(G_Q, p, q) and Bayes risk R(B_Q, p, q) as

  R(G_Q, p, q) = E_{x∼D|y=p} E_{f∼Q} I( f(x) = q ),   (10)
  R(B_Q, p, q) = E_{x∼D|y=p} I( argmax_{c∈Y} g(c) = q ),   (11)

where g(c) = E_{f∼Q} I( f(x) = c ). The former is the (p,q) entry of C_{G_Q} if p ≠ q, and the latter is the (p,q) entry of C_{B_Q}.

Proposition 1. Let Q be the number of classes. Then R(B_Q, p, q) and R(G_Q, p, q) are related by the following inequality:

  ∀q, p,  R(B_Q, p, q) ≤ Q R(G_Q, p, q).   (12)

Proof. Given in (Morvant et al., 2012).

This proposition implies the following result on the confusion matrices associated to B_Q and G_Q.

Corollary 2. Let Q be the number of classes. Then C_{B_Q} and C_{G_Q} are related by the following inequality:

  ‖ C_{B_Q} ‖ ≤ Q ‖ C_{G_Q} ‖.   (13)

Proof. Given in (Morvant et al., 2012).
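The following small script (a sketch using the reconstructed form of the bound in Theorem 2 above; Q, KL and δ values are illustrative) tabulates how the complexity term sqrt( 8Q/(m̌ − 8Q) · (KL(Q‖P) + ln(m̌/(4δ))) ) shrinks as the minimal class size m̌ grows, for a fixed KL divergence.

    import numpy as np

    def confusion_bound_term(m_min, Q, kl, delta):
        """Complexity term of Theorem 2 (reconstructed form), valid when m_min > 8*Q."""
        assert m_min > 8 * Q, "the bound requires m_y > 8Q for every class"
        return np.sqrt(8 * Q / (m_min - 8 * Q) * (kl + np.log(m_min / (4 * delta))))

    Q, kl, delta = 4, 1.5, 0.05
    for m_min in (50, 100, 500, 1000, 5000):
        print(m_min, round(confusion_bound_term(m_min, Q, kl, delta), 3))
    # The term decreases roughly as O(1/sqrt(m_min)), matching the discussion above.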

5. Proof of Theorem 2

This section gives the formal proof of Theorem 2. We first introduce a concentration inequality for a sum of random square matrices. This allows us to deduce the PAC-Bayes generalization bound for confusion matrices by following the same three-step process as the one given in (McAllester, 2003; Seeger, 2002; Langford, 2005) for the classic PAC-Bayesian bound.

5.1. Concentration Inequality for the Confusion Matrix

The main result of our work is based on the following corollary of a concentration inequality for a sum of random self-adjoint matrices given by Tropp (2011) (see Theorem 3 in the Appendix); this theorem generalizes Hoeffding's inequality to the case of self-adjoint random matrices. The purpose of the following corollary is to restate Theorem 3 so that it carries over to matrices that are not self-adjoint. It is central for us to have such a result, as the matrices we are dealing with, namely confusion matrices, are rarely symmetric.

Corollary 3. Consider a finite sequence {M_i} of independent, random, square matrices of order Q, and let {a_i} be a sequence of fixed scalars. Assume that each random matrix satisfies E_i M_i = 0 and ‖M_i‖ ≤ a_i almost surely. Then, with σ² = Σ_i a_i², we have, ∀ε ≥ 0,

  P( ‖ Σ_i M_i ‖ ≥ ε ) ≤ 2Q exp( −ε² / (8σ²) ).   (14)

Proof. We want to verify the hypotheses given in Theorem 3 in order to apply it. Let {M_i} be a finite sequence of independent, random, square matrices of order Q such that E_i M_i = 0, and let {a_i} be a sequence of fixed scalars such that ‖M_i‖ ≤ a_i. We consider the sequence {S(M_i)} of random self-adjoint matrices of dimension 2Q. By the definition of the dilation, we obtain E_i S(M_i) = 0. From Equation (4), the dilation preserves the spectral information. Thus, on the one hand, we have:

  ‖ Σ_i M_i ‖ = λ_max( S( Σ_i M_i ) ) = λ_max( Σ_i S(M_i) ).

On the other hand, we have:

  ∀i,  λ_max( S(M_i) ) = ‖ S(M_i) ‖ = ‖ M_i ‖ ≤ a_i.

To ensure the hypothesis S(M_i) ⪯ A_i, we need to find a suitable sequence of fixed self-adjoint matrices {A_i} of dimension 2Q, where ⪯ refers to the semidefinite order on self-adjoint matrices. Indeed, it suffices to take the diagonal matrix λ_max(S(M_i)) Id_{2Q}, which ensures S(M_i) ⪯ λ_max(S(M_i)) Id_{2Q}. More precisely, since for every i we have λ_max(S(M_i)) ≤ a_i, we fix A_i as a diagonal matrix with a_i on the diagonal, i.e. A_i = a_i Id_{2Q}, with ‖ Σ_i A_i² ‖ = Σ_i a_i² = σ². Finally, we can invoke Theorem 3 to obtain the concentration inequality (14).

In order to make use of this corollary, we rewrite the confusion matrices as sums of example-based confusion matrices. That is, for each example (x_i, y_i), we define its empirical confusion matrix C_f^i = (ĉ_{pq}^i)_{1≤p,q≤Q} as follows:

  ∀p, q,  ĉ_{pq}^i = 0 if q = p,  and  ĉ_{pq}^i = I( f(x_i) = q ) I( y_i = p ) / m_{y_i} otherwise,

where m_{y_i} is the number of examples of class y_i ∈ Y belonging to S. Given an example (x_i, y_i), the example-based confusion matrix contains at most one non-zero element, namely when f misclassifies (x_i, y_i). In the same way, when f correctly classifies (x_i, y_i), the example-based confusion matrix is equal to 0. Our error measure is then Ĉ_f = Σ_{i=1}^m C_f^i; that is, we penalize only when f errs.

We further introduce the random square matrices C̄_f^i:

  C̄_f^i = C_f^i − E_D C_f^i,   (15)

which verify E_i C̄_f^i = 0. We have yet to find a suitable a_i for a given C̄_f^i. Let λ_max,i be the maximum singular value of C̄_f^i. It is easy to verify that λ_max,i ≤ 1/m_{y_i}. Thus, for all i, we fix a_i equal to 1/m_{y_i}. Finally, with the introduced notations, Corollary 3 leads to the following concentration inequality:

  P( ‖ Σ_{i=1}^m C̄_f^i ‖ ≥ ε ) ≤ 2Q exp( −ε² / (8σ²) ).   (16)

This inequality (16) allows us to demonstrate our Theorem 2 by following the process of (McAllester, 2003; Seeger, 2002; Langford, 2005).
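The decomposition into example-based confusion matrices is easy to check numerically. The sketch below (illustrative data and names; a toy check rather than anything from the paper) builds one matrix C_f^i per example and verifies that their sum recovers the empirical confusion matrix Ĉ_f of Eq. (7).

    import numpy as np

    rng = np.random.default_rng(1)
    Q, m = 3, 60
    y_true = rng.permutation(np.repeat(np.arange(Q), m // Q))     # balanced toy labels
    y_pred = np.where(rng.random(m) < 0.7, y_true, rng.integers(0, Q, size=m))
    m_y = np.bincount(y_true, minlength=Q)                        # class counts m_1, ..., m_Q

    # Example-based confusion matrices: at most one non-zero entry, 1/m_{y_i}, when f errs on (x_i, y_i).
    C_i = np.zeros((m, Q, Q))
    for i in range(m):
        p, q = y_true[i], y_pred[i]
        if p != q:
            C_i[i, p, q] = 1.0 / m_y[p]

    C_hat = C_i.sum(axis=0)            # their sum is the empirical confusion matrix of Eq. (7)
    for p in range(Q):                 # row p sums to the per-class misclassification rate
        assert np.isclose(C_hat[p].sum(), np.mean(y_pred[y_true == p] != p))
    print(np.linalg.norm(C_hat, 2))    # operator norm of the summed matrix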

5.2. Three Step Proof of Our Bound

First, thanks to the concentration inequality (16), we prove the following lemma.

Lemma 1. Let Q be the order of the matrices C̄_f^i = C_f^i − E_D C_f^i defined as in (15), and let C̄_f = Σ_{i=1}^m C̄_f^i. Then the following bound holds for any δ ∈ (0,1]:

  P_{S∼D^m} ( E_{f∼P} exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ≤ 2Q / (8σ² δ) ) ≥ 1 − δ.

Proof. For readability reasons, we write C̄_f = Σ_{i=1}^m C̄_f^i. If Z is a real-valued random variable such that P(Z ≥ z) ≤ k exp(−n g(z)), with g(z) non-negative and non-decreasing and k a constant, then P( exp((n−1) g(Z)) ≥ ν ) ≤ min( 1, k ν^{−n/(n−1)} ). We apply this to the concentration inequality (16). Choosing g(z) = z² (non-negative and non-decreasing for z ≥ 0), z = ε, n = 1/(8σ²) and k = 2Q, we obtain the following result:

  P( exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ≥ ν ) ≤ min( 1, 2Q ν^{ −1/(1 − 8σ²) } ).

Note that exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) is always non-negative. Hence we can compute its expectation as:

  E[ exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ] = ∫_0^∞ P( exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ≥ ν ) dν
                                        ≤ 1 + ∫_1^∞ 2Q ν^{ −1/(1 − 8σ²) } dν
                                        = 1 + 2Q (1 − 8σ²)/(8σ²)
                                        ≤ 2Q / (8σ²).

For a given classifier f ∈ F, we therefore have:

  E_{S∼D^m} [ exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ] ≤ 2Q / (8σ²).   (17)

Then, if P is a probability distribution over F, Equation (17) implies that:

  E_{S∼D^m} E_{f∼P} [ exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ) ] ≤ 2Q / (8σ²).   (18)

Using Markov's inequality (see Theorem 4 in the Appendix), we obtain the result of the lemma.

The second step to prove Theorem 2 is to use the shift given in (McAllester, 2003). We recall this result in the following lemma.

Lemma 2 (Donsker-Varadhan inequality; Donsker & Varadhan, 1975). Given the Kullback-Leibler divergence KL(Q‖P) between two distributions P and Q, and given a function g, we have:

  E_{b∼Q} g(b) ≤ KL(Q‖P) + ln E_{b∼P} exp( g(b) ).

Proof. See (McAllester, 2003).

With g(b) = [(1 − 8σ²)/(8σ²)] b² and b = ‖C̄_f‖ = ‖ Σ_i C̄_f^i ‖, Lemma 2 implies:

  [(1 − 8σ²)/(8σ²)] E_{f∼Q} ‖C̄_f‖² ≤ KL(Q‖P) + ln E_{f∼P} exp( [(1 − 8σ²)/(8σ²)] ‖C̄_f‖² ).   (19)

The KL divergence is defined in Equation (1). The last step that completes the proof of Theorem 2 consists in applying the result obtained in Lemma 1 to Equation (19). Then, with probability at least 1 − δ over S, we have:

  [(1 − 8σ²)/(8σ²)] E_{f∼Q} ‖C̄_f‖² ≤ KL(Q‖P) + ln( 2Q / (8σ² δ) ).   (20)

Since g is clearly convex, we apply Jensen's inequality (see Theorem 5 in the Appendix) to (20). Then, with probability at least 1 − δ over S, and for every distribution Q on F, we have:

  E_{f∼Q} ‖C̄_f‖ ≤ sqrt( [ 8σ² / (1 − 8σ²) ] [ KL(Q‖P) + ln( 2Q / (8σ² δ) ) ] ).   (21)

Since C̄_f = Σ_{i=1}^m ( C_f^i − E_D C_f^i ), the bound (21) is already quite similar to the one given in Theorem 2. We present in the next section the calculations leading to our PAC-Bayesian generalization bound.
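The change of measure of Lemma 2 is easy to sanity-check numerically on finite distributions. The snippet below (a small illustrative check, with arbitrary random data) draws two distributions P and Q over a finite set and an arbitrary function g, and verifies that E_{b∼Q} g(b) ≤ KL(Q‖P) + ln E_{b∼P} exp(g(b)).

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10
    P = rng.random(n); P /= P.sum()          # "prior" over a finite set of classifiers
    Q = rng.random(n); Q /= Q.sum()          # "posterior" over the same set
    g = rng.normal(size=n)                   # arbitrary real-valued function g(b)

    lhs = np.dot(Q, g)                                  # E_{b~Q} g(b)
    kl = np.sum(Q * np.log(Q / P))                      # KL(Q || P)
    rhs = kl + np.log(np.dot(P, np.exp(g)))             # KL + ln E_{b~P} exp(g(b))
    assert lhs <= rhs + 1e-12                           # Donsker-Varadhan holds for any such P, Q, g
    print(lhs, rhs)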

5.3. Simplification

We first compute the variance parameter σ² = Σ_{i=1}^m a_i². For that purpose, in Section 5.1 we showed that for each i ∈ {1, ..., m} we can choose a_i = 1/m_{y_i}, where y_i is the class of the i-th example and m_{y_i} is the number of examples of class y_i. Thus we have:

  σ² = Σ_{i=1}^m 1/m_{y_i}² = Σ_{y=1}^Q Σ_{i: y_i = y} 1/m_y² = Σ_{y=1}^Q 1/m_y.   (22)

For the sake of simplification of Equation (21), and since the term on the right side of this equation is an increasing function with respect to σ², we propose to upper-bound σ²:

  σ² = Σ_{y=1}^Q 1/m_y ≤ Q / min_{y∈{1,...,Q}} m_y.

Let m̌ = min_{y∈{1,...,Q}} m_y. Then, using Equation (22), we obtain the following bound from Equation (21):

  E_{f∼Q} ‖C̄_f‖ ≤ sqrt( [ 8Q / (m̌ − 8Q) ] [ KL(Q‖P) + ln( m̌ / (4δ) ) ] ).   (23)

It remains to replace C̄_f = Σ_{i=1}^m ( C_f^i − E_D C_f^i ). Recalling that C_{G_Q} = E_{f∼Q} E_{S∼D^m} Ĉ_f and Ĉ_{G_Q} = E_{f∼Q} Ĉ_f, we obtain:

  E_{f∼Q} ‖C̄_f‖ = E_{f∼Q} ‖ Σ_{i=1}^m ( C_f^i − E_D C_f^i ) ‖
                = E_{f∼Q} ‖ Ĉ_f − E_D Ĉ_f ‖
                ≥ ‖ E_{f∼Q} Ĉ_f − E_{f∼Q} E_D Ĉ_f ‖
                = ‖ Ĉ_{G_Q} − C_{G_Q} ‖.   (24)

By substituting the left part of inequality (23) with the term (24), we find the bound of our Theorem 2.

6. Discussion and Future Work

This work gives rise to many interesting questions, among which the following ones. Some future work will focus on instantiating our bound given in Theorem 2 for specific multi-class frameworks, such as multi-class SVM and multi-class boosting. Taking advantage of our theorem while using the confusion matrices may allow us to derive new generalization bounds for these methods. Additionally, we are interested in seeing how effective learning methods may be derived from the risk bound we propose. For instance, in the binary PAC-Bayes setting, the algorithm MinCq proposed by Laviolette et al. (2011) minimizes a bound depending on the first two moments of the margin of the Q-weighted majority vote. From our Theorem 2, and with a similar study, we would like to design a new multi-class algorithm and observe how sound such an algorithm could be. This would probably require the derivation of a Cantelli-Tchebycheff deviation inequality in the matrix case. Besides, it might be very interesting to see how the noncommutative/matrix concentration inequalities provided by Tropp (2011) might be of some use for other kinds of learning problems, such as multilabel classification, label ranking, or structured prediction. Finally, the question of extending the present work to the analysis of algorithms learning possibly infinite-dimensional operators, as proposed by Abernethy et al. (2009), is also very exciting.

7. Conclusion

In this paper, we propose a new PAC-Bayesian generalization bound that applies in the multi-class classification setting. The originality of our contribution is that we consider the confusion matrix as an error measure. Coupled with the use of the operator norm on matrices, we are capable of providing a generalization bound on the size of the confusion matrix, with the idea that the smaller the norm of the confusion matrix of the learned classifier, the better it is for the classification task at hand. The derivation of our result takes advantage of the concentration inequality proposed by Tropp (2011) for the sum of random self-adjoint matrices, which we directly adapt to square matrices that are not self-adjoint. The main results are presented in Theorem 2 and Corollary 1. The bound in Theorem 2 is given on the difference between the true risk of the Gibbs classifier and its empirical error, while the one given in Corollary 1 upper-bounds the risk of the Gibbs classifier by its empirical error. An interesting point is that our bound depends on the minimal quantity of training examples belonging to the same class, for a given number of classes. If this value increases, i.e. if we have a lot of training examples, then the empirical confusion matrix of the Gibbs classifier tends to be close to its true confusion matrix. A point worth noting is that the bound varies as O(1/√m̌), which is a typical rate in bounds not using second-order information. Last but not least, the present work has given rise to a few algorithmic and theoretical questions that we have discussed in the previous section.

Appendix

Theorem 3 (Concentration inequality for random matrices; Tropp, 2011). Consider a finite sequence {M_i} of independent, random, self-adjoint matrices of dimension Q, and let {A_i} be a sequence of fixed self-adjoint matrices. Assume that each random matrix satisfies E M_i = 0 and M_i ⪯ A_i almost surely. Then, for all ε ≥ 0,

  P( λ_max( Σ_i M_i ) ≥ ε ) ≤ Q exp( −ε² / (8σ²) ),

where σ² = ‖ Σ_i A_i² ‖ and ⪯ refers to the semidefinite order on self-adjoint matrices.
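As an illustration of how the non-self-adjoint version (Corollary 3) behaves in practice, the following Monte-Carlo sketch (arbitrary illustrative parameters, not from the paper) draws sums of independent zero-mean random matrices whose operator norms are bounded by a_i and compares the empirical tail frequency of ‖Σ_i M_i‖ with the bound 2Q exp(−ε²/(8σ²)).

    import numpy as np

    rng = np.random.default_rng(3)
    Q, n_mat, n_trials, eps = 3, 50, 2000, 4.0
    a = np.full(n_mat, 0.1)                      # uniform norm bounds a_i
    sigma2 = np.sum(a ** 2)                      # sigma^2 = sum_i a_i^2

    exceed = 0
    for _ in range(n_trials):
        S = np.zeros((Q, Q))
        for i in range(n_mat):
            M = rng.uniform(-1, 1, size=(Q, Q))
            M *= a[i] / np.linalg.norm(M, 2)     # rescale so that ||M_i|| = a_i
            M *= rng.choice([-1.0, 1.0])         # symmetrize the law so that E M_i = 0
            S += M
        exceed += (np.linalg.norm(S, 2) >= eps)

    print(exceed / n_trials, 2 * Q * np.exp(-eps ** 2 / (8 * sigma2)))
    # The empirical frequency should not exceed the (typically loose) theoretical bound.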

Theorem 4 (Markov's inequality). Let Z be a random variable and z > 0; then:

  P( |Z| ≥ z ) ≤ E|Z| / z.

Theorem 5 (Jensen's inequality). Let X be an integrable real-valued random variable and g be a convex function; then:

  g( E[X] ) ≤ E[ g(X) ].

References

Abernethy, Jacob, Bach, Francis, Evgeniou, Theodoros, and Vert, Jean-Philippe. A new approach to collaborative filtering: Operator estimation with spectral regularization. Journal of Machine Learning Research, 10:803-826, 2009.

Catoni, Olivier. Gibbs estimators. In Statistical Learning Theory and Stochastic Optimization, volume 1851 of Lecture Notes in Mathematics. Springer, 2004.

Catoni, Olivier. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. ArXiv e-prints, 2007.

Crammer, Koby and Singer, Yoram. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265-292, 2001.

Donsker, David and Varadhan, S.R.S. Asymptotic evaluation of certain Markov process expectations for large time. Communications on Pure and Applied Mathematics, 28, 1975.

Freund, Yoav and Schapire, Robert E. Experiments with a new boosting algorithm. In Proceedings of the International Conference on Machine Learning, pp. 148-156, 1996.

Lacasse, Alexandre, Laviolette, François, Marchand, Mario, Germain, Pascal, and Usunier, Nicolas. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems (NIPS), 2007.

Langford, John. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273-306, 2005.

Langford, John and Shawe-Taylor, John. PAC-Bayes & margins. In Advances in Neural Information Processing Systems 15. MIT Press, 2002.

Langford, John, Seeger, Matthias, and Megiddo, Nimrod. An improved predictive accuracy bound for averaging classifiers. In Proceedings of the International Conference on Machine Learning, pp. 290-297, 2001.

Laviolette, François, Marchand, Mario, and Roy, Jean-Francis. From PAC-Bayes bounds to quadratic programs for majority votes. In Proceedings of the International Conference on Machine Learning, June 2011.

Lee, Y., Lin, Y., and Wahba, G. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67-81, 2004.

McAllester, David A. Some PAC-Bayesian theorems. Machine Learning, 37:355-363, 1999a.

McAllester, David A. PAC-Bayesian model averaging. In Proceedings of the annual conference on Computational Learning Theory (COLT), pp. 164-170, 1999b.

McAllester, David A. Simplified PAC-Bayesian margin bounds. In Proceedings of the annual conference on Computational Learning Theory (COLT), pp. 203-215, 2003.

Morvant, Emilie, Koço, Sokol, and Ralaivola, Liva. PAC-Bayesian generalization bound on confusion matrix for multi-class classification. ArXiv, 2012.

Mukherjee, Indraneel and Schapire, Robert E. A theory of multiclass boosting. CoRR, 2011.

Paulsen, V.I. Completely Bounded Maps and Operator Algebras. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.

Schapire, Robert E. and Singer, Yoram. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297-336, 1999.

Seeger, Matthias. PAC-Bayesian generalization error bounds for Gaussian process classification. Journal of Machine Learning Research, 3:233-269, 2002.

Tropp, Joel A. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, pp. 1-46, 2011.

Weston, Jason and Watkins, Chris. Multi-class support vector machines. Technical report, 1998.

Zhu, Ji, Zou, Hui, Rosset, Saharon, and Hastie, Trevor. Multi-class AdaBoost, 2009.
