PAC-Bayes Analysis Of Maximum Entropy Learning


John Shawe-Taylor and David R. Hardoon
Centre for Computational Statistics and Machine Learning
Department of Computer Science
University College London, UK, WC1E 6BT

Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5. Copyright 2009 by the authors.

Abstract

We extend and apply the PAC-Bayes theorem to the analysis of maximum entropy learning by considering maximum entropy classification. The theory introduces a multiple sampling technique that controls an effective margin of the bound. We further develop a dual implementation of the convex optimisation that optimises the bound. This algorithm is tested on some simple datasets and the value of the bound compared with the test error.

1 INTRODUCTION

Maximising the entropy of a distribution subject to certain constraints is a well-established method of regularising learning or general modelling both in statistics (Kapur & Kesevan, 1992) and machine learning (Wang et al., 2004; Dudik & Schapire, 2006). The motivation for the approach is that maximising the entropy is a method of preventing overspecialisation and hence overfitting of the model. Despite this clear motivation for and interest in the technique, there are to our knowledge no statistical learning theory results that bound the performance of such heuristics and hence motivate the use of this approach. There is an intriguing suggestion that the KL divergence appearing in the PAC-Bayes bound (Langford, 2005) could relate to the entropy of the distribution. It was this possibility that motivated the work presented in this paper. The difficulty with applying the PAC-Bayes framework was that there did not seem to be a natural way to define the distribution over hypotheses in such a way that the KL divergence measured the entropy of the distribution, while the error would relate to the loss of the corresponding function.

We make this connection through a novel method of using the distribution to generate outputs that on the one hand ensures the probability of misclassification can be bounded by twice the stochastic loss, while on the other hand the empirical loss is related to the margin on the training data. Note that we concentrate on the classification case, though we believe that this work will lay the foundations for the more interesting problem of density function modelling using the maximum entropy principle (Dudik & Schapire, 2006).

While this work does not prove that using maximum entropy regularisation is preferable to other methods currently implemented in commonly used algorithms, as for example 2-norm regularisation in Support Vector Machines, or 1-norm regularisation in boosting and LASSO methods, it places this principle on a firm foundation in statistical learning theory and in this sense places it on a par with these other methods. We do present very preliminary experiments, but the question of whether maximum entropy regularisation as an approach to classification has a significant role to play in practical machine learning is left for future research.

The paper is organised as follows. Section 2 introduces the PAC-Bayes approach to bounding generalisation and gives the application to bounding error in terms of the entropy of the posterior distribution. Section 3 takes the results of this analysis to create a convex optimisation for which a dual form is derived that can be implemented efficiently.
In Section 4 we present results comparing the bound values with test set errors for some UCI datasets. Despite the inclusion of these results we wish to emphasise that the main contribution of the paper is the development of new techniques that enable the PAC-Bayes analysis to be used to analyse a very different learning algorithm and promises to enable its application to more general modelling tasks.

2 ERROR ANALYSIS

We first state the general PAC-Bayes result following (McAllester, 1998, 1999; Seeger, 2003; Langford, 2005) after giving two relevant definitions. We assume a class C of classifiers with a prior distribution P over C and posterior distribution Q, and a distribution D governing the generation of the input/output samples. We use e_Q to denote the expected error under the distribution Q over classifiers:

$$e_Q = E_{(x,y) \sim D,\, c \sim Q}\,[I[c(x) \neq y]].$$

Given a training sample S = {(x_1, y_1), ..., (x_m, y_m)}, we similarly define

$$\hat e_Q = \frac{1}{m} \sum_{i=1}^{m} E_{c \sim Q}\,[I[c(x_i) \neq y_i]].$$

Theorem 2.1 ((Langford, 2005)) Fix an arbitrary D, arbitrary prior P, and confidence δ; then with probability at least 1 - δ over samples S ~ D^m, all posteriors Q satisfy

$$\mathrm{KL}(\hat e_Q \,\|\, e_Q) \leq \frac{\mathrm{KL}(Q \,\|\, P) + \ln((m+1)/\delta)}{m},$$

where KL is the KL divergence between distributions,

$$\mathrm{KL}(Q \,\|\, P) = E_{c \sim Q}\left[\ln \frac{Q(c)}{P(c)}\right],$$

with ê_Q and e_Q considered as distributions on {0, +1}.

We consider the following function class

$$F = \left\{ f_w : x \in X \mapsto \mathrm{sgn}\left(\sum_{i=1}^{N} w_i x_i\right) : \|w\|_1 \leq 1 \right\},$$

where we assume that X is a subset of the ℓ_∞ ball of radius 1, that is, all components of x in the support of the distribution have absolute value bounded by 1. We are considering a frequentist style bound, so that we posit a fixed but unknown distribution D that governs the generation of the input data, be it i.i.d. in the training set or as a test example.

We would like to apply a margin based PAC-Bayes bound to such 1-norm regularised classifiers. For a given choice of weight vector w with ‖w‖_1 = 1, for which we may expect many components to be equal to zero, we wish to create a posterior distribution Q(w) such that we can bound

$$P_{(x,y) \sim D}(f_w(x) \neq y) \leq 2 e_{Q(w)},$$

where e_{Q(w)} is the expected error under the distribution Q(w):

$$e_{Q(w)} = E_{(x,y) \sim D,\, q \sim Q(w)}\,[I[q(x) \neq y]].$$

Given a training sample S = {(x_1, y_1), ..., (x_m, y_m)}, we similarly define

$$\hat e_{Q(w)} = \frac{1}{m} \sum_{i=1}^{m} E_{q \sim Q(w)}\,[I[q(x_i) \neq y_i]].$$

We first define the posterior distribution Q(w). The general form of a classifier q will involve a random weight vector W in R^N together with a random threshold Θ, and the output will be q_{W,Θ}(x) = sgn(⟨W, x⟩ - Θ). The distribution Q(w) of W will be discrete with

$$W = \mathrm{sgn}(w_i)\, e_i \quad \text{with probability } |w_i|, \; i = 1, \ldots, N,$$

where e_i is the unit vector with 1 in dimension i and zeros in all other dimensions. The distribution of Θ is uniform on the interval [-1, 1].

Proposition 2.2 With the above definitions, we have for w satisfying ‖w‖_1 = 1 that for any (x, y) in X × {-1, +1},

$$P_{q \sim Q(w)}(q(x) \neq y) = 0.5\,(1 - y \langle w, x \rangle).$$

Proof Consider a fixed (x, y). We have

$$P_{q \sim Q(w)}(q(x) \neq y) = \sum_{i=1}^{N} |w_i|\, P_{\Theta}\big(\mathrm{sgn}(\mathrm{sgn}(w_i) \langle e_i, x \rangle - \Theta) \neq y\big) = \sum_{i=1}^{N} |w_i|\, P_{\Theta}\big(\mathrm{sgn}(\mathrm{sgn}(w_i)\, x_i - \Theta) \neq y\big) = 0.5 \sum_{i=1}^{N} |w_i|\,\big(1 - y\, \mathrm{sgn}(w_i)\, x_i\big) = 0.5\,(1 - y \langle w, x \rangle),$$

as required.

Proposition 2.3 With the above definitions, we have for w satisfying ‖w‖_1 = 1 that

$$P_{(x,y) \sim D}(f_w(x) \neq y) \leq 2 e_{Q(w)}.$$

Proof By Proposition 2.2 we have P_{q ~ Q(w)}(q(x) ≠ y) = 0.5(1 - y⟨w, x⟩). Hence, it follows that P_{q ~ Q(w)}(q(x) ≠ y) ≥ 0.5 whenever f_w(x) ≠ y. We can now estimate the error of the stochastic classifier:

$$e_{Q(w)} = E_{(x,y) \sim D,\, q \sim Q(w)}\,[I[q(x) \neq y]] = E_{(x,y) \sim D}\, E_{q \sim Q(w)}\,[I[q(x) \neq y]] \geq 0.5\, P_{(x,y) \sim D}(f_w(x) \neq y),$$

as required.
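To make the stochastic classifier Q(w) of Proposition 2.2 concrete, the following sketch compares the closed-form misclassification probability 0.5(1 - y⟨w, x⟩) with a Monte Carlo estimate obtained by actually sampling classifiers q_{W,Θ}. It is a minimal illustration on made-up data; the function names and the NumPy dependence are ours, not the paper's.

```python
import numpy as np

def stochastic_error_prob(w, x, y):
    """Closed form of Proposition 2.2: P_{q~Q(w)}(q(x) != y) = 0.5 * (1 - y * <w, x>),
    valid when ||w||_1 = 1 and every |x_j| <= 1."""
    return 0.5 * (1.0 - y * np.dot(w, x))

def monte_carlo_error_prob(w, x, y, n_samples=200_000, seed=None):
    """Estimate the same probability by sampling classifiers q_{W,Theta} from Q(w):
    W = sgn(w_i) e_i with probability |w_i|, Theta ~ Uniform[-1, 1]."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(w), size=n_samples, p=np.abs(w))   # random coordinate
    theta = rng.uniform(-1.0, 1.0, size=n_samples)          # random threshold
    preds = np.sign(np.sign(w[idx]) * x[idx] - theta)
    return np.mean(preds != y)

# toy check (hypothetical data): w on the l1 sphere, x in the l_inf ball
w = np.array([0.5, -0.3, 0.2])
x = np.array([0.8, -0.1, 0.4])
y = 1
print(stochastic_error_prob(w, x, y), monte_carlo_error_prob(w, x, y, seed=0))
```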

We can now make the application of the PAC-Bayes bound to 1-norm regularised linear classifiers.

Theorem 2.4 Let X be a subset of the ℓ_∞ unit ball in R^N, and let D be a distribution with support X. With probability at least 1 - δ over the draw of training sets of size m, we have for all w satisfying ‖w‖_1 = 1 that

$$\mathrm{KL}(\hat e_{Q(w)} \,\|\, e_{Q(w)}) \leq \frac{1}{m}\left[\sum_{i=1}^{N} |w_i| \ln |w_i| + \ln(2N) + \ln((m+1)/\delta)\right].$$

Proof The result follows from an application of Theorem 2.1 by choosing the prior to be uniform on all of the vectors ±e_i, i = 1, ..., N.

The implicit bound given by the KL divergence is not always easy to read, so we introduce the following notation for the inversion of the KL divergence:

$$\mathrm{KL}^{-1}(\hat e, A) = \max_{e} \{ e : \mathrm{KL}(\hat e \,\|\, e) \leq A \},$$

implying that KL^{-1}(ê, A) is the largest value satisfying KL(ê ‖ KL^{-1}(ê, A)) ≤ A.

Corollary 2.5 Let X be a subset of the ℓ_∞ unit ball in R^N, and let D be a distribution with support X. With probability at least 1 - δ over the draw of training sets of size m, we have for all w satisfying ‖w‖_1 = 1 that

$$e_{Q(w)} \leq \mathrm{KL}^{-1}\big(\hat e_{Q(w)}, H\big), \quad \text{where} \quad H = \frac{\sum_{i=1}^{N} |w_i| \ln(|w_i|) + \ln(2N) + \ln((m+1)/\delta)}{m}.$$

The expression given by Proposition 2.2 for the empirical error is too weak to obtain a strong bound. We will therefore boost the power of discrimination by sampling T copies of the distribution, q ~ Q^T(w), and then use the classification

$$q_{W,\Theta}(x) = \mathrm{sgn}\left(\sum_{t=1}^{T} \mathrm{sgn}\big(\langle W^t, x \rangle - \Theta^t\big)\right), \qquad (1)$$

or in other words taking a majority vote of the T samples to decide the classification given to the input x. The effect on the KL divergence is a simple multiplication by the factor T, while the bound on the true error given by Proposition 2.3 remains valid. Hence, we arrive at the following further corollary.

Corollary 2.6 Let X be a subset of the ℓ_∞ unit ball in R^N, and let D be a distribution with support X. With probability at least 1 - δ over the draw of training sets of size m, we have for all w satisfying ‖w‖_1 = 1 that

$$P_{(x,y) \sim D}(f_w(x) \neq y) \leq 2\, \mathrm{KL}^{-1}\big(\hat e_{Q^T(w)}, H\big), \quad \text{where} \quad H = \frac{T \sum_{i=1}^{N} |w_i| \ln(|w_i|) + T \ln(2N) + \ln((m+1)/\delta)}{m}$$

and ê_{Q^T(w)} is the empirical error of the classifier given by equation (1).

Finally, note that it is straightforward to compute ê_{Q^T(w)} using Proposition 2.2, as the following derivation (written for a single example (x, y)) shows:

$$\hat e_{Q^T(w)} = \sum_{t=0}^{T/2} \binom{T}{t} \big(1 - P(q(x) \neq y)\big)^{t}\, P(q(x) \neq y)^{T-t} = \sum_{t=0}^{T/2} \binom{T}{t} \big(1 - 0.5(1 - y\langle w, x\rangle)\big)^{t}\, \big(0.5(1 - y\langle w, x\rangle)\big)^{T-t} = 0.5^{T} \sum_{t=0}^{T/2} \binom{T}{t} (1 + y\langle w, x\rangle)^{t}\, (1 - y\langle w, x\rangle)^{T-t}.$$

The function ê_{Q^T(w)} exhibits a sharper reverse sigmoid like behaviour as y⟨w, x⟩ increases, with T controlling the steepness of the cutoff.
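The quantities appearing in Corollary 2.6 are straightforward to evaluate numerically. The sketch below assumes a data matrix X with rows in the ℓ_∞ ball, labels y in {-1, +1} and ‖w‖_1 = 1; it computes the empirical majority-vote error via the binomial sum above and inverts the KL divergence by bisection. The function names are ours and the code is illustrative rather than the authors' implementation.

```python
import math
import numpy as np

def kl_bernoulli(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(q_hat, A, tol=1e-10):
    """KL^{-1}(q_hat, A) = max{p : KL(q_hat || p) <= A}, found by bisection on [q_hat, 1]."""
    lo, hi = q_hat, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(q_hat, mid) <= A:
            lo = mid
        else:
            hi = mid
    return lo

def empirical_QT_error(w, X, y, T):
    """Empirical error of the T-sample majority vote of equation (1): per example,
    sum_{t=0}^{floor(T/2)} C(T,t) p_ok^t p_err^(T-t), with p_err from Proposition 2.2."""
    p_err = 0.5 * (1.0 - y * (X @ w))            # vector over the m training examples
    p_ok = 1.0 - p_err
    err = np.zeros_like(p_err)
    for t in range(T // 2 + 1):                  # t = number of correct votes
        err += math.comb(T, t) * p_ok**t * p_err**(T - t)
    return float(np.mean(err))

def corollary_2_6_bound(w, X, y, T, delta=0.05):
    """Value of the bound of Corollary 2.6 (assumes ||w||_1 = 1 and |X_ij| <= 1)."""
    m, N = X.shape
    aw = np.abs(w[w != 0])
    H = (T * np.sum(aw * np.log(aw)) + T * math.log(2 * N) + math.log((m + 1) / delta)) / m
    return 2.0 * kl_inverse(empirical_QT_error(w, X, y, T), H)
```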

3 ALGORITHMICS

The generalisation bound motivates the following maximum entropy optimisation:

$$\min_{w,\rho,\xi} \; \sum_{j=1}^{N} |w_j| \ln |w_j| - C\rho + D \sum_{i=1}^{m} \xi_i$$
$$\text{subject to:} \quad y_i \langle w, x_i \rangle \geq \rho - \xi_i, \; 1 \leq i \leq m, \qquad \|w\|_1 \leq 1, \qquad \xi_i \geq 0, \; 1 \leq i \leq m.$$

This section will investigate the solution of this convex optimisation problem using duality methods. We will derive an algorithm that has been implemented in some small experiments in the next section.

Using the decomposition w_j = w_j^+ - w_j^- with w_j^+, w_j^- ≥ 0, we form the Lagrangian

$$L = \sum_{j=1}^{N} (w_j^+ + w_j^-) \ln(w_j^+ + w_j^-) - C\rho + D \sum_{i=1}^{m} \xi_i + \lambda\left(\sum_{j=1}^{N} (w_j^+ + w_j^-) - 1\right) - \sum_{i=1}^{m} \alpha_i \left[\sum_{j=1}^{N} y_i (w_j^+ - w_j^-) x_{ij} - \rho + \xi_i\right] - \sum_{i=1}^{m} \beta_i \xi_i - \sum_{j=1}^{N} \eta_j^+ w_j^+ - \sum_{j=1}^{N} \eta_j^- w_j^-.$$

Taking derivatives with respect to the primal variables and setting them equal to zero gives

$$\frac{\partial L}{\partial w_j^+} = \ln(w_j^+ + w_j^-) + 1 - \sum_{i=1}^{m} \alpha_i y_i x_{ij} + \lambda - \eta_j^+ = 0, \qquad (2)$$
$$\frac{\partial L}{\partial w_j^-} = \ln(w_j^+ + w_j^-) + 1 + \sum_{i=1}^{m} \alpha_i y_i x_{ij} + \lambda - \eta_j^- = 0, \qquad (3)$$
$$\frac{\partial L}{\partial \xi_i} = D - \alpha_i - \beta_i = 0, \quad \text{so } \alpha_i \leq D, \qquad (4)$$
$$\frac{\partial L}{\partial \rho} = -C + \sum_{i=1}^{m} \alpha_i = 0, \quad \text{so } \sum_{i=1}^{m} \alpha_i = C. \qquad (5)$$

Furthermore, adding equations (2) and (3) gives

$$w_j^+ + w_j^- = \exp\left(\frac{\eta_j^+ + \eta_j^-}{2} - 1 - \lambda\right), \qquad (6)$$

while subtracting them gives

$$\sum_{i=1}^{m} \alpha_i y_i x_i = \frac{1}{2}\left(\eta^- - \eta^+\right). \qquad (7)$$

Finally, summing over j and adding w_j^+ times equation (2) plus w_j^- times equation (3) gives

$$\sum_{j=1}^{N} (w_j^+ + w_j^-) \ln(w_j^+ + w_j^-) + \sum_{j=1}^{N} (w_j^+ + w_j^-) + \lambda \sum_{j=1}^{N} (w_j^+ + w_j^-) - \sum_{i=1}^{m} \alpha_i \sum_{j=1}^{N} y_i (w_j^+ - w_j^-) x_{ij} - \sum_{j=1}^{N} \eta_j^+ w_j^+ - \sum_{j=1}^{N} \eta_j^- w_j^- = 0.$$

Subtracting this from the objective we obtain

$$L = -\sum_{j=1}^{N} (w_j^+ + w_j^-) - \lambda.$$

Using equation (6) we obtain the dual problem

$$\max_{\alpha, \eta^+, \eta^-} L = -\sum_{j=1}^{N} \exp\left(\frac{\eta_j^+ + \eta_j^-}{2} - 1 - \lambda\right) - \lambda$$
$$\text{subject to:} \quad \sum_{i=1}^{m} \alpha_i y_i x_i = \frac{1}{2}(\eta^- - \eta^+), \qquad \lambda,\, \eta_j^+,\, \eta_j^- \geq 0, \; 1 \leq j \leq N, \qquad \sum_{i=1}^{m} \alpha_i = C, \quad 0 \leq \alpha_i \leq D, \; 1 \leq i \leq m.$$

Finally, using equation (7) we can eliminate η^+ and η^- to obtain the simplified expression

$$\max_{\alpha} L = -\sum_{j=1}^{N} \exp\left(\left|\sum_{i=1}^{m} \alpha_i y_i x_{ij}\right| - 1 - \lambda\right) - \lambda$$
$$\text{subject to:} \quad \lambda \geq 0, \qquad \sum_{i=1}^{m} \alpha_i = C, \qquad 0 \leq \alpha_i \leq D, \; 1 \leq i \leq m.$$

Taking a gradient ascent algorithm we can update α along the gradient,

$$\alpha \leftarrow \alpha + \zeta\, \frac{\partial L}{\partial \alpha}, \qquad (8)$$

where

$$\frac{\partial L}{\partial \alpha_t} = -\sum_{j=1}^{N} w_j\, \mathrm{sgn}\left(\sum_{i=1}^{m} \alpha_i y_i x_{ij}\right) y_t x_{tj} + \mu, \qquad (9)$$

where we have introduced a Lagrange multiplier μ for the 1-norm constraint on α, and

$$w_j = A \exp\left(\left|\sum_{i=1}^{m} \alpha_i y_i x_{ij}\right| - 1\right), \qquad (10)$$

with A = exp(-λ) chosen so that Σ_{j=1}^{N} w_j = 1. We should only involve in the update those α_i for which 0 < α_i < D, or for which α_i = 0 and the gradient is positive, or α_i = D and the gradient is negative. After the update with small learning rate ζ we move each α_i back into the interval [0, D] and update μ by

$$\mu \leftarrow \mu - \tau\left(\sum_{i=1}^{m} \alpha_i - C\right), \qquad (11)$$

for a smaller learning rate τ. Note that by equation (7) and the definition of η^+ and η^-,

$$\mathrm{sgn}(w_j) = \mathrm{sgn}\left(\sum_{i=1}^{m} \alpha_i y_i x_{ij}\right), \qquad (12)$$

since w_j positive implies η_j^+ = 0 and hence η_j^- - η_j^+ ≥ 0.
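As an illustration of the update rules (8)-(11), the following sketch runs a plain projected gradient ascent over the full α vector, ignoring the active-set bookkeeping of the algorithm listing that follows. Step sizes, iteration count and the function name are our own choices, and the sign conventions follow the reconstruction above.

```python
import numpy as np

def max_entropy_dual_ascent(X, y, C, D, zeta=1e-3, tau=1e-4, n_iter=5000):
    """Illustrative projected gradient ascent on the simplified dual of Section 3.

    X: (m, N) matrix of training examples, y: labels in {-1, +1}.
    Returns the weight vector w (1-norm 1) and the dual variables alpha."""
    m, N = X.shape
    alpha = np.full(m, 1.0 / C)               # initialisation used in the listing below
    mu = 0.0                                  # multiplier for the constraint sum_i alpha_i = C

    def weights(a):
        v = X.T @ (a * y)                     # v_j = sum_i alpha_i y_i x_ij
        w = np.exp(np.abs(v) - 1.0)           # equation (10), up to the factor A = exp(-lambda)
        return np.sign(v) * (w / w.sum())     # normalise the 1-norm; equation (12) sets the signs

    for _ in range(n_iter):
        w = weights(alpha)
        grad = -(y * (X @ w)) + mu            # equation (9): -y_t <w, x_t> + mu
        alpha = np.clip(alpha + zeta * grad, 0.0, D)   # equation (8) plus projection onto [0, D]
        mu -= tau * (alpha.sum() - C)         # equation (11)

    return weights(alpha), alpha
```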

Hence, the resulting algorithm is given as follows.

Algorithm: Maximum Entropy Classification
Input: Matrix of training examples X, parameters C, D.
Output: Vector of weights w
1. Initialise α_i = 1/C for all i
2. Initialise μ = 0
3. Compute w using equation (10) with A chosen so the 1-norm is 1
4. Compute gradients using equation (9)
5. I = (1 : m)
6. repeat
7.   repeat
8.     update α(I) using equation (8)
9.     if α_i is outside [0, D], remove i from I and set α_i to 0 or D
10.    Compute w using equation (10) with A chosen so the 1-norm is 1
11.    Compute gradients for indices I using equation (9)
12.  until no change
13.  Update μ using equation (11)
14.  Compute all gradients using equation (9)
15.  Include in I any i for which α_i = 0 but the gradient is positive, or α_i = D and the gradient is negative
16. until no change
17. Use equation (12) to adjust the sign of w_j.

3.1 KERNEL MAXIMUM ENTROPY

We are able to extend the current framework to nonlinear features by using the kernel trick and computing a Cholesky decomposition of the resulting kernel matrix,

$$K = X X^{\top} = R^{\top} Q^{\top} Q R = R^{\top} R.$$

The computation of R_{ij} corresponds to evaluating the inner product between φ(x_i) and the new basis vector q_j for j < i. The basis vectors q_j are the result of an implicit Gram-Schmidt orthonormalisation of the data in the feature space, so we can view the new representation as a projection function into a lower dimensional subspace. The new representation of the data is given by the columns r_i of the matrix R, which give rise to exactly the same kernel matrix:

$$\hat\phi : \phi(x_i) \mapsto r_i,$$

where r_i is the ith column of R. The resulting kernel maximum entropy maximisation is

$$\max_{\alpha} L = -\sum_{j} \exp\left(\left|\sum_{i=1}^{m} \alpha_i y_i r_{ij}\right| - 1 - \lambda\right) - \lambda$$
$$\text{subject to:} \quad \lambda \geq 0, \qquad \sum_{i=1}^{m} \alpha_i = C, \qquad 0 \leq \alpha_i \leq D, \; 1 \leq i \leq m.$$
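A minimal sketch of this construction, assuming NumPy: the Cholesky factor of the kernel matrix provides the explicit representation r_i, which can then be fed to the linear algorithm above. The Gaussian kernel and the helper names are our own illustration, not the authors' code.

```python
import numpy as np

def gaussian_kernel(X, gamma):
    """Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2), width parameter gamma."""
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def cholesky_features(K, jitter=1e-10):
    """Explicit features from a kernel matrix: factor K = R^T R and return a matrix whose
    i-th row is r_i (the i-th column of R), so that <r_i, r_j> = K_ij exactly."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))   # lower triangular, K = L L^T
    return L                                              # row i of L is the i-th column of R = L^T

# hypothetical usage with the dual ascent sketch from Section 3:
# K = gaussian_kernel(X_train, gamma=0.1)
# R_rows = cholesky_features(K)                 # shape (m, m); row i represents example i
# w, alpha = max_entropy_dual_ascent(R_rows, y_train, C=1.0, D=float(len(y_train)))
```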

4 EXPERIMENTS

4.1 TESTING THE BOUND

Initially we test the algorithm and the resulting bound on two of the UCI datasets summarised in Table 1. We are interested in observing the behaviour of the bound for a given training and testing set in the linear case. In the following experiment we fix C = 1 and D = l, where l is the number of samples.

Table 1: Datasets considered
Data         Features   Training   Testing
Ionosphere
Breast

The algorithm was run on each of these sets and the bound of Corollary 2.6 was computed for different values of T. Figure 1 shows a plot of the value of the bound for the Ionosphere data as a function of T. As indicated in Section 2, T controls the rate at which the values dip as we move away from the margin, but we pay a penalty in the multiplication of the KL divergence. Hence, its role is very similar to the margin parameter in SVM learning bounds. Note that we should introduce an extra penalty of ln(40)/m as we have effectively applied the theorem 40 times for the different values of T, but we ignore this detail in our reported results. A plot for the Breast Cancer dataset is given in Figure 2.

Figure 1: Plot of the bound as a function of T for the Ionosphere dataset.

Figure 2: Plot of the bound as a function of T for the Breast Cancer dataset.

Finally, Table 2 shows the test error together with that given by the minimum bound (over T) on these small datasets.

Table 2: Test errors and bound values
Data         Error   Bound
Ionosphere
Breast

While the bound values are far from non-trivial, they do show that the bound is able to deliver interesting values, particularly in the case of the Breast Cancer dataset. Note that the factor of 2 has been included, something frequently omitted in reporting PAC-Bayes simulation results.

4.2 LINEAR & NONLINEAR RESULTS

We now test the algorithm in its linear and nonlinear variation on the eight UCI datasets summarised in Table 3. Furthermore, we observe that the bound in Corollary 2.5 does not depend on C and D, therefore allowing us to choose values that minimise the bound.

Table 3: Datasets description. Each row contains the name of the dataset, the number of samples and features (i.e. attributes) as well as the total number of positive and negative samples.
Dataset      # Samples   # Features   # Positive Samples   # Negative Samples
Votes
Glass
Haberman
Bupa
Credit
Pima
BreastW
Ionosphere

We first consider using the linear feature space and compare the maximum entropy algorithm with a Support Vector Machine (SVM) (with linear kernel) using cross-validation to determine the best C value, while the C and D for the maximum entropy method were chosen to optimise the bound. Table 4 gives the test errors for the two algorithms on the UCI datasets together with the bound value for the maximum entropy method. The algorithm has very similar quality on Votes, Bupa, and Pima. The maximum entropy method performs better on Glass, Haberman and Credit, while the SVM is better on Ionosphere and significantly better on BreastW. Overall the results are encouraging, though the BreastW results are concerning and require further investigation.

Table 4: Test errors and bound values for the linear max entropy algorithm and test errors for the SVM using a linear kernel. The C SVM parameter was selected using cross-validation while the C, D max entropy parameters were selected by minimising the bound.
Data         SVM Error   Max Entropy Error   Max Entropy Bound
Votes
Glass
Haberman
Bupa
Credit
Pima
BreastW
Ionosphere

In the nonlinear case we compute the Gaussian kernel with a variety of values for the width parameter γ. For the maximum entropy method we first perform a complete Cholesky decomposition of the kernel matrix and use the obtained features as the representation of the data. The γ value was determined using cross-validation for both algorithms, with the parameter C for the SVM also determined by cross-validation, but C and D for the maximum entropy method chosen to optimise the bound. Table 5 shows the results for both algorithms together with the bound values for the maximum entropy method. In this case the two methods give similar results for Votes only, with the maximum entropy method performing slightly worse than the SVM on Glass and Bupa, but considerably worse on the other datasets.

Table 5: Test errors and bound values for the nonlinear max entropy algorithm and test errors for the SVM using a Gaussian kernel. The C, γ SVM parameters were selected using cross-validation while the C, D max entropy parameters were selected by minimising the bound; the Gaussian γ parameter was selected using cross-validation.
Data         SVM Error   Max Entropy Error   Max Entropy Bound
Votes
Glass
Haberman
Bupa
Credit
Pima
BreastW
Ionosphere

The results for the non-linear datasets are disappointing and we speculate that this is because of the poor choice of representation using the Cholesky decomposition. There is a possibility of using the bound to prioritise the choice of features by greedily selecting those features that maximise the value |Σ_i α_i y_i x_{ij}| that appears in the dual objective. This would correspond to the criterion used in boosting, where the α_i give a pseudo-distribution over the examples with which the weak learner must give good correlation with the target. This could be used to drive a general boosting strategy for selecting weak learners from a large pool. This is, however, beyond the scope of this paper and will be the subject of further research.
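The bound-driven choice of C and D described above can be expressed as a simple grid search. The sketch below assumes the helper functions from the earlier sketches (max_entropy_dual_ascent and corollary_2_6_bound, our names) are in scope; the grids and δ are illustrative, not the values used in the experiments.

```python
from itertools import product
import numpy as np

def select_by_bound(X, y, C_grid, D_grid, T_grid, delta=0.05):
    """Grid search over (C, D): train the max entropy classifier for each pair and keep
    the one whose Corollary 2.6 bound, minimised over T, is smallest."""
    best_bound, best_params, best_w = np.inf, None, None
    for C, D in product(C_grid, D_grid):
        w, _ = max_entropy_dual_ascent(X, y, C, D)             # sketch from Section 3
        bound = min(corollary_2_6_bound(w, X, y, T, delta)     # sketch from Section 2
                    for T in T_grid)
        if bound < best_bound:
            best_bound, best_params, best_w = bound, (C, D), w
    return best_w, best_params, best_bound

# hypothetical usage:
# w, (C, D), bound = select_by_bound(X_train, y_train,
#                                    C_grid=[0.5, 1.0, 2.0],
#                                    D_grid=[1.0, 10.0, 100.0],
#                                    T_grid=[1, 3, 5, 11, 21])
```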

5 CONCLUSIONS

The paper has developed new technology for applying PAC-Bayes analysis to an approach that has been of interest both in statistics and machine learning, namely the implementation of the principle of maximum entropy. Though developed only for classification in the current paper, we believe that the extension to more complex tasks such as density modelling, where the technique comes into its own, will now be possible. In addition we have analysed the resulting convex optimisation problem that arises from the derived bound and shown how a dual version gives rise to a simple and efficient algorithmic implementation of the technique. It is an interesting feature of this algorithm that, though the weight vector w is not sparse, the dual variables α are, as one might expect from the resulting 1-norm constraint and as our experiments demonstrate.

Finally, we have implemented the algorithm on some UCI datasets and demonstrated that the factor T in the bound behaves in a similar manner to a margin parameter. We have used the bound to drive the model selection over the two parameters C and D of the algorithm and have demonstrated that the approach can be applied in kernel defined feature spaces using the Cholesky decomposition to generate an explicit representation. While the results for the linear feature representations are very encouraging, those for the nonlinear case are somewhat disappointing. We speculate that more care needs to be taken in the choice of representation in this case.

Acknowledgements

The authors would like to acknowledge financial support from the EPSRC project Le Strum, EP-D.

References

Dudik, M., & Schapire, R. E. (2006). Maximum entropy distribution estimation with generalized regularization. In Proceedings of the Conference on Learning Theory (COLT).

Kapur, J. N., & Kesevan, H. K. (1992). Entropy optimization principles with applications. Academic Press.

Langford, J. (2005). Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6.

McAllester, D. A. (1998). Some PAC-Bayesian theorems. In Proceedings of the 11th Annual Conference on Computational Learning Theory. ACM Press.

McAllester, D. A. (1999). PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory. ACM Press.

Seeger, M. (2003). Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. Unpublished doctoral dissertation, University of Edinburgh.

Wang, S., Schuurmans, D., Peng, F., & Zhao, Y. (2004). Learning mixture models with the regularized maximum entropy principle. IEEE Transactions on Neural Networks, 15(4).
