COS 511: Theoretical Machine Learning          Lecture #12
Lecturer: Rob Schapire                         March 14
Scribe: Jian Min Si

1 Bounding the Margin

We are continuing the proof of a bound on the generalization error of AdaBoost from the last lecture. We showed previously that, with probability at least $1 - \delta$ over the choice of $S$, for all $f \in \mathcal{F}$,

$$ E_D[f] \le \hat{E}_S[f] + 2\hat{R}_S(\mathcal{F}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{1.1} $$

where $\hat{E}_S[f]$ denotes the sample average. Additionally, we want to show that if AdaBoost attains a large margin, this translates into low generalization error. We will use (1.1) to prove a bound on the generalization error in terms of the margins. First of all, $\mathcal{H}$ will denote a weak hypothesis space with VC-dimension $d$, and recall the convex hull of $\mathcal{H}$, which we defined to be:

$$ \mathrm{co}(\mathcal{H}) = \left\{ f : f(x) = \sum_{t=1}^{T} a_t h_t(x) \;\middle|\; T \ge 1,\; a_t \ge 0,\; \sum_{t=1}^{T} a_t = 1,\; h_1, \ldots, h_T \in \mathcal{H} \right\} \tag{1.2} $$

Theorem 1. With a training set of size $m$, for all $f \in \mathrm{co}(\mathcal{H})$ and for all margins $\Theta > 0$,

$$ \Pr_D[yf(x) \le 0] \le \hat{\Pr}_S[yf(x) \le \Theta] + \tilde{O}\!\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right) $$

with probability at least $1 - \delta$.

Proof. As shown in the last lecture, we state some known relationships:

$$ \hat{R}_S(\mathcal{H}) \le \tilde{O}\!\left(\sqrt{d/m}\right) \tag{1.3} $$
$$ \mathcal{M} = \{(x, y) \mapsto yf(x) : f \in \mathrm{co}(\mathcal{H})\} \tag{1.4} $$
$$ \hat{R}_S(\mathcal{M}) = \hat{R}_S(\mathrm{co}(\mathcal{H})) = \hat{R}_S(\mathcal{H}) \quad \text{(Rademacher complexity)} \tag{1.5} $$

We also need to note, from the last lecture, that:

$$ \Pr_D[yf(x) \le 0] = E_D[\mathbf{1}\{yf(x) \le 0\}] \tag{1.6} $$
$$ \hat{\Pr}_S[yf(x) \le \Theta] = \hat{E}_S[\mathbf{1}\{yf(x) \le \Theta\}] \tag{1.7} $$
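Before continuing the proof, here is a minimal numpy sketch (mine, not the scribe's) that makes the quantities in Theorem 1 concrete: it builds a point $f \in \mathrm{co}(\mathcal{H})$ from made-up decision stumps and evaluates the empirical margin distribution $\hat{\Pr}_S[yf(x) \le \Theta]$ of (1.7). All data, hypotheses, and weights are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical setup: m examples in R^n with labels in {-1, +1}, and
# T weak hypotheses h_t combined with convex weights a_t, as in (1.2).
rng = np.random.default_rng(0)
m, n, T = 200, 5, 10
X = rng.normal(size=(m, n))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=m))

# Weak hypotheses: sign stumps on random coordinates (illustrative only).
coords = rng.integers(0, n, size=T)
H = np.sign(X[:, coords])        # H[i, t] = h_t(x_i) in {-1, +1}

a = rng.random(T)
a /= a.sum()                     # convex weights: a_t >= 0, sum a_t = 1
f = H @ a                        # f(x_i), a point in co(H)

margins = y * f                  # the margin y_i f(x_i) of each example

def empirical_margin_rate(margins, theta):
    """Empirical probability Pr_S[y f(x) <= theta], as in (1.7)."""
    return np.mean(margins <= theta)

print(empirical_margin_rate(margins, 0.0))   # training error rate
print(empirical_margin_rate(margins, 0.25))  # fraction with margin <= 0.25
```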

From Figure 1.1, we can arrive at:

$$ \mathbf{1}\{u \le 0\} \le \phi(u) \le \mathbf{1}\{u \le \Theta\} \tag{1.8} $$

where $\phi(u)$ is the Lipschitz function shown in the figure.

Figure 1.1: Visualization of the indicator functions and the Lipschitz function $\phi(u)$, whose Lipschitz constant is $L_\phi = 1/\Theta$ (i.e. the slope of the line).

From (1.6),

$$ \Pr_D[yf(x) \le 0] = E_D[\mathbf{1}\{yf(x) \le 0\}] \le E_D[\phi(yf(x))] \tag{1.9} $$

From (1.7),

$$ \hat{\Pr}_S[yf(x) \le \Theta] = \hat{E}_S[\mathbf{1}\{yf(x) \le \Theta\}] \ge \hat{E}_S[\phi(yf(x))] \tag{1.10} $$

By applying (1.1), for all $f \in \mathrm{co}(\mathcal{H})$,

$$ E_D[\phi(yf(x))] \le \hat{E}_S[\phi(yf(x))] + 2\hat{R}_S(\phi \circ \mathcal{M}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{1.11} $$

Now, the only thing left is to compute $2\hat{R}_S(\phi \circ \mathcal{M})$. Recall from the previous lecture Talagrand's Lemma, which states that for any hypothesis set $\mathcal{F}$ of real-valued functions, the following inequality holds whenever $\phi : \mathbb{R} \to \mathbb{R}$ is Lipschitz (meaning that for all $u, v$, $|\phi(u) - \phi(v)| \le L_\phi |u - v|$ for some Lipschitz constant $L_\phi$):

$$ \hat{R}_S(\phi \circ \mathcal{F}) \le L_\phi \hat{R}_S(\mathcal{F}) \tag{1.12} $$
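As an aside, the surrogate $\phi$ from Figure 1.1 is concrete enough to write down exactly. The following numpy sketch (my addition, not from the lecture) implements the ramp and numerically spot-checks the sandwich inequality (1.8):

```python
import numpy as np

def phi(u, theta):
    """The piecewise-linear surrogate of Figure 1.1: 1 for u <= 0,
    0 for u >= theta, a line of slope -1/theta in between, so phi is
    Lipschitz with constant L_phi = 1/theta."""
    return np.clip(1.0 - u / theta, 0.0, 1.0)

# Spot-check the sandwich (1.8): 1{u <= 0} <= phi(u) <= 1{u <= theta}.
theta = 0.5
u = np.linspace(-1.0, 1.0, 2001)
lower = (u <= 0).astype(float)
upper = (u <= theta).astype(float)
assert np.all(lower <= phi(u, theta)) and np.all(phi(u, theta) <= upper)
```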

From (1.12),

$$ \hat{R}_S(\phi \circ \mathcal{M}) \le L_\phi \hat{R}_S(\mathcal{M}) = \frac{1}{\Theta} \hat{R}_S(\mathcal{H}) \le \frac{1}{\Theta} \tilde{O}\!\left(\sqrt{d/m}\right) \tag{1.13} $$

From (1.9),

$$ \begin{aligned} \Pr_D[yf(x) \le 0] &= E_D[\mathbf{1}\{yf(x) \le 0\}] \\ &\le E_D[\phi(yf(x))] \\ &\le \hat{E}_S[\phi(yf(x))] + \tilde{O}\!\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right) && \text{(using 1.11 and 1.13)} \\ &\le \hat{\Pr}_S[yf(x) \le \Theta] + \tilde{O}\!\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right) && \text{(using 1.10)} \end{aligned} $$

2 Support Vector Machines (SVM)

We start off with $m$ labeled examples $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, and our goal is to find a hypothesis that is consistent with all the examples. We assume for now that the data is linearly separable and that $\|x_i\|_2 \le 1$.

Figure 2.1: Labeled examples in a plane in which a separating hyperplane (i.e. a line) is shown passing through the origin.

Since the hypothesis must separate the positive and negative labels as shown in Figure 2.1, there are in fact many hyperplanes to choose from. Intuitively, the natural idea is to choose the hyperplane that maximally separates the positive and negative labels (i.e. to maximize the margin $\delta$), because we do not want a small jiggling of a point to change its classification.

Assuming that the hyperplane passes through the origin, and following Figure 2.1, $v$ is defined to be the normal vector to the hyperplane with $\|v\|_2 = 1$ (i.e. unit L2-norm). The signed distance of a point $x$ to the hyperplane is then $v \cdot x$:

$$ v \cdot x \;\begin{cases} > 0 & \text{if } x \text{ is above the hyperplane} \\ = 0 & \text{if } x \text{ is on the hyperplane} \\ < 0 & \text{if } x \text{ is below the hyperplane} \end{cases} $$

Hence, the problem formulation becomes: find $v$, $\delta$ solving

$$ \begin{aligned} \max \;& \delta \\ \text{s.t.} \;& \|v\|_2 = 1 \\ \forall i: \;& v \cdot x_i \ge \delta \;\; \text{if } y_i = +1 \\ & v \cdot x_i \le -\delta \;\; \text{if } y_i = -1 \end{aligned} \tag{2.1} $$

in which the two label cases of (2.1) are mathematically equivalent to

$$ y_i (v \cdot x_i) \ge \delta \tag{2.2} $$

Another way to view this problem is in terms of increasing our confidence via the margin. Since we have no control over the number of samples but only over the choice of hyperplane, we can control the complexity of the hyperplanes. Recall that the VC-dimension of linear threshold functions in $\mathbb{R}^n$ equals $n$ (not $n + 1$, because they pass through the origin). Furthermore, the VC-dimension of linear threshold functions with margin $\delta$ is at most $1/\delta^2$, which is independent of $n$ (to be proved as a corollary later). Since this VC-dimension depends only on $1/\delta^2$, the bigger the margin $\delta$, the less complex the class of hyperplanes.

Hence, going back to the problem formulation in (2.1), we divide both sides of (2.2) by $\delta$:

$$ y_i \left( \frac{v}{\delta} \cdot x_i \right) \ge \frac{\delta}{\delta} = 1 $$

If we let $w = v/\delta$, then

$$ \|w\| = \frac{\|v\|}{\delta} = \frac{1}{\delta} $$

so maximizing $\delta$ is the same as minimizing $\|w\|$, and we can rewrite the problem formulation as a convex program:

$$ \min \; \tfrac{1}{2} \|w\|_2^2 \tag{2.3} $$
$$ \text{s.t.} \;\; y_i (w \cdot x_i) \ge 1 \tag{2.4} $$

Since every convex program has an equivalent dual formulation, we can transform this into a dual maximization problem using Lagrange multipliers $\alpha_i$ (to be shown in later lectures!).
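Before passing to the dual, the primal program (2.3)-(2.4) is itself directly solvable by any off-the-shelf QP solver. Here is a minimal sketch assuming the cvxpy modeling library (not part of the notes), on made-up separable toy data:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (made up for illustration); labels in {-1, +1}.
X = np.array([[0.4, 0.6], [0.5, 0.9], [-0.3, -0.5], [-0.7, -0.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])

n = X.shape[1]
w = cp.Variable(n)

# Primal hard-margin program (2.3)-(2.4): minimize (1/2)||w||^2
# subject to y_i (w . x_i) >= 1 for all i.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w) >= 1])
prob.solve()

w_opt = w.value
delta = 1.0 / np.linalg.norm(w_opt)  # recovered margin, since ||w|| = 1/delta
v = w_opt * delta                    # unit normal of the separating hyperplane
print("w =", w_opt, " margin delta =", delta)
```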

The resulting dual program is:

$$ \begin{aligned} \max \;& \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \\ \text{s.t.} \;& \forall i: \alpha_i \ge 0 \end{aligned} $$

It turns out that

$$ w = \sum_{i=1}^{m} \alpha_i y_i x_i $$

and a lot of the $\alpha_i$ will be zero. In fact, $\alpha_i$ can be non-zero only if the corresponding example $(x_i, y_i)$ is a support vector, meaning that $y_i (w \cdot x_i) = 1$. Since $w$ depends only on the support vectors, a small subset of the examples, we can also see, from problem 1 of homework 4, that $\mathrm{err}(h) \le \tilde{O}\!\left(\frac{k + \ln(1/\delta)}{m}\right)$, where $k$ is the number of support vectors and $\delta$ is the confidence parameter (the bound holds with probability at least $1 - \delta$); the error depends only on $k$, not on the number of dimensions.

2.1 Linearly Inseparable Data

So far, we have been talking about linearly separable data such as that shown in Figure 2.1, where there exists some hyperplane separating the positive and negative examples. What happens when the data is linearly inseparable? Consider the following case:

Figure 2.2: Linearly inseparable data on a plane with the hypothesis a line passing through the origin, where $\xi_1$ and $\xi_2$ represent the distances moved.

In this simple case, to make the data linearly separable, we can move the offending points through distances $\xi_1$ and $\xi_2$ to the correct side of the hyperplane. However, there is a penalty to be paid for the distances moved, and we need to incorporate it into the problem formulation (2.3)-(2.4).
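The penalized ("soft-margin") program appears below; it is what standard SVM solvers implement. As a quick preview, and as a sanity check on the support-vector expansion $w = \sum_i \alpha_i y_i x_i$ above, here is a hedged scikit-learn sketch on made-up, slightly overlapping data (my example, not from the lecture; note that sklearn's `SVC` also fits an intercept rather than forcing the hyperplane through the origin as in class):

```python
import numpy as np
from sklearn.svm import SVC

# Slightly overlapping toy data (made up for illustration).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+1.0, 1.2, size=(20, 2)),
               rng.normal(-1.0, 1.2, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

# Soft-margin linear SVM; the penalty parameter C below is an arbitrary choice.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ stores y_i * alpha_i for the support vectors only, so the
# expansion w = sum_i alpha_i y_i x_i needs just those examples.
w = clf.dual_coef_ @ clf.support_vectors_
print("number of support vectors:", len(clf.support_))
print("w from dual expansion:", w.ravel())
print("w reported by sklearn:", clf.coef_.ravel())
```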

$$ \begin{aligned} \min \;& \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \\ \text{s.t.} \;& y_i (w \cdot x_i) \ge 1 - \xi_i \\ & \xi_i \ge 0 \end{aligned} $$

where $C$ is a parameter the user has to choose. The dual maximization form then becomes:

$$ \begin{aligned} \max \;& \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \\ \text{s.t.} \;& \forall i: 0 \le \alpha_i \le C \end{aligned} $$

However, what if the data is not even close to being linearly separable? Consider the following case:

Figure 2.3: Linearly inseparable data on a plane.

Support Vector Machines can still be used in this case. To do this, we map the data to a higher dimensional space (here, from 2-dimensional space to 6-dimensional space):

$$ (x_1, x_2) = x \mapsto \psi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2) \tag{2.5} $$

However, how does this make the data linearly separable? Since the hyperplane equation is given by $w \cdot x = 0$, the hyperplane equation in the 6-dimensional space becomes:

$$ a + b x_1 + c x_2 + d x_1^2 + e x_2^2 + f x_1 x_2 = 0 \tag{2.6} $$

which is exactly the equation of a conic section in 2-dimensional space. Graphically, the hyperplane in this example is shown in Figure 2.4.
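To see (2.5)-(2.6) in action, the following numpy sketch (mine; the points and radius are made up) checks that a circular boundary in $\mathbb{R}^2$, which no line can realize, is a plain hyperplane $w \cdot \psi(x) = 0$ in the lifted 6-dimensional space:

```python
import numpy as np

def psi(x):
    """The explicit degree-2 feature map from (2.5)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

# The circle x1^2 + x2^2 = r^2 corresponds to the hyperplane
# w . psi(x) = 0 with w = (-r^2, 0, 0, 1, 1, 0), i.e. (2.6) with
# a = -r^2, b = c = f = 0, d = e = 1.
r = 0.5
w = np.array([-r**2, 0.0, 0.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(1)
for x in rng.uniform(-1, 1, size=(5, 2)):
    lifted_side = np.sign(w @ psi(x))    # linear decision in the lifted space
    circle_side = np.sign(x @ x - r**2)  # inside/outside the circle in R^2
    assert lifted_side == circle_side
```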

Figure 2.4: Linearly inseparable data on a plane with the hypothesis a circle in this case.

The above approach can be generalized to $n$ dimensions: $x \in \mathbb{R}^n \mapsto \psi(x)$. However, this raises both computational and statistical issues.

Computational problem: We start with examples in $n$ dimensions, and adding all terms up to degree $d$ effectively maps to $O(n^d)$ dimensions. So we might start in 100 dimensions and end up in a trillion dimensions or more; even reading such a vector just once would take a very long time.

Statistical problem: There may be overfitting due to the addition of so many parameters.

For the statistical problem, the Support Vector Machine is able to overcome it since the VC-dimension is at most $1/\delta^2$, and hence $\mathrm{err}(h)$ does not depend on the number of dimensions. For the computational problem, observe from the dual maximization formulation that we only ever need inner products of pairs of instances, because $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ and

$$ H(x) = \mathrm{sign}(w \cdot x) = \mathrm{sign}\!\left( \sum_{i=1}^{m} \alpha_i y_i (x_i \cdot x) \right) $$

Hence, if we can compute the inner product efficiently, then we can overcome the computational problem as well. We first consider rescaling the coordinates of (2.5), which has no effect on the hyperplanes since the coefficients are arbitrary:

$$ (x_1, x_2) = x \mapsto \psi(x) = (1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2) \tag{2.7} $$

Similarly,

$$ (u_1, u_2) = u \mapsto \psi(u) = (1, \sqrt{2}\,u_1, \sqrt{2}\,u_2, u_1^2, u_2^2, \sqrt{2}\,u_1 u_2) \tag{2.8} $$

For this mapping from 2-dimensional to 6-dimensional space, we just have to compute the inner product of $\psi(x)$ and $\psi(u)$:

$$ \begin{aligned} \psi(x) \cdot \psi(u) &= 1 + 2 x_1 u_1 + 2 x_2 u_2 + x_1^2 u_1^2 + x_2^2 u_2^2 + 2 x_1 x_2 u_1 u_2 \\ &= (1 + x_1 u_1 + x_2 u_2)^2 \\ &= (1 + x \cdot u)^2 \end{aligned} $$

Hence, adding all terms up to degree $d$, the computation becomes simply $(1 + x \cdot u)^d$. This elegant result means that we can implicitly compute inner products in the high dimensional space by performing all operations in the original low dimensional space and then simply raising to the power $d$. This method is called the kernel trick. In general, a kernel is a function $K(x, u)$ that implicitly computes the inner product $\psi(x) \cdot \psi(u)$ for some mapping $\psi$. An example is the radial basis kernel, $K(x, u) = \exp\!\left(-\|x - u\|_2^2\right)$. To apply the kernel trick, we just have to replace $x_i \cdot x_j$ by $K(x_i, x_j)$ in the dual maximization formulation.

It is also important to note the following: recall that under the assumption $\|x_i\|_2 \le 1$, the VC-dimension is at most $1/\delta^2$. However, if $\|x_i\|_2 \le R$, the VC-dimension is at most $R^2/\delta^2$. This means there is a tradeoff between the radius $R$ and the margin $\delta$, since the mapping $\psi$ may increase $\delta$, but might also increase $R$.
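The identity $\psi(x) \cdot \psi(u) = (1 + x \cdot u)^2$ is easy to verify numerically. Here is a short numpy sketch (my addition; the sampled points are arbitrary) comparing the explicit 6-dimensional inner product against the kernel computed entirely in 2 dimensions:

```python
import numpy as np

def psi(x):
    """The rescaled degree-2 map from (2.7)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1**2, x2**2, s * x1 * x2])

def poly_kernel(x, u, d=2):
    """Polynomial kernel (1 + x . u)^d; for d = 2 it implicitly computes
    the 6-dimensional inner product psi(x) . psi(u)."""
    return (1.0 + x @ u) ** d

rng = np.random.default_rng(2)
x, u = rng.normal(size=2), rng.normal(size=2)

explicit = psi(x) @ psi(u)          # inner product in the lifted space
implicit = poly_kernel(x, u, d=2)   # computed entirely in 2 dimensions
assert np.isclose(explicit, implicit)
print(explicit, implicit)
```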
