COS 511: Theoretical Machine Learning
Lecturer: Rob Schapire          Lecture #12
Scribe: Jian Min Si             March 14, 2013

1 Bounding the Margin

We are continuing the proof of a bound on the generalization error of AdaBoost from the last lecture. We showed previously that, with probability at least $1 - \delta$ over the choice of the sample $S$ of size $m$, for all $f \in \mathcal{F}$,

$$E_D[f] \le \hat{E}_S[f] + 2\hat{R}_S(\mathcal{F}) + O\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{1.1}$$

where $\hat{E}_S[f]$ denotes the sample average. Additionally, we want to show that obtaining a large margin from AdaBoost translates into a low generalization error; that is, we want to use (1.1) to prove a bound on the generalization error in terms of the margins. First, let $H$ denote a weak hypothesis space of VC-dimension $d$, and recall the convex hull of $H$, which we defined to be

$$\mathrm{co}(H) = \left\{ f : f(x) = \sum_{t=1}^{T} a_t h_t(x) \;\middle|\; T \ge 1,\ a_t \ge 0,\ \sum_{t=1}^{T} a_t = 1,\ h_1, \ldots, h_T \in H \right\} \tag{1.2}$$

Theorem 1. With a training set of size $m$, for all $f \in \mathrm{co}(H)$ and for all margins $\Theta > 0$,

$$\Pr_D[yf(x) \le 0] \le \hat{\Pr}_S[yf(x) \le \Theta] + \tilde{O}\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right)$$

with probability at least $1 - \delta$.

Proof. As shown in the last lecture, we have the following relationships:

$$\hat{R}_S(H) \le \tilde{O}\left(\sqrt{d/m}\right) \tag{1.3}$$

$$M = \{(x, y) \mapsto yf(x) : f \in \mathrm{co}(H)\} \tag{1.4}$$

$$\hat{R}_S(M) = \hat{R}_S(\mathrm{co}(H)) = \hat{R}_S(H) \quad \text{(a property of Rademacher complexity)} \tag{1.5}$$

We also need to note, from last lecture, that:

$$\Pr_D[yf(x) \le 0] = E_D[\mathbf{1}\{yf(x) \le 0\}] \tag{1.6}$$

$$\hat{\Pr}_S[yf(x) \le \Theta] = \hat{E}_S[\mathbf{1}\{yf(x) \le \Theta\}] \tag{1.7}$$
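As an aside, the empirical Rademacher complexity $\hat{R}_S$ appearing in (1.1) and (1.3) measures how well a class can correlate with random signs on the sample. Here is a minimal Monte Carlo sketch (not from the notes; the finite hypothesis class and all names are hypothetical) that makes the quantity concrete:

```python
import numpy as np

def empirical_rademacher(preds, n_trials=2000, seed=0):
    """Monte Carlo estimate of R_S(H) = E_sigma[ sup_h (1/m) sum_i sigma_i h(x_i) ]
    for a finite class H, given preds of shape (|H|, m) with entries in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n_hyp, m = preds.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)  # Rademacher random signs
        total += np.max(preds @ sigma) / m       # sup over h of correlation with the noise
    return total / n_trials

# Hypothetical example: 50 random "weak hypotheses" evaluated on m = 200 points.
preds = np.sign(np.random.default_rng(1).standard_normal((50, 200)))
print(empirical_rademacher(preds))  # small, and shrinks as m grows
```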
From Figure 1.1 we arrive at

$$\mathbf{1}\{u \le 0\} \le \varphi(u) \le \mathbf{1}\{u \le \Theta\} \tag{1.8}$$

where $\varphi(u)$ is the Lipschitz function shown in the figure.

Figure 1.1: Visualization of the indicator functions and the Lipschitz function $\varphi(u)$, whose Lipschitz constant is $L_\varphi = 1/\Theta$ (the slope of the line).

From (1.6),

$$\Pr_D[yf(x) \le 0] = E_D[\mathbf{1}\{yf(x) \le 0\}] \le E_D[\varphi(yf(x))] \tag{1.9}$$

From (1.7),

$$\hat{\Pr}_S[yf(x) \le \Theta] = \hat{E}_S[\mathbf{1}\{yf(x) \le \Theta\}] \ge \hat{E}_S[\varphi(yf(x))] \tag{1.10}$$

By applying (1.1), for all $f \in \mathrm{co}(H)$,

$$E_D[\varphi(yf(x))] \le \hat{E}_S[\varphi(yf(x))] + 2\hat{R}_S(\varphi \circ M) + O\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{1.11}$$

The only remaining step is to bound $2\hat{R}_S(\varphi \circ M)$. Recall from the previous lecture Talagrand's Lemma, which states that for any hypothesis set $\mathcal{F}$ of real-valued functions, the following inequality holds whenever $\varphi : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous (meaning $\forall u, v : |\varphi(u) - \varphi(v)| \le L_\varphi |u - v|$ for some Lipschitz constant $L_\varphi$):

$$\hat{R}_S(\varphi \circ \mathcal{F}) \le L_\varphi \hat{R}_S(\mathcal{F}) \tag{1.12}$$
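Since the figure itself is not reproduced here, a quick sketch may help: assuming $\varphi$ has the standard piecewise-linear ramp form (1 for $u \le 0$, 0 for $u \ge \Theta$, linear in between, so $L_\varphi = 1/\Theta$), the sandwich (1.8) can be checked numerically:

```python
import numpy as np

def phi(u, theta):
    """Ramp function from Figure 1.1: 1 for u <= 0, 0 for u >= theta,
    slope -1/theta in between, so the Lipschitz constant is L_phi = 1/theta."""
    return np.clip(1.0 - u / theta, 0.0, 1.0)

theta = 0.5
u = np.linspace(-1.0, 1.0, 201)
assert np.all((u <= 0).astype(float) <= phi(u, theta))      # 1{u <= 0} <= phi(u)
assert np.all(phi(u, theta) <= (u <= theta).astype(float))  # phi(u) <= 1{u <= theta}
```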
From (1.12),

$$\hat{R}_S(\varphi \circ M) \le L_\varphi \hat{R}_S(M) = \frac{1}{\Theta}\hat{R}_S(H) \le \frac{1}{\Theta}\tilde{O}\left(\sqrt{d/m}\right) \tag{1.13}$$

From (1.9),

$$\begin{aligned}
\Pr_D[yf(x) \le 0] &= E_D[\mathbf{1}\{yf(x) \le 0\}] \\
&\le E_D[\varphi(yf(x))] \\
&\le \hat{E}_S[\varphi(yf(x))] + \tilde{O}\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right) && \text{(using 1.11 and 1.13)} \\
&\le \hat{\Pr}_S[yf(x) \le \Theta] + \tilde{O}\left(\sqrt{\frac{d/\Theta^2 + \ln(1/\delta)}{m}}\right) && \text{(using 1.10)}
\end{aligned}$$

which completes the proof of Theorem 1.
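To connect Theorem 1 back to AdaBoost: $f \in \mathrm{co}(H)$ is exactly a normalized weighted vote of weak hypotheses, and $\hat{\Pr}_S[yf(x) \le \Theta]$ is the fraction of training examples with margin at most $\Theta$. A minimal sketch of computing that quantity (the synthetic weak hypotheses and weights are hypothetical, not from the notes):

```python
import numpy as np

def empirical_margin_dist(votes, weights, y, theta):
    """Pr^_S[y f(x) <= theta] for f(x) = sum_t a_t h_t(x) in co(H).
    votes: (T, m) array of weak-hypothesis outputs in {-1, +1};
    weights: nonnegative a_t summing to 1 (e.g. normalized AdaBoost weights)."""
    f = weights @ votes   # f(x_i) in [-1, +1] for each example
    margins = y * f       # margin y_i f(x_i) of each example
    return np.mean(margins <= theta)

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=300)
votes = np.where(rng.random((10, 300)) < 0.8, y, -y)  # hypothetical 80%-accurate stumps
weights = np.full(10, 0.1)                            # uniform convex weights
print(empirical_margin_dist(votes, weights, y, theta=0.25))
```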
2 Support Vector Machines (SVM)

We start with $m$ labeled examples $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, and our goal is to find a hypothesis that is consistent with all the examples. For now, we assume that the data is linearly separable and that $\|x_i\|_2 \le 1$.

Figure 2.1: Labeled examples in a plane with a separating hyperplane (shown as a line) passing through the origin.

Since the hypothesis must separate the positive and negative labels as shown in Figure 2.1, there are in fact many hyperplanes to choose from. Intuitively, the natural idea is to choose the hyperplane that maximally separates the positive and negative labels, that is, to maximize the margin $\delta$, because we do not want a small jiggling of a point to change its classification.

Assuming that the hyperplane passes through the origin, as in Figure 2.1, let $v$ be the normal vector to the hyperplane with $\|v\|_2 = 1$ (the $L_2$-norm). The signed distance of a point $x$ to the hyperplane is then $v \cdot x$:

$$v \cdot x \;\begin{cases} > 0 & \text{if } x \text{ is above the hyperplane} \\ = 0 & \text{if } x \text{ is on the hyperplane} \\ < 0 & \text{if } x \text{ is below the hyperplane} \end{cases}$$

Hence, the problem formulation becomes: find $v, \delta$ to

$$\max \delta \quad \text{s.t.} \quad \|v\|_2 = 1, \qquad \forall i: \begin{cases} v \cdot x_i \ge \delta & \text{if } y_i = +1 \\ v \cdot x_i \le -\delta & \text{if } y_i = -1 \end{cases} \tag{2.1}$$

in which the constraints of (2.1) are mathematically equivalent to

$$y_i(v \cdot x_i) \ge \delta \tag{2.2}$$

Another way to view this problem is in terms of increasing our confidence via the margin. Since we have no control over the number of samples, but only over the choice of hyperplane, we can control the complexity of the hyperplanes. Recall that the VC-dimension of linear threshold functions in $\mathbb{R}^n$ equals $n$ (not $n + 1$, because they pass through the origin). Furthermore, the VC-dimension of linear threshold functions with margin $\delta$ is at most $1/\delta^2$, which is independent of $n$ (to be proved as a corollary later). Since this VC-dimension depends only on $\delta$, the bigger the margin $\delta$, the less complex the hyperplane class gets. Hence, going back to the problem formulation in (2.1), we divide the terms in (2.2) by $\delta$:

$$y_i\left(\frac{v}{\delta} \cdot x_i\right) \ge \frac{\delta}{\delta} = 1$$

If we let $w = \frac{v}{\delta}$, then $\|w\| = \frac{\|v\|}{\delta} = \frac{1}{\delta}$, and we can rewrite the problem formulation as a convex program:

$$\min \frac{1}{2}\|w\|_2^2 \tag{2.3}$$

$$\text{s.t.} \quad y_i(w \cdot x_i) \ge 1 \tag{2.4}$$
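As an illustration of the convex program (2.3)-(2.4), here is a sketch using the cvxpy modeling library; the choice of cvxpy and the synthetic data are assumptions (any QP solver would do), not part of the notes:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical linearly separable data: labels come from a true hyperplane
# through the origin, so a consistent w exists.
X = rng.standard_normal((40, 2))
y = np.sign(X @ np.array([1.0, -2.0]))

w = cp.Variable(2)
# Convex program (2.3)-(2.4): minimize (1/2)||w||^2 s.t. y_i (w . x_i) >= 1.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w) >= 1])
prob.solve()
print(w.value, 1.0 / np.linalg.norm(w.value))  # the margin is delta = 1 / ||w||
```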
Since every convex program has an equivalent dual formulation, we can transform this into a dual maximization form using Lagrange multipliers $\alpha_i$ (to be shown in later lectures!):

$$\max \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \forall i: \alpha_i \ge 0$$

It turns out that

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$

and that many of the $\alpha_i$ will be zero. In fact, $\alpha_i$ can be nonzero only if the corresponding example $(x_i, y_i)$ is a support vector, meaning that $y_i(w \cdot x_i) = 1$. Since $w$ depends only on the support vectors, a small subset of the examples, we can also see, from problem 1 of homework 4, that $\mathrm{err}(h) \le \tilde{O}\left(\frac{k + \ln(1/\delta)}{m}\right)$, where $k$ is the number of support vectors and $\delta$ is the confidence parameter; the error depends only on $k$, and not on the number of dimensions.
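A corresponding sketch of solving the dual (again assuming cvxpy; rewriting $\alpha^{\top} G \alpha$ as a sum of squares is just an implementation choice to keep the problem DCP-valid), recovering $w = \sum_i \alpha_i y_i x_i$ and reading off the support vectors:

```python
import cvxpy as cp
import numpy as np

def svm_dual(X, y):
    """Solve the hard-margin dual QP; the examples with alpha_i > 0
    are the support vectors, and w = sum_i alpha_i y_i x_i."""
    m = len(y)
    A = y[:, None] * X  # row i is y_i x_i, so A A^T has entries y_i y_j (x_i . x_j)
    alpha = cp.Variable(m, nonneg=True)
    # max sum(alpha) - (1/2) alpha^T (A A^T) alpha, written as a sum of squares
    prob = cp.Problem(cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(A.T @ alpha)))
    prob.solve()
    w = (alpha.value * y) @ X
    support = np.flatnonzero(alpha.value > 1e-6)  # typically only a few are nonzero
    return w, support

# e.g. with the separable data from the previous sketch: w, sv = svm_dual(X, y)
```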
2.1 Linearly Inseparable Data

So far we have been discussing linearly separable data, as in Figure 2.1, where some hyperplane separates the positive and negative examples. What happens when the data is linearly inseparable? Consider the following case:

Figure 2.2: Linearly inseparable data in a plane, with the hypothesis a line passing through the origin, where $\xi_1$ and $\xi_2$ represent the distances moved.

In this simple case, to make the data linearly separable, we can move the offending points through the distances $\xi_1$ and $\xi_2$ to the correct side of the hyperplane. However, there is a penalty to be paid for the distances moved, and we need to incorporate it into the problem formulation (2.3)-(2.4):

$$\min \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{m}\xi_i, \quad \text{where } C \text{ is a parameter the user has to choose}$$

$$\text{s.t.} \quad y_i(w \cdot x_i) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

The dual maximization form then becomes:

$$\max \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \forall i: 0 \le \alpha_i \le C$$

However, what if the data is not even close to being linearly separable? Consider the following case:

Figure 2.3: Linearly inseparable data in a plane.

Support vector machines can still be used in this case. To do so, we map the data to a higher-dimensional space (here, from 2-dimensional space to 6-dimensional space):

$$(x_1, x_2) = x \mapsto \psi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2) \tag{2.5}$$

How does this make the data linearly separable? Since a hyperplane through the origin is given by $w \cdot x = 0$, the hyperplane equation in the 6-dimensional space becomes

$$a \cdot 1 + b x_1 + c x_2 + d x_1^2 + e x_2^2 + f x_1 x_2 = 0 \tag{2.6}$$

which is exactly the equation of a conic section in the 2-dimensional space.
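A small sketch of the lift (2.5), showing how a conic section in the plane becomes a hyperplane through the origin in the 6-dimensional space (the circle and the weight vector are a hypothetical example):

```python
import numpy as np

def psi(x):
    """The map (2.5): lift (x1, x2) to 6 dimensions, so that conic sections
    in the plane become hyperplanes w . psi(x) = 0."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

# The circle x1^2 + x2^2 = 1 is the hyperplane w . psi(x) = 0
# with w = (-1, 0, 0, 1, 1, 0).
w = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(w @ psi(np.array([1.0, 0.0])))  # 0.0: on the circle
print(w @ psi(np.array([2.0, 0.0])))  # 3.0: outside, on the positive side
```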
Graphically, the hyperplane in this example looks like:

Figure 2.4: Linearly inseparable data in a plane, with the hypothesis a circle in this case.

The above approach can be generalized to $n$ dimensions:

$$x \in \mathbb{R}^n \mapsto \psi(x)$$

However, this raises both computational and statistical issues.

Computational problem: If we start with examples in $n$ dimensions and add all terms up to degree $d$, we are effectively mapping to $O(n^d)$ dimensions. So we might start in 100 dimensions and end up with a trillion dimensions or more; even reading such a vector once would take a very long time.

Statistical problem: There may be overfitting due to the addition of so many parameters.

Support vector machines overcome the statistical problem, since the VC-dimension is at most $1/\delta^2$ and hence $\mathrm{err}(h)$ does not depend on the number of dimensions. As for the computational problem, the dual maximization formulation shows that we only ever need to compute inner products of pairs of instances, because $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ and

$$H(x) = \mathrm{sign}(w \cdot x) = \mathrm{sign}\left(\sum_{i=1}^{m} \alpha_i y_i (x_i \cdot x)\right)$$

Hence, if we can compute these inner products efficiently, we overcome the computational problem as well.
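A sketch of prediction through inner products only; the kernel argument K is a placeholder for any choice (the plain inner product here, the polynomial or radial basis kernels derived next), and the function name is hypothetical:

```python
import numpy as np

def kernel_predict(x, X, y, alpha, K):
    """H(x) = sign(sum_i alpha_i y_i K(x_i, x)): classification touches the
    feature space only through inner products, never through psi(x) itself."""
    return np.sign(sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X)))

linear = lambda xi, x: xi @ x  # the plain inner product x_i . x used above
```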
We first consider adding constants to the map (2.5), which has no effect on the representable hyperplanes, since the coefficients are arbitrary anyway. Specifically, consider

$$(x_1, x_2) = x \mapsto \psi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2) \tag{2.7}$$

and, similarly,

$$(u_1, u_2) = u \mapsto \psi(u) = (1, \sqrt{2}u_1, \sqrt{2}u_2, u_1^2, u_2^2, \sqrt{2}u_1 u_2) \tag{2.8}$$

For this map from 2-dimensional space to 6-dimensional space, we just have to compute the inner product of $\psi(x)$ and $\psi(u)$:

$$\psi(x) \cdot \psi(u) = 1 + 2x_1 u_1 + 2x_2 u_2 + x_1^2 u_1^2 + x_2^2 u_2^2 + 2x_1 x_2 u_1 u_2 = (1 + x_1 u_1 + x_2 u_2)^2 = (1 + x \cdot u)^2$$

Hence, when adding all terms up to degree $d$, the computation simply becomes $(1 + x \cdot u)^d$. This elegant result means that we can implicitly compute the inner product in the high-dimensional space by performing all operations in the original low-dimensional space and then raising the result to the power $d$. This method is called the kernel trick. In general, a kernel is a function $K(x, u)$ that implicitly computes the inner product $\psi(x) \cdot \psi(u)$ for some mapping $\psi$. An example is the radial basis kernel, $K(x, u) = \exp\left(-\|x - u\|_2^2\right)$. To apply the kernel trick, we just replace $x_i \cdot x_j$ by $K(x_i, x_j)$ in the dual maximization formulation.

It is also important to note the following tradeoff. Recall that under the assumption $\|x_i\|_2 \le 1$, the VC-dimension is at most $1/\delta^2$. However, if $\|x_i\|_2 \le R$, the VC-dimension is at most $R^2/\delta^2$. This means there is a tradeoff between the radius $R$ and the margin $\delta$, since the mapping $\psi$ may increase $\delta$, but might also increase $R$.
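Finally, a numerical check, matching the derivation above, that the degree-2 polynomial kernel computes the 6-dimensional inner product implicitly (the test data is hypothetical):

```python
import numpy as np

def psi(x):
    """The map (2.7), with the sqrt(2) scaling constants."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, u, d=2):
    """K(x, u) = (1 + x . u)^d: O(n) work in place of O(n^d) explicit features."""
    return (1.0 + x @ u) ** d

def rbf_kernel(x, u):
    """Radial basis kernel K(x, u) = exp(-||x - u||_2^2), as written above."""
    return np.exp(-np.sum((x - u) ** 2))

rng = np.random.default_rng(0)
x, u = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose(psi(x) @ psi(u), poly_kernel(x, u, d=2))  # the kernel trick, verified
```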