
Intelligent Systems: Reasoning and Recognition
James L. Crowley
MoSIG M1, Winter Semester 2018
Lesson 6, 27 February 2018

Perceptrons and Support Vector Machines

Outline:
Notation
Linear Models
    Lines, Planes and Hyper-planes
Perceptrons
    History
    Theory
    Perceptron Margin
    Full Perceptron Algorithm (Optional)
Support Vector Machines
    Hard-Margin SVMs - a Simple Linear Classifier
    Finding the Support Vectors
    Soft-Margin SVM
    SVM with Kernels
    Radial Basis Function Kernels
    Kernel Functions for Symbolic Data

Notation

x_d        A feature: an observed or measured value.
X          A vector of D features.
D          The number of dimensions of the vector X.
{x_m}      Training samples for learning.
{y_m}      The indicator variable for each training sample:
           y_m = +1 for examples of the target pattern (class 1)
           y_m = -1 for all other examples (class 2)
M          The number of training samples.
w = (w_1, w_2, ..., w_D)^T    The coefficients for the linear model.
b          The bias for the model, chosen so that w^T x_m + b ≥ 0 for P and w^T x_m + b < 0 for N.

Decision rule:                 IF w^T x + b ≥ 0 THEN P ELSE N
Correct classification test:   IF y_m (w^T x_m + b) ≥ 0 THEN True ELSE False
Margin for the classifier:     γ = min_m { y_m (w^T x_m + b) }

Linear Models

Lines, Planes and Hyper-planes (a quick review of basic geometry)

In a 2-D space, a line is the set of points that obeys the relation:

    w_1 x_1 + w_2 x_2 + b = 0

In vector notation:  (w_1  w_2)(x_1  x_2)^T + b = 0, which is written:  w^T x + b = 0

If ||w|| = 1, then b is the perpendicular distance from the line to the origin.

For points that are not on the line, the linear equation gives the signed perpendicular distance to the line:

    d = w_1 x_1 + w_2 x_2 + b

For points on one side of the line, d > 0. On the other side, d < 0. On the line, d = 0. If w^T x + b = 0 is a boundary, then the sign of d can be used as a decision function. Lines, planes and hyper-planes can act as boundaries for regions of a feature space.

This is called a first-order equation because all terms are first order. It is easily generalized to any number of dimensions (features). In a 3-D space, a plane is represented by

    w_1 x_1 + w_2 x_2 + w_3 x_3 + b = 0

In a D-dimensional space, a linear equation of this form is called a hyper-plane. The equation w^T x + b = 0 is a hyper-plane in a D-dimensional space, where

    w = (w_1, w_2, ..., w_D)^T

is the normal to the hyper-plane and b is a constant term. The signed distance

    d = w^T x + b = w_1 x_1 + w_2 x_2 + ... + w_D x_D + b

indicates on which side of the boundary a point lies.

Note that it is sometimes convenient to include the bias in the vector by appending a constant 1 to x:

    (w_1  w_2  ...  w_D  b)(x_1  x_2  ...  x_D  1)^T = 0
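As a minimal sketch of this decision rule (an illustration, not part of the original notes; the point values are made up), the signed distance d = w^T x + b can be computed directly and its sign used to classify:

```python
import numpy as np

# Hypothetical line 3*x1 + 4*x2 - 5 = 0, used only to illustrate the geometry.
w = np.array([3.0, 4.0])            # normal to the hyper-plane
b = -5.0                            # bias term
scale = np.linalg.norm(w)           # normalize so ||w|| = 1 and d is a true distance
w_unit, b_unit = w / scale, b / scale

def signed_distance(x):
    """Perpendicular signed distance from point x to the hyper-plane."""
    return w_unit @ x + b_unit

def decide(x):
    """Decision function: P on the positive side of the boundary, N otherwise."""
    return "P" if signed_distance(x) >= 0 else "N"

print(signed_distance(np.array([2.0, 1.0])), decide(np.array([2.0, 1.0])))
print(signed_distance(np.array([0.0, 0.0])), decide(np.array([0.0, 0.0])))
```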

Perceptrons

History

During the 1950s, Frank Rosenblatt developed the idea of a trainable linear classifier for pattern recognition, called a Perceptron. The first Perceptron was a room-sized analog computer that implemented Rosenblatt's learning algorithm. Both the learning algorithm and the resulting recognition algorithm are easily implemented as computer programs.

In 1969, Marvin Minsky and Seymour Papert of MIT published a book entitled "Perceptrons" that claimed to document the fundamental limits of the perceptron approach. For example, they showed that a linear classifier cannot be constructed to perform an exclusive OR. While this is true for a one-layer perceptron, it is not true for multi-layer perceptrons.

Theory

The perceptron is an on-line learning method in which a linear classifier is improved by its own errors. A perceptron learns a linear decision boundary (hyper-plane) that separates the training samples. When the training data can be separated by a linear decision boundary, the data is said to be separable. The perceptron algorithm uses errors in classifying the training data to update the decision boundary until there are no more errors. Learning can be incremental: new training samples can be used to update the perceptron. If the training data is non-separable, the method may not converge and must be stopped after a certain number of iterations.

Assume a training set of M observations {x_m}, {y_m}, where

    x_m = (x_1, x_2, ..., x_D)^T   and   y_m ∈ {-1, +1}

The indicator variable y_m for each sample is y_m = +1 for examples of the target class (P) and y_m = -1 for all other classes (N).

The Perceptron will learn the coefficients for a linear boundary

    w = (w_1, w_2, ..., w_D)^T   and   b

such that for all training data x_m:  w^T x_m + b ≥ 0 for P and w^T x_m + b < 0 for N.

Note that w^T x_m + b ≥ 0 is the same as w^T x_m ≥ -b. Thus b can be considered as a threshold on the product w^T x_m.

The decision function is the function d(z):

    d(z) = P if z ≥ 0,   N if z < 0

The algorithm requires a learning rate, α (taken as α = 1 in the update rule below). Note that a training sample is correctly classified if:

    y_m (w^T x_m + b) ≥ 0

The perceptron algorithm uses misclassified training samples as corrections. The algorithm continues to loop through the training data until it makes an entire pass without a single misclassified training sample. If the training data are not separable, it will continue to loop forever.

Update rule:

    FOR m = 1 TO M DO
        IF y_m (w^T x_m + b) < 0 THEN
            w ← w + y_m x_m
            b ← b + y_m R
        END IF
    END FOR

where R is a scale factor for the data:  R = max_m ||x_m||

If the data is not separable, then the Perceptron will not converge and will continue to loop. Thus it is necessary to limit the number of iterations.

The final classifier is:  IF w^T x + b ≥ 0 THEN P ELSE N.
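A minimal sketch of this update rule in Python (the tiny data set is invented for illustration; the notes use strict "< 0" for the misclassification test, while the sketch uses "<= 0" so that the zero-initialized classifier still gets updated):

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])  # training samples
y = np.array([+1, +1, -1, -1])                                      # indicator variables

w = np.zeros(X.shape[1])
b = 0.0
R = np.max(np.linalg.norm(X, axis=1))    # scale factor for the data

for iteration in range(100):             # limit iterations in case the data is non-separable
    errors = 0
    for x_m, y_m in zip(X, y):
        if y_m * (w @ x_m + b) <= 0:     # misclassified sample drives the correction
            w = w + y_m * x_m
            b = b + y_m * R
            errors += 1
    if errors == 0:                       # a full pass with no errors: converged
        break

print("w =", w, "b =", b)
```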

Perceptron Margin

For a 2-class classifier, the margin, γ, is the smallest separation between the two classes. The margin for each sample m is

    γ_m = y_m (w^T x_m + b)

The margin for a classifier and a set of training data is the minimum margin over the training data:

    γ = min_m { y_m (w^T x_m + b) }

The quality of a perceptron is given by the histogram of the margins, h(γ), for the training data.

Full Perceptron Algorithm (Optional)

For reference, here is the full perceptron algorithm.

Algorithm:

    w^(0) ← 0;  b^(0) ← 0;  i ← 0;  R = max_m ||x_m||
    update ← TRUE
    WHILE update DO
        update ← FALSE
        FOR m = 1 TO M DO
            IF y_m (w^(i)T x_m + b^(i)) < 0 THEN
                update ← TRUE
                w^(i+1) ← w^(i) + y_m x_m
                b^(i+1) ← b^(i) + y_m R
                i ← i + 1
            END IF
        END FOR
    END WHILE
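A minimal sketch (an assumed example, not from the notes) of computing the per-sample margins and the classifier margin γ = min_m y_m (w^T x_m + b) for a given w and b; a histogram of these values is the h(γ) mentioned above:

```python
import numpy as np

def margins(X, y, w, b):
    """Per-sample margins y_m * (w^T x_m + b)."""
    return y * (X @ w + b)

# Hypothetical data and classifier, for illustration only.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 0.5])
b = 0.0

gamma_m = margins(X, y, w, b)
print("per-sample margins:", gamma_m)
print("classifier margin gamma =", gamma_m.min())
counts, edges = np.histogram(gamma_m, bins=5)   # histogram h(gamma) of the margins
print(counts, edges)
```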

Support Vector Machines

Support Vector Machines (SVMs), also known as maximum margin classifiers, are popular for problems of classification, regression and novelty detection. The solution for the model parameters corresponds to a convex optimization problem. SVMs use a minimal subset of the training data (the support vectors) to define the best decision surface between two classes.

We will use the two-class problem, K = 2, to illustrate the principle. Multi-class solutions are possible.

The simplest case, the hard-margin SVM, requires that the training data be completely separated by at least one hyper-plane. This is generally achieved by using a kernel to map the features into a high-dimensional space where the two classes are separable. To illustrate the principle, we will first examine a simple linear SVM where the data are separable. We will then generalize with kernels and with soft margins.

We will assume that the training data is a set of M training samples {x_m} with an indicator variable {y_m}, where y_m is -1 or +1.

Hard-Margin SVMs - a Simple Linear Classifier

The simplest case is a linear classifier trained from separable data:

    g(x) = w^T x + b

with the decision rule:  IF g(x) ≥ 0 THEN P ELSE N

For a hard-margin SVM we assume that the two classes are separable for all of the training data:

    ∀m:  y_m (w^T x_m + b) ≥ 0

We will use a subset S of the training samples, {x_s} ⊂ {x_m}, composed of M_s training samples, to define the best decision surface g(x) = w^T x + b. The M_s selected training samples {x_s} are called the support vectors. The minimum number of support vectors depends on the number of features, D:  M_s = D + 1. For example, in a 2-D feature space we need only 3 training samples to serve as support vectors.

To define the classifier we will look for the subset S of training samples that maximizes the margin, γ, between the two classes.

Margin:  γ = min_m { y_m (w^T x_m + b) }

Thus we seek a subset of M_s = D + 1 training samples, {x_s} ⊂ {x_m}, to define a decision surface and the margin. We will use these samples as support vectors. Our algorithm must choose D + 1 training samples {x_s} from the training samples {x_m} such that the margin is as large as possible. This is equivalent to a search for a pair of parallel surfaces at a distance γ from the decision surface.

Finding the Support Vectors

For any hyper-plane with normalized coefficients, ||w|| = 1, the distance of a sample point x_m from the hyper-plane is

    d_m = y_m (w^T x_m + b)

The margin is the minimum distance:  γ = min_m { y_m (w^T x_m + b) }

For the D + 1 support vectors x_m ∈ {x_s}:  d_m = γ
For all other training samples x_m ∉ {x_s}:  d_m ≥ γ

The scale of the margin is determined by ||w||. To find the support vectors, we can arbitrarily define the margin as γ = 1 and then renormalize to ||w|| = 1 once the support vectors have been discovered. With γ = 1, we will look for two separate hyper-planes that bound the decision surface, such that for points on these surfaces:

    w^T x + b = +1   and   w^T x + b = -1

The distance between these two planes is 2 / ||w||.

We add the constraint that for all training samples that are support vectors, y_m (w^T x_m + b) = 1, while for all other samples, y_m (w^T x_m + b) > 1. This can be written as:

    y_m (w^T x_m + b) ≥ 1

This is known as the canonical representation for the decision hyper-plane.

This gives an optimization problem that minimizes ||w|| subject to y_m (w^T x_m + b) ≥ 1. For points on the margin, y_m (w^T x_m + b) = 1; for all other points, y_m (w^T x_m + b) > 1. The training samples where y_m (w^T x_m + b) = 1 are said to be active. All other training samples are inactive.

Thus the optimization problem is

    arg min_{w,b}  (1/2) ||w||^2   subject to   y_m (w^T x_m + b) ≥ 1

while minimizing the number of active points, M_s. (The factor of 1/2 is a convenience for analysis.)

This can be solved as a quadratic optimization problem using Lagrange multipliers, with one multiplier, a_m ≥ 0, for each constraint. The Lagrangian function is:

    L(w, b, a) = (1/2) ||w||^2 - Σ_{m=1}^{M} a_m { y_m (w^T x_m + b) - 1 }

Setting the derivatives to zero, we obtain:

    ∂L/∂w = 0   ⇒   w = Σ_{m=1}^{M} a_m y_m x_m

    ∂L/∂b = 0   ⇒   Σ_{m=1}^{M} a_m y_m = 0

The bias can then be recovered from the support vectors, for example by averaging:  b = (1/M_s) Σ_{x_m ∈ S} (y_m - w^T x_m).
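As a minimal numerical sketch of this primal problem (an assumed illustration, not the notes' method, using an invented toy data set and a general-purpose solver rather than a dedicated QP routine), one can minimize (1/2)||w||^2 subject to y_m (w^T x_m + b) ≥ 1:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([+1, +1, -1, -1])
D = X.shape[1]

def objective(theta):
    w = theta[:D]
    return 0.5 * w @ w                       # (1/2) ||w||^2

constraints = [
    {"type": "ineq",                         # "ineq" means fun(theta) >= 0
     "fun": (lambda theta, x_m=x_m, y_m=y_m: y_m * (theta[:D] @ x_m + theta[D]) - 1.0)}
    for x_m, y_m in zip(X, y)
]

result = minimize(objective, x0=np.zeros(D + 1), method="SLSQP", constraints=constraints)
w, b = result.x[:D], result.x[D]
print("w =", w, "b =", b)
# The active constraints (margins close to 1) identify the support vectors.
print("margins:", y * (X @ w + b))
```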

Soft-Margin SVM

We can extend the SVM to the case where the training data are not separable by introducing the hinge loss function:

    hinge loss:  L_m = max{ 0, 1 - y_m (w^T x_m + b) }

This loss is 0 when a sample satisfies the margin constraint (as all samples do when the data are separable), and non-zero otherwise.

Using the hinge loss, we can minimize

    (1/M) Σ_{m=1}^{M} max{ 0, 1 - y_m (w^T x_m + b) }  +  λ ||w||^2

To set this up as an optimization problem, we note that L_m = max{ 0, 1 - y_m (w^T x_m + b) } is the smallest non-negative value that satisfies y_m (w^T x_m + b) ≥ 1 - L_m. We rewrite the optimization as: minimize

    (1/M) Σ_{m=1}^{M} L_m  +  λ ||w||^2

subject to  y_m (w^T x_m + b) ≥ 1 - L_m  and  L_m ≥ 0  for all m.

This was classically solved as a quadratic optimization problem with Lagrange multipliers:

    maximize    Σ_{m=1}^{M} a_m  -  (1/2) Σ_{i=1}^{M} Σ_{j=1}^{M} y_i a_i (x_i^T x_j) y_j a_j

    subject to  Σ_{m=1}^{M} a_m y_m = 0   and   0 ≤ a_m ≤ 1/(2Mλ)  for all m.

The resulting decision surface is

    w = Σ_{m=1}^{M} a_m y_m x_m

and the bias is found by solving b = y_m - w^T x_m for any support vector x_m ∈ {x_s}.

Modern approaches use gradient descent to solve for w and b, offering important advantages for large data sets.
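A minimal sketch of the gradient-descent route mentioned above (an assumed illustration with invented data and hyper-parameters, not the notes' algorithm): sub-gradient descent on the hinge-loss objective (1/M) Σ_m max(0, 1 - y_m(w^T x_m + b)) + λ||w||^2.

```python
import numpy as np

def train_soft_margin(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the soft-margin objective."""
    M, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # samples with non-zero hinge loss
        grad_w = 2 * lam * w - (X[active].T @ y[active]) / M   # sub-gradient w.r.t. w
        grad_b = -np.sum(y[active]) / M                        # sub-gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([+1, +1, -1, -1])
w, b = train_soft_margin(X, y)
print("w =", w, "b =", b, "margins:", y * (X @ w + b))
```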

SVM with Kernels

Support Vector Machines can be generalized for use with non-linear decision surfaces using kernel functions. This provides a very general and efficient classifier.

A kernel function, f(z), transforms the data into a space where the two classes are separable. Instead of a decision surface

    g(x) = w^T x + b

we use a decision surface

    g(x) = w^T f(x) + b

where the coefficients w and the bias b are provided by the D + 1 support vectors, and

    w = Σ_{m=1}^{M} a_m y_m f(x_m)

is learned from the transformed training data. As before, S = {x_s} are the support vectors, and a_m is a variable learned from the training data that is a_m > 0 for support vectors and a_m = 0 for all others.

Radial Basis Function Kernels

Radial basis functions are a popular kernel function. A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin. For example, the Gaussian function

    f(z) = e^(-z² / (2σ²))

is a popular radial basis function, and is often used as a kernel for support vector machines, with radial basis functions of

    z = ||X - x_s||

for each support vector x_s. We can use a sum of M_s radial basis functions to define a discriminant function, where the support vectors are drawn from the training samples. This gives a discriminant function

    g(X) = Σ_{m=1}^{M} a_m y_m f(||X - x_m||)

The training samples x_m for which a_m ≠ 0 are the support vectors.

Depending on σ, this can provide a good fit or an over-fit to the data. If σ is large compared to the distance between the classes, this can give an overly flat discriminant surface. If σ is small compared to the distance between classes, this will over-fit the samples. A good choice for σ will be comparable to the distance between the closest members of the two classes.

[Figures omitted: illustrations of RBF kernel fits, from "A Computational Biology Example using Support Vector Machines", Suzy Fei, 2009.]

Each radial basis function is a dimension in a high-dimensional basis space.
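A minimal sketch of evaluating this RBF discriminant (an assumed illustration: the support vectors, multipliers a_m and σ below are made up, not learned):

```python
import numpy as np

def rbf(z, sigma):
    """Gaussian radial basis function of the distance z."""
    return np.exp(-z**2 / (2.0 * sigma**2))

def discriminant(x_new, X_sv, y_sv, a_sv, sigma):
    """g(x_new) as a weighted sum of RBFs centered on the support vectors."""
    distances = np.linalg.norm(X_sv - x_new, axis=1)
    return np.sum(a_sv * y_sv * rbf(distances, sigma))

X_sv = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0]])  # hypothetical support vectors
y_sv = np.array([+1, +1, -1])
a_sv = np.array([0.7, 0.3, 1.0])
sigma = 1.5     # comparable to the distance between the closest members of the classes

x = np.array([0.5, 0.5])
g = discriminant(x, X_sv, y_sv, a_sv, sigma)
print("g(x) =", g, "->", "P" if g >= 0 else "N")
```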

Kernel Functions for Symbolic Data

Kernel functions can be defined over graphs, sets, strings and text. Consider, for example, a non-vector space composed of a set of words {W}. We can select a subset of discriminant words {S} ⊂ {W}. Now, given a set of words (a probe) {A} ⊂ {W}, we can define a kernel function of A and S using the intersection operation:

    k(A, S) = 2^|A ∩ S|

where |·| denotes the cardinality (the number of elements) of a set.
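A minimal sketch of this set-intersection kernel in Python (the word sets are invented examples):

```python
def intersection_kernel(A, S):
    """Kernel between two word sets: 2 raised to the size of their intersection."""
    return 2 ** len(A & S)

S = {"margin", "kernel", "support", "vector"}        # discriminant words
A = {"the", "kernel", "maps", "support", "vectors"}  # a probe
print(intersection_kernel(A, S))                     # intersection has 2 elements, so 2^2 = 4
```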
