Margin Distribution and Learning Algorithms

Ashutosh Garg (ASHUTOSH@US.IBM.COM), IBM Almaden Research Center, San Jose, CA, USA
Dan Roth (DANR@UIUC.EDU), Department of Computer Science, University of Illinois, Urbana, IL, USA

ICML 2003

Abstract

Recent theoretical results have shown that improved bounds on the generalization error of classifiers can be obtained by explicitly taking the observed margin distribution of the training data into account. Currently, algorithms used in practice do not make use of the margin distribution and are driven by optimization with respect to the points that are closest to the hyperplane. This paper enhances earlier theoretical results and derives a practical data-dependent complexity measure for learning. The new complexity measure is a function of the observed margin distribution of the data, and can be used, as we show, as a model selection criterion. We then present the Margin Distribution Optimization (MDO) learning algorithm, which directly optimizes this complexity measure. Empirical evaluation of MDO demonstrates that it consistently outperforms SVM.

1. Introduction

The study of the generalization abilities of learning algorithms and their dependence on sample complexity is one of the fundamental research efforts in learning theory. Understanding the inherent difficulty of learning problems allows one to evaluate the possibility of learning in certain situations, to estimate the degree of confidence in the predictions made, and is crucial in understanding, analyzing, and developing improved learning algorithms. Recent efforts in these directions (Garg et al., 2002; Langford & Shawe-Taylor, 2002) were devoted to developing generalization bounds for linear classifiers which make use of the actual observed margin distribution on the training data, rather than relying only on the distance of the points closest to the hyperplane (the margin of the classifier). Similar results were obtained earlier for a restricted case, when a convex combination of multiple classifiers is used as the classifier (Schapire et al., 1997). At the heart of these results are analysis techniques that can use an appropriately weighted combination of all the data points, weighted according to their distance from the hyperplane.

This paper shows that these theoretical results can be made practical, be used for model selection, and drive a new learning algorithm. Building on (Garg et al., 2002), which introduced the data dependent generalization bounds, we first develop a more realistic version of the bound that takes into account the classifier's bias term. An important outcome of (Garg et al., 2002), which we slightly modify here, is a data dependent complexity measure for learning which we call the projection profile of the data. The projection profile of data sampled according to a distribution D is the expected amount of error introduced when a classifier h is randomly projected, along with the data, into a k-dimensional space. Our analysis shows that it is captured by the following quantity:

    a_k(D, h) = \int_{x \in D} u_k(x)\, dD,  where

    u_k(x) = \min\left\{ 3\exp\left(-\frac{(\nu(x)+b)^2\, k}{8\,(2+\nu(x)^2)}\right),\ \frac{2}{\sqrt{k}\,(\nu(x)+b)},\ 1 \right\}   (1)

and \nu(x) is the distance between x and the classifying hyperplane defined by h, a linear classifier for D (our analysis does not assume linearly separable data), and b is the bias of the classifier. The sequence P(D, h) = (a_1(D, h), a_2(D, h), ...) is the projection profile of D. The projection profile turns out to be quite informative, both theoretically and in practice. (Garg et al., 2002) proved its relevance to generalization performance and used it to develop sample complexity bounds (bounds on generalization error) that are more informative than existing bounds for high dimensional learning problems.
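For illustration only (this is not the implementation used in our experiments; the function names and the use of NumPy are incidental), the per-point term u_k(x) of Eqn. 1 and its average over a sample, the empirical analogue of a_k(D, h), can be computed as follows:

import numpy as np

def u_k(nu, b, k):
    # Per-point term of Eqn. 1: probability of a projection-induced error for a point
    # at signed distance nu from the hyperplane, with bias b and target dimension k.
    t1 = 3.0 * np.exp(-((nu + b) ** 2) * k / (8.0 * (2.0 + nu ** 2)))
    t2 = 2.0 / (np.abs(nu + b) * np.sqrt(k))      # abs() guards the sign of the margin
    return np.minimum(np.minimum(t1, t2), 1.0)

def empirical_projection_profile(nu, b, ks):
    # Empirical analogue of the sequence (a_1(D,h), a_2(D,h), ...): average u_k over the sample.
    return np.array([np.mean(u_k(nu, b, k)) for k in ks])

rng = np.random.default_rng(0)
nu = rng.normal(0.3, 0.2, size=100)               # observed margins nu(x_i) of a hypothetical sample
print(empirical_projection_profile(nu, 0.0, ks=[10, 50, 100, 500]))   # shrinks as k grows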
The main contribution of this paper is to show that the projection profile of the observed data with respect to a learned classifier can be directly optimized, yielding a new learning algorithm for linear classifiers, MDO (Margin Distribution Optimization), that attempts to be optimal with respect to the margin distribution based complexity measure. Specifically, we first argue that this complexity measure can be used for model selection.

Empirically, we show that the margin distribution of the data with respect to a classifier behaves differently than the margin, and we observe that it is both better correlated with the classifier's accuracy and more stable when measurements over the training data are compared with expected values. With this as motivation we develop an approximation to Eqn. 1 that is used to drive an algorithm that learns a linear classifier which directly optimizes this measure. We use several data sets to compare the resulting classifiers to those achieved by optimizing the margin, as in SVM, and show that MDO yields better classifiers.

The paper is organized as follows. In the next section we introduce the notation and some definitions that will be used in later sections. Sec. 3 introduces our enhanced version of the data dependent generalization bound based on the margin distribution of the training data and discusses its implications. In Sec. 4 we show that the projection profile is not only meaningful for the purpose of generalization bounds but can also be used as a model selection criterion. The MDO algorithm is introduced in Sec. 5 and experimental results with it are presented in Sec. 6. We conclude by discussing some open issues.

2. Preliminaries

We study a binary classification problem f : IR^n -> {-1, 1}, a mapping from an n-dimensional space to class labels {-1, 1}. Let S = {(x_1, y_1), ..., (x_m, y_m)} denote a sample set of m examples. The hypothesis h in IR^n is an n-dimensional linear classifier and b is the bias term. That is, for an example x in IR^n, the hypothesis predicts ŷ(x) = sign(h^T x + b). We use n to denote the original (high) dimensionality of the data, k to denote the (smaller) dimension into which projections are made, and m to denote the sample size of the data. The subscript will refer to the index of the example in the sample set and the superscript will refer to the particular dimension under consideration.

Definition 2.1 Under the 0-1 loss function, the empirical error \hat{E} of h over a sample set S and the expected error E are given respectively by

    \hat{E}(h, S) = \frac{1}{m} \sum_i I(\hat{y}(x_i) \neq y_i); \qquad E(h, S) = E_x\left[ I(\hat{y}(x) \neq f(x)) \right],

where I(\cdot) is the indicator function, which is 1 when its argument is true and 0 otherwise. The expectation E_x is over the distribution of the data.

We denote by \|\cdot\| the L2 norm of a vector. We will assume w.l.o.g. that all data points come from the surface of the unit sphere (i.e., for all x, \|x\| = 1) and that \|h\| = 1 (the non-unit norm of classifier and data will be accounted for by normalizing the bias b). Let \nu(x) = h^T x denote the signed distance of the sample point x from the classifier h. When x_j refers to the j-th sample from S, we denote it by \nu_j = \nu(x_j) = h^T x_j. With this notation (omitting the classifier h) the classification rule reduces simply to sign(\nu(x) + b).

Our analysis of margin distribution based bounds is based on results in the area of random projection of high dimensional data, developed in (Johnson & Lindenstrauss, 1984) and further studied in several other works, e.g., (Arriaga & Vempala, 1999). Briefly, the method of random projection shows that with high probability, when n-dimensional data is projected down to a lower dimensional space of dimension k, using a random k x n matrix, relative distances between points are almost preserved. We will use this to show that if the data is separable with large margin in a high dimensional space, then it can be projected down to a low dimensional space without incurring much error. Next we give the definition of a random matrix:

Definition 2.2 (Random Matrix) A random projection matrix R is a k x n matrix, each of whose entries r_ij is drawn independently from N(0, 1/k).
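A minimal sketch of Definition 2.2 (illustrative Python, not part of the algorithm described later): draw a random k x n matrix with N(0, 1/k) entries, project both the data and the classifier, and measure how often the predicted sign changes, the kind of additional error analyzed in the next section.

import numpy as np

def random_projection(n, k, rng):
    # Random projection matrix R of Definition 2.2: k x n, entries drawn i.i.d. from N(0, 1/k).
    return rng.normal(0.0, np.sqrt(1.0 / k), size=(k, n))

rng = np.random.default_rng(1)
n, k, m = 1000, 50, 200

# Unit-norm data and a unit-norm classifier, as assumed in the preliminaries.
X = rng.normal(size=(m, n)); X /= np.linalg.norm(X, axis=1, keepdims=True)
h = rng.normal(size=n);      h /= np.linalg.norm(h)
b = 0.1

R = random_projection(n, k, rng)
Xp, hp = X @ R.T, R @ h                           # x' = R x and h' = R h

# Fraction of points whose predicted label changes under projection.
flips = np.mean(np.sign(X @ h + b) != np.sign(Xp @ hp + b))
print(f"sign disagreements after projection: {flips:.3f}")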
For x in IR^n, we denote by x' = Rx in IR^k the projection of x from an n- to a k-dimensional space using the projection matrix R. Similarly, for a classifier h in IR^n, h' denotes its projection to a k-dimensional space via R, S' denotes the set of points which are the projections of the sample S, and \nu'_j = (h')^T x'_j is the signed distance in the projected space.

3. A Margin Distribution based Bound

The decision of the classifier h is based on the sign of \nu(x) + b = h^T x + b. Since both h and x are normalized, \nu(x) can be thought of as the geometric distance between x and the hyperplane orthogonal to h that passes through the origin. Given a distribution on data points x, this induces a distribution on their distance from the hyperplane induced by h, which we refer to as the margin distribution. Note that this is different from the margin of the sample set S with respect to a classifier h, traditionally defined in the learning community as the distance of the point which is closest to the hyperplane: the margin of S w.r.t. h is \gamma(S, h) = \min_{i=1,...,m} |h^T x_i + b|.
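The distinction between the margin \gamma(S, h) and the full margin distribution is easy to see numerically; the following illustrative sketch (synthetic data, not one of the datasets used later) computes both for a random unit-norm sample and classifier:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20)); X /= np.linalg.norm(X, axis=1, keepdims=True)
h = rng.normal(size=20);        h /= np.linalg.norm(h)
b = 0.0
y = np.sign(X @ h + b)                  # labels consistent with h, for illustration only

nu = X @ h                              # signed distances nu(x_i) = h^T x_i
gamma = np.min(np.abs(nu + b))          # the classical margin gamma(S, h): one extreme point
dist_summary = np.percentile(y * (nu + b), [10, 50, 90])   # the margin distribution
print("margin gamma(S, h):", gamma)
print("10th/50th/90th percentiles of the margin distribution:", dist_summary)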

In (Garg et al., 2002) we developed a data dependent margin bound for the case when the classification is done as ŷ = sign(\nu(x)). Although theoretically one can absorb the bias by augmenting x with a constant coordinate and folding b into h, and obtain a similar result, it turns out that in practice this may not be a good idea. The bias term b is related to the distance of the hyperplane from the origin, and h is the slope of the hyperplane. Proving the bound, as well as developing the algorithmic approach (based on Eqn. 3 below), requires considering a version of the hyperplane projected into a lower dimensional space, k, in a way that does not significantly affect the classification performance. At times, however, the absolute value of b may be much larger than the individual components of h (even after normalization by the norm of h). In these cases, b will contribute more to the noise associated with the projection and it may not be a good idea to project b. Due to this observation, the algorithmic approach that is based on Eqn. 3 necessitates that we prove a slightly modified version of the result given in (Garg et al., 2002).

Theorem 3.1 Let S = {(x_1, y_1), ..., (x_m, y_m)} be a set of n-dimensional labeled examples and h a linear classifier with bias term b. Then, for all constants 0 < \delta < 1 and 0 < k, with probability at least 1 - 4\delta, the expected error of h is bounded by

    E(h) \le \hat{E}(S, h) + \min_k \left[ \mu_k + \sqrt{ \frac{(k+2)\ln\frac{me}{k+2} + \ln\frac{2}{\delta}}{m} } \right]   (2)

with \mu_k redefined as

    \mu_k = \frac{2}{m\delta} \sum_{j=1}^m \min\left\{ 3\exp\left(-\frac{(\nu_j+b)^2\, k}{8\,(2+\nu_j^2)}\right),\ \frac{2}{\sqrt{k}\,(\nu_j+b)},\ 1 \right\}.

Note that by choosing b = 0, the above result reduces to the one given in (Garg et al., 2002). Although one can prove the above theorem following the steps given in (Garg et al., 2002), we give a sketch of the proof as it is instrumental in developing an intuition for the algorithm. The bound given in Eqn. 2 has two main components. The first component, \mu_k, captures the distortion incurred due to the random projection to dimension k; the second term follows directly from VC theory for a classifier in a k-dimensional space. The random projection theorem (Arriaga & Vempala, 1999) states that relative distances are (almost) preserved when projecting to a lower dimensional space. Therefore, we first argue that, with high probability, the image under projection of data points that are far from h in the original space will still be far in the projected (k-dimensional) space. The first term quantifies the penalty incurred due to data points whose images will not be consistent with the image of h. That is, this term bounds the additional error incurred due to the projection to a k-dimensional space. Once the data lies in the lower dimensional space, we can bound the expected error of the classifier on the data as a function of the dimension of the space, the number of samples, and the empirical error there (which is accounted for by the first component). Decreasing the dimension of the projected space increases the contribution of the first term, while the VC-dimension based term decreases. To get the optimal bound, one has to balance these two quantities and choose the dimension k of the projected space so that the generalization error is minimized. The following lemma is the key in proving the above theorem; it quantifies the penalty incurred when projecting the data down to a k-dimensional space. For the proof, please refer to (Garg et al., 2002).

Lemma 3.2 Let h be an n-dimensional classifier and x in IR^n a sample point, such that \|h\| = \|x\| = 1, and let \nu = h^T x. Let R in IR^{k x n} be a random projection matrix (Def. 2.2), with h' = Rh, x' = Rx, and let the classification be done as sign(h^T x + b). Then the probability of misclassifying x, relative to its classification in the original space, due to the random projection, is

    P\left[ \mathrm{sign}((h')^T x' + b) \neq \mathrm{sign}(h^T x + b) \right] \le \min\left\{ 3\exp\left(-\frac{(\nu+b)^2\, k}{8\,(2+\nu^2)}\right),\ \frac{2}{\sqrt{k}\,(\nu+b)},\ 1 \right\}.

The above lemma establishes a bound on the additional classification error that is incurred when projecting the sample down from an n- to a k-dimensional space.
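Given the empirical margins, the bound of Eqn. 2 can be evaluated and minimized over k directly. The sketch below is illustrative only and follows the form of Eqn. 2 as reconstructed above, including its constant factors:

import numpy as np

def mu_k(nu, b, k, m, delta):
    # Projection-distortion term mu_k of Theorem 3.1.
    t1 = 3.0 * np.exp(-((nu + b) ** 2) * k / (8.0 * (2.0 + nu ** 2)))
    t2 = 2.0 / (np.sqrt(k) * np.abs(nu + b))
    return (2.0 / (m * delta)) * np.sum(np.minimum(np.minimum(t1, t2), 1.0))

def margin_distribution_bound(nu, b, emp_err, m, delta, ks):
    # Empirical error plus the best trade-off over k between projection distortion
    # and the VC-style term for a linear classifier in k dimensions.
    vals = [mu_k(nu, b, k, m, delta)
            + np.sqrt(((k + 2) * np.log(m * np.e / (k + 2)) + np.log(2.0 / delta)) / m)
            for k in ks]
    best = int(np.argmin(vals))
    return emp_err + vals[best], ks[best]

rng = np.random.default_rng(3)
nu = rng.normal(0.4, 0.2, size=500)                # margins of a hypothetical training set
print(margin_distribution_bound(nu, b=0.0, emp_err=0.05, m=500, delta=0.05, ks=range(5, 400, 5)))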
Note that in this formulation, unlike the one in (Garg et al., 2002), we only project h while keeping b as in the original space. It can be shown that in many cases this bound is tighter than the one given in (Garg et al., 2002). Once the above result is established, one can prove the theorem by making use of the symmetrization lemma (Anthony & Bartlett, 1999) and standard VC-dimension based arguments (Vapnik, 1998). See (Garg et al., 2002) for details.

Existing Learning Algorithms

In this section, we attempt to provide some intuition for the significance of considering the margin distribution when selecting a classifier. We note that the discussion is general although it is done in the context of linear classifiers. It has been shown that any classifier can be mapped to a linear classifier in a high dimensional space. This suggests that the analysis of linear classifiers is general enough and can give meaningful insights. The selection of a classifier from among a set of classifiers that behave well on the training data (with respect to some complexity measure) is based on Vapnik's (Vapnik, 1998) structural risk minimization principle. The key question then is: what is the measure that should guide this selection? A cartoon example is depicted in Fig. 1, showing training data in a two dimensional space, with positive and negative examples marked by different symbols.

Figure 1. (a) A cartoon showing the training data in a two dimensional space, with one symbol corresponding to the positive class and another to the negative class. (b) Linear classifiers that can classify the data without error. (c) The hyperplane learned by a large margin classifier such as SVM. (d) This hyperplane may be a better choice than the one learned by SVM, since more data points are farther from the hyperplane.

Clearly, there are a number of linear classifiers that can classify the training data perfectly. Fig. 1(b) shows some of these classifiers. All these classifiers have zero error on the training data, and as such any selection criterion based on training error only will not distinguish between them. One of the popular measures that guides the selection of a classifier is the margin of a hyperplane with respect to its training data. Clearly, however, this measure is sensitive since it typically depends on a small number of extreme points. This can be significant not only in the case of noisy data, but also in the case of linearly separable data, since this selection may affect performance on future data. In the context of large margin classifiers such as SVM, researchers have tried to get around this problem by introducing slack variables and ignoring some of the closest points in an ad hoc fashion, but this extreme notion of margin still drives the optimization. The cartoon in Fig. 1(d) shows a hyperplane that was chosen by taking into account all the data points rather than only those that are closest to the hyperplane. Intuitively, this looks like a better choice than the one in Fig. 1(c) (which would be chosen by a large margin classifier), by virtue of relying on a more stable measure. In the next section, we argue that one can use the projection profile as a model selection criterion and show that this hyperplane is the one that would actually be selected if this model selection criterion were used.

4. Algorithmic Implications

The expected probability of error for a k-dimensional image of a point x that is at distance \nu(x) from an n-dimensional hyperplane (where the expectation is with respect to selecting a random projection matrix) is given by the projection profile in Eqn. 1. This expression measures the contribution of a point to the generalization error as a function of its distance from the hyperplane (for a fixed k).

Table 1. Correlations for four datasets from the UCI ML Repository (Pima, Breast Cancer, Ionosphere, Credit). The first column (C:Margin-E) gives the correlation coefficient between the margin of the training data and the error on the test set. The second column (C:ProjProfile-E) gives the correlation coefficient between the weighted margin according to the cost function given in Eqn. 5 and the error on the test set. When computing the margin we look at only correctly classified points.

Figure 2(a) shows this term (Eqn. 1) for different values of k. All points contribute to the error, and the relative contribution of a point decays as a function of its distance from the hyperplane. Hypothetically, given a probability distribution over the instance space and a fixed classifier, one can compute the margin distribution, which can then be used to compute the projection profile of the data. The bound given in Thm. 3.1 depends only on the projection profile of the data and the projection dimension and is independent of the particular classifier. Of all the classifiers with the same training error, the one with the smallest projection profile value a_k will have the best generalization performance. Therefore, the theorem suggests a model selection criterion: of all the classifiers, choose the one that optimizes the projection profile and not merely the margin.
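Operationally, this criterion amounts to the following sketch (illustrative code; projection_profile_score and select_by_profile are names introduced here, not part of MDO): among candidate classifiers with equal training error, rank them by their empirical projection profile rather than by their minimum margin.

import numpy as np

def projection_profile_score(X, h, b, k=100):
    # Empirical a_k for the classifier (h, b): mean per-point projection error of Eqn. 1.
    nu = X @ h
    t1 = 3.0 * np.exp(-((nu + b) ** 2) * k / (8.0 * (2.0 + nu ** 2)))
    t2 = 2.0 / (np.abs(nu + b) * np.sqrt(k) + 1e-12)   # small epsilon avoids division by zero
    return np.mean(np.minimum(np.minimum(t1, t2), 1.0))

def select_by_profile(classifiers, X, k=100):
    # Among candidate (h, b) pairs with equal training error, keep the smallest profile.
    return min(classifiers, key=lambda hb: projection_profile_score(X, hb[0], hb[1], k))

On the cartoon of Fig. 1, this ranking favors the hyperplane of panel (d) over the maximum-margin hyperplane of panel (c), which is exactly the selection argued for above.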
A similar argument, but for a slightly restricted problem, was presented by (Mason et al., 2000). They argued that when learning a convex combination of classifiers, instead of maximizing the margin one should maximize a weighted function of the distance of the points from the learned classifier.

Figure 2. (a) The contribution of data points to the generalization error as a function of their distance from the hyperplane, shown for several values of k (the contribution decreases as k increases). (b) The contribution of data points to the generalization error as a function of their distance from the hyperplane, as given by (Mason et al., 2000). (c) The weight given to the data points by the MDO algorithm as a function of their margin.

This was used to develop a margin distribution based generalization bound for convex combinations of classifiers. The weighting function that was proposed is shown in Fig. 2(b). The results of their algorithm showed that a weighted margin distribution (for a convex combination of classifiers) is a better complexity measure than the extreme notion of margin which has typically been used in large margin classification algorithms like SVM. Although the problem addressed there is different from ours, interestingly, the form of their weighting function (shown in Fig. 2(b)) is very similar to ours. In our work, instead of working with the actual projection profile, we work with a simpler form of it, given in Eqn. 3:

    u(x) = \exp\left(-(\nu(x)+b)^2\, k\right)   (3)

The advantage of using the projection profile as a model selection criterion is evident from the experiments presented in Table 1. In particular, we took four data sets from the UCI ML repository. For each data set, we randomly divided it into a test and a training set. A classifier was learned on the training set and was evaluated on the test data. This experiment was repeated 200 times (each time choosing a different subset as training data). Each time we computed the actual margin of the SVM that was used as the initial guess and the weighted margin of this classifier according to the cost function given in Eqn. 5. Table 1 gives the correlation between the generalization error of the classifier (its performance on the test data) and these quantities. The first column in Table 1 gives the correlation between the error on the test data and the margin on the training data. The second column gives the correlation between the error on the test data and the weighted margin (projection profile) computed on the training data. Note that here we are looking at the absolute value of the correlation coefficient. The correlation results highlight the fact that our new cost function is a better predictor of generalization error. Further evidence is presented in Table 2, which gives empirical evidence that the cost function given in Eqn. 5 is a more stable measure than the margin. The first column in Table 2 gives the correlation between the margin (computed as the distance of the closest correctly classified point from the hyperplane) on the training and test data over 200 runs. The second column gives the correlation between the weighted margin (projection profile) computed on the training data and on the test data. Again, a high correlation in the latter case indicates the stability of this new measure.
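The protocol behind Tables 1 and 2 can be outlined as follows. The sketch is illustrative only: it uses synthetic data instead of the UCI sets, scikit-learn's LinearSVC in place of the SVM implementation used in our experiments, and the simplified weight of Eqn. 3 in place of the full cost of Eqn. 5, so the numbers it prints are not those of the tables.

import numpy as np
from sklearn.svm import LinearSVC

def weighted_margin(f, k=100):
    # Simplified projection profile of Eqn. 3, summed over correctly classified points.
    return np.sum(np.exp(-(f ** 2) * k))

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 20)); w_true = rng.normal(size=20)
y = np.sign(X @ w_true + 0.5 * rng.normal(size=600))

margins_tr, profiles_tr, test_errs = [], [], []
for _ in range(50):                                    # the experiments repeat the split 200 times
    idx = rng.permutation(len(y)); tr, te = idx[:360], idx[360:]
    clf = LinearSVC(C=1.0, max_iter=5000).fit(X[tr], y[tr])
    h = clf.coef_.ravel(); b = float(clf.intercept_[0])
    scale = np.linalg.norm(h); h, b = h / scale, b / scale
    f = y[tr] * (X[tr] @ h + b)                        # signed margins on the training split
    correct = f > 0
    margins_tr.append(f[correct].min())                # classical margin (correct points only)
    profiles_tr.append(weighted_margin(f[correct]))
    test_errs.append(np.mean(clf.predict(X[te]) != y[te]))

print("corr(margin, test error):  ", abs(np.corrcoef(margins_tr, test_errs)[0, 1]))
print("corr(profile, test error): ", abs(np.corrcoef(profiles_tr, test_errs)[0, 1]))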
5. The Margin Distribution Optimization (MDO) Algorithm

In this section, we introduce a new learning algorithm, the Margin Distribution Optimization (MDO) algorithm. MDO is driven by optimizing with respect to the simplified form of the projection profile given in Eqn. 3. The learning algorithm attempts to find a classifier that minimizes a cost function which happens to be the projection profile of the data. The cost function in Eqn. 3 determines the contribution we want our selected classifier to give to the data points. In order to choose a classifier that optimizes with respect to this measure, though, we need to change it in order to take into account misclassified points. We do that by defining a new cost function which explicitly gives higher weights to misclassified points. Thus, the weight given to an example x_i with true class label y_i under the learned hyperplane h with bias b is given by

    W(x_i) = \begin{cases} e^{-\alpha(\nu(x_i)+b)^2} & \text{if } y_i(\nu(x_i)+b) \ge 0, \\ e^{-\beta y_i(\nu(x_i)+b)} & \text{if } y_i(\nu(x_i)+b) < 0. \end{cases}   (4)

That is, for a correct classification, the weight is the one given by the projection profile; for misclassified points, however, the example is exponentially weighted.
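Eqn. 4 translates directly into code; the following illustrative sketch (with arbitrary values of alpha and beta) also reproduces the qualitative shape of the weight curves in Fig. 2(c):

import numpy as np

def mdo_weight(nu_plus_b, y, alpha, beta):
    # Per-example weight of Eqn. 4: projection-profile weight for correctly classified
    # points, exponentially growing weight for misclassified ones.
    signed = y * nu_plus_b
    return np.where(signed >= 0,
                    np.exp(-alpha * nu_plus_b ** 2),
                    np.exp(-beta * signed))

# A slice of the family of weight curves in Fig. 2(c): weight versus margin y*(nu(x)+b).
margin = np.linspace(-1.0, 1.0, 9)
print(mdo_weight(margin, np.ones_like(margin), alpha=4.0, beta=4.0))
# Misclassified points (negative margin) get the largest weights; points classified
# with a large margin get weights close to zero.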

Table 2. Correlations for four datasets from the UCI ML Repository (Pima, Breast Cancer, Ionosphere, Credit). The first column (C:Margin Tr-Te) gives the correlation coefficient between the margin on the training data and the margin on the test data. The second column (C:ProjProfile Tr-Te) gives the correlation coefficient between the weighted margin according to the cost function given in Eqn. 5 on the training data and on the test data. When computing the margin we look at only correctly classified points.

The constants \alpha, \beta allow one to control the weight of these two terms. \alpha should be thought of as the optimal projection dimension and could be optimized (although at this point our algorithm determines it heuristically). Fig. 2(c) shows a family of weight functions as a function of the margin \nu(x) + b of an example. In this figure, a positive margin means correct classification and a negative margin corresponds to misclassification. The weight given to points which are classified with a good margin is very low, while points which are either misclassified or very close to the margin get higher weights. Minimizing this cost function will essentially try to maximize the weighted margin while reducing the misclassification error. Note the close resemblance to the weighting function proposed by Mason et al. for the convex combination case, shown in Fig. 2(b). Given a training set S of size m, the optimization problem can be formulated as: find (h, b) such that the following cost function is minimized.

    L(h, b) = \sum_{i=1}^m I(\hat{y}_i = y_i)\, e^{-\alpha(\nu(x_i)+b)^2} + \sum_{i=1}^m I(\hat{y}_i \neq y_i)\, e^{-\beta y_i(\nu(x_i)+b)}   (5)

Here I(\cdot) is an indicator function which is 1 when its argument is true and 0 otherwise. \alpha is directly proportional to the notion of projection dimension and \beta is related to the tradeoff between the misclassification error and the projection profile. There are a few observations that need to be made before we proceed further. In most practical cases, learning is done with non-separable training data. Standard learning algorithms like SVM handle this case by introducing slack variables, and the optimal values of the slack variables are chosen by a search procedure. This makes the performance of the algorithm dependent on the particular search method. Our formulation automatically takes care of misclassified points. While minimizing the above cost function, it gives a larger penalty to points which are in error than to the ones which are correctly classified. The amount of error that will be tolerated can be controlled by choosing \alpha, \beta appropriately. One possible way to choose these parameters is by evaluating the learned classifier over some hold-out set. In our case, we observed that \alpha = 1/(\bar{\nu}+b)^2 and \beta = 1/(\bar{\nu}+b) gave good results in most cases, where \bar{\nu} = \sum_i \nu_i / m is an estimate of the average margin for a data set of size m and some h.

The main computational difficulty in our algorithm lies in the non-convexity of the objective function. In general, one would like to avoid solving a non-convex problem, as it is hard to obtain a globally optimal solution and there is no closed form expression. Nevertheless, we show that gradient descent methods are quite useful here. Due to the non-convexity, there are a number of local optima and saddle points, and one may not reach a globally optimal point. This makes the choice of the starting point extremely critical. In our work, we suggest using the SVM learning algorithm to choose the initial starting classifier. Once an initial classifier is obtained, the algorithm uses gradient descent over the cost function given in Eqn. 5. The algorithm stops once it has reached a local minimum of the cost function.
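A minimal rendering of the cost of Eqn. 5 and of the heuristic choice of alpha and beta described above (illustrative only; the heuristic constants follow the reconstruction given in the text):

import numpy as np

def mdo_cost(h, b, X, y, alpha, beta):
    # Cost of Eqn. 5 for the classifier (h, b) on the sample (X, y).
    f = y * (X @ h + b)            # f_i = y_i (h^T x_i + b); f_i^2 = (nu(x_i)+b)^2 when correct
    return (np.sum(np.exp(-alpha * f[f >= 0] ** 2))
            + np.sum(np.exp(-beta * f[f < 0])))

def heuristic_alpha_beta(h, b, X):
    # alpha = 1/(nu_bar + b)^2 and beta = 1/(nu_bar + b), with nu_bar the average margin.
    nu_bar = np.mean(X @ h)
    return 1.0 / (nu_bar + b) ** 2, 1.0 / (nu_bar + b)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 10)); X /= np.linalg.norm(X, axis=1, keepdims=True)
h = rng.normal(size=10);        h /= np.linalg.norm(h)
b = 0.0
y = np.sign(X @ h + b)
alpha, beta = heuristic_alpha_beta(h, b, X)
print(mdo_cost(h, b, X, y, alpha, beta))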
Two important points that deserve attention are (1) the norm of the classifier h and (2) the non-differentiability of the cost function. In general, by increasing the norm of the classifier, we can decrease the value of the cost function (as the margin will increase). Therefore, we do not take the norm of the classifier h into account in the cost function, and at each iteration, after obtaining the updated vector h, we re-normalize it (if \|h\| \neq 1, define h \leftarrow h/\|h\| and b \leftarrow b/\|h\|; the classification result is unchanged). The second issue is more tricky. The derivative of the cost function is not defined for those x_i which have 0 margin, i.e., \nu(x_i) + b = 0. At such points, we compute both the left and the right derivatives. All the gradient directions are then evaluated and the one which leads to the maximum decrease in the cost function is selected. The algorithm terminates if no such gradient direction is found. The algorithm given in Fig. 5 gives the implementation details. It starts by learning a classifier using the SVM learning algorithm, obtaining (h, b). We assume that \|h\| = 1. \alpha, \beta are then initialized (as above). MAX CONST is the maximum number of points with zero margin that will be considered at any given iteration of the algorithm. Since at points with zero margin we evaluate both the left and the right derivative, the number of directions considered is exponential in the number of points with zero margin, and as such we limit the number of such points (for which both gradient directions are considered) to MAX CONST. MDO is the main procedure. If there are no points with 0 margin, the algorithm simply computes the gradients as

    \frac{\partial L(h, b)}{\partial h^k} = -2\alpha \sum_{i=1}^m I(\hat{y}_i = y_i)\, (\nu(x_i)+b)\, x_i^k\, e^{-\alpha(\nu(x_i)+b)^2} \;-\; \beta \sum_{i=1}^m I(\hat{y}_i \neq y_i)\, y_i\, x_i^k\, e^{-\beta y_i(\nu(x_i)+b)}   (6)

These derivatives are then used by the procedure Update to obtain the updated values of (h, b), given by (h', b'). While updating, we make sure that the cost function decreases at each iteration. However, if there are points with 0 margin, we randomly select MAX CONST such points, and at those we compute all the possible gradients (computing both left and right derivatives). The other points with zero margin are ignored. Once we have the derivative of the cost function, the same update procedure as above is called. While doing the experiments, we observed that the algorithm typically converges in a small number of steps.
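Eqn. 6 and the renormalizing update translate almost directly into code. The sketch below is illustrative: it omits the exhaustive left/right-derivative search over zero-margin points and the line search over the step size used in Fig. 5, and simply takes a fixed-size descent step.

import numpy as np

def mdo_gradient(h, b, X, y, alpha, beta):
    # Gradient of the cost of Eqn. 5 with respect to (h, b), following Eqn. 6;
    # points with exactly zero margin (where left/right derivatives differ) are ignored here.
    nu_b = X @ h + b
    f = y * nu_b
    c, w = f > 0, f < 0
    gc = -2.0 * alpha * nu_b[c] * np.exp(-alpha * nu_b[c] ** 2)   # correctly classified points
    gw = -beta * y[w] * np.exp(-beta * f[w])                      # misclassified points
    grad_h = X[c].T @ gc + X[w].T @ gw
    grad_b = np.sum(gc) + np.sum(gw)
    return grad_h, grad_b

def mdo_step(h, b, X, y, alpha, beta, eps=0.1):
    # One descent step followed by the renormalization h <- h/||h||, b <- b/||h||.
    gh, gb = mdo_gradient(h, b, X, y, alpha, beta)
    h_new, b_new = h - eps * gh, b - eps * gb
    scale = np.linalg.norm(h_new)
    return h_new / scale, b_new / scale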

6. Results

In this section, we provide an empirical comparison of MDO with SVM. Note that although, in our experiments, MDO's starting point is the SVM classifier, the results obtained by MDO could be worse than SVM's. Indeed, the selection of the new classifier is driven by our new optimization function, and it could lead to a larger error. Interestingly, when doing the experiments we observed that in many cases we obtained better generalization at the cost of slightly poorer performance on the training data. To understand this, we analyzed in detail the results for the breast cancer dataset from the UCI ML repository. In Fig. 3 we plot the margin and the value of the cost function given in Eqn. 5 over the gradient descent iterations. As expected, while the cost function of MDO improved (decreased in value), the margin deteriorated (since we started with SVM, the margin at the 0th iteration is maximal). Fig. 4 is even more revealing. It shows how the training and test errors changed with the iterations of the gradient descent. Interestingly, the error on the training data went up from 5% at the 0th iteration to 9%. The algorithm traded the increase in training error for the decrease in projection profile. The second plot gives the test error and, as is evident, the test error went down with more iterations. The gap between the training error and the test error also decreased, which shows that the new classifier generalized well and that the cost function proposed in Eqn. 5 is a better measure of learning complexity.

Figure 3. Experimental results for the breast cancer dataset from the UCI ML repository: the projection profile and the margin of the learned classifier on the training data as the algorithm iterates.

Figure 4. Experimental results for the breast cancer dataset from the UCI ML repository: the training error and test error at each iteration. With the iterations, the training error increased from the starting point (0th iteration) while the test error went down.

Table 3. Results comparing the performance of SVM and MDO. The first six datasets (Credit, Breast, Ionosphere, Heart, Liver, Mushroom) are from the UCI ML repository. The last two datasets (peace & piece, being & begin) are from the context sensitive spelling correction problem. The column SVM gives the % correct classification on test data when a classifier learned using SVM was used, and the column MDO gives the same for MDO. The last column gives the percentage reduction in error from SVM to MDO. As seen, in all cases MDO's results are comparable to or better than the ones obtained using SVM. These results are averaged over 100 runs.

In Table 3, we compare the classification performance of MDO and SVM on a number of datasets from the UCI ML repository and on two real world problems related to context sensitive spelling correction. The spelling data is taken from the pruned data set of (Golding & Roth, 1999). Two particular examples of context sensitive spelling correction, being & begin and peace & piece, are considered. For details on the UCI datasets see (Murphy, 1994). In all experiments, the data was randomly divided into a training set (60%) and a test set (40%). The experiments were repeated 100 times, each time choosing a different (randomly chosen) subset of training and test data. It is evident that except for a few cases (in which the improvement is not statistically significant), in most cases MDO outperforms SVM.

7. Conclusions

We presented a new, practical, margin distribution based complexity measure for learning classifiers, derived by enhancing a recently developed method for deriving margin distribution based generalization bounds. We have shown that this theoretically justified complexity measure can be used robustly as a model selection criterion and, most importantly, used it to drive a new learning algorithm for linear classifiers, one that is selected by optimizing with respect to this complexity measure. The results are based on a novel use of the margin distribution of the data relative to the learned classifier, which is different from the typical use of the notion of margin in machine learning. Consequently, the resulting algorithm is not sensitive to a small number of samples in determining the optimal hyperplane. Although the bound presented here is tighter than existing bounds and is sometimes informative for real problems, it is still loose and more research is needed to match observed performance on real data. Algorithmically, although we have given an implementation of the new algorithm MDO, one of the main directions of future research is to study it further, as well as to investigate other algorithmic implications of the ideas presented here.

Acknowledgments: This research is supported by NSF ITR-IIS and IIS grants and an IBM fellowship to Ashutosh Garg.

    Here we initialize the constants and obtain an initial estimate of the classifier
    Global:
        Training set: S = {(x_1, y_1), ..., (x_m, y_m)}
        Classifier: learn (h, b) using a linear SVM, where h, b: y_i(h^T x_i + b) > 0
            for all the correctly classified training examples
        Constants: MAX CONST, alpha, beta, c_inf

    This is the main procedure
    Procedure MDO(h, b, S)
        For all (x_i, y_i) in S:  f_h(x_i) = y_i(h^T x_i + b)
        C(h, b) = sum_{i: f_h(x_i) >= 0} e^{-alpha f_h(x_i)^2} + sum_{i: f_h(x_i) < 0} e^{-beta f_h(x_i)}
        A := {(x_i, y_i) in S : f_h(x_i) = 0}          (points with 0 margin)
        if (A = empty)                                  (no points with 0 margin)
            g_h := grad_h C(h, b);  g_b := grad_b C(h, b)
            Update(h, b, h', b', g_h, g_b)
        else                                            (A is the set of points with 0 margin)
            if |A| > MAX CONST then B is a random subset of A of size MAX CONST, else B = A
            N = |B|
            for each (b_1, ..., b_N) in {+1, -1}^N
                {g_h^{(b_1,...,b_N)}, g_b^{(b_1,...,b_N)}} = {grad_h^{(b_1,...,b_N)}, grad_b^{(b_1,...,b_N)}} C(h, b),
                    where grad^{(b_1,...,b_N)} takes the left or right derivative at the i-th
                    example in B according as b_i = +1 or -1
                Update(h, b, h', b', g_h, g_b)
                c^{(b_1,...,b_N)} = C(h, b) - C(h', b')
            end
            choose the h', b' which gave the maximum reduction in cost
        end
        if C(h, b) - C(h', b') < c_inf then BREAK else h = h'; b = b'; MDO(h, b, S)
    End

    This procedure gives the updated parameters based on the old parameters and the gradients
    Procedure Update(h, b, h', b', g_h, g_b)
        h' = h + eps * g_h,  b' = b + eps * g_b;  h' = h'/||h'||,  b' = b'/||h'||,
            where eps is the maximum positive real number such that C(h, b) - C(h', b') >= 0
    end

Figure 5. Sketch of the Margin Distribution Optimization (MDO) Algorithm.

References

Anthony, M., & Bartlett, P. L. (1999). Neural network learning: Theoretical foundations. Cambridge University Press.

Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning: Robust concepts and random projection. Proc. of the 40th Annual Symposium on Foundations of Computer Science.

Garg, A., Har-Peled, S., & Roth, D. (2002). On generalization bounds, projection profile, and margin distribution. Proc. of the International Conference on Machine Learning.

Golding, A. R., & Roth, D. (1999). A Winnow based approach to context-sensitive spelling correction. Machine Learning, 34.

Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Conference in Modern Analysis and Probability.

Langford, J., & Shawe-Taylor, J. (2002). PAC-Bayes and margins. Advances in Neural Information Processing Systems.

Mason, L., Bartlett, P., & Baxter, J. (2000). Improved generalization through explicit optimization of margins. Machine Learning, 38(3).

Murphy, P. M. (1994). UCI repository of machine learning databases (Technical Report). Irvine, CA: University of California.

Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. Proc. 14th International Conference on Machine Learning. Morgan Kaufmann.

Vapnik, V. N. (1998). Statistical learning theory. New York: John Wiley and Sons Inc.


More information

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline

More information

Churn Prediction using Dynamic RFM-Augmented node2vec

Churn Prediction using Dynamic RFM-Augmented node2vec Churn Predictin using Dynamic RFM-Augmented nde2vec Sandra Mitrvić, Jchen de Weerdt, Bart Baesens & Wilfried Lemahieu Department f Decisin Sciences and Infrmatin Management, KU Leuven 18 September 2017,

More information

Phys. 344 Ch 7 Lecture 8 Fri., April. 10 th,

Phys. 344 Ch 7 Lecture 8 Fri., April. 10 th, Phys. 344 Ch 7 Lecture 8 Fri., April. 0 th, 009 Fri. 4/0 8. Ising Mdel f Ferrmagnets HW30 66, 74 Mn. 4/3 Review Sat. 4/8 3pm Exam 3 HW Mnday: Review fr est 3. See n-line practice test lecture-prep is t

More information

Performance Bounds for Detect and Avoid Signal Sensing

Performance Bounds for Detect and Avoid Signal Sensing Perfrmance unds fr Detect and Avid Signal Sensing Sam Reisenfeld Real-ime Infrmatin etwrks, University f echnlgy, Sydney, radway, SW 007, Australia samr@uts.edu.au Abstract Detect and Avid (DAA) is a Cgnitive

More information

Lyapunov Stability Stability of Equilibrium Points

Lyapunov Stability Stability of Equilibrium Points Lyapunv Stability Stability f Equilibrium Pints 1. Stability f Equilibrium Pints - Definitins In this sectin we cnsider n-th rder nnlinear time varying cntinuus time (C) systems f the frm x = f ( t, x),

More information

Floating Point Method for Solving Transportation. Problems with Additional Constraints

Floating Point Method for Solving Transportation. Problems with Additional Constraints Internatinal Mathematical Frum, Vl. 6, 20, n. 40, 983-992 Flating Pint Methd fr Slving Transprtatin Prblems with Additinal Cnstraints P. Pandian and D. Anuradha Department f Mathematics, Schl f Advanced

More information

Lecture 3: Principal Components Analysis (PCA)

Lecture 3: Principal Components Analysis (PCA) Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

Localized Model Selection for Regression

Localized Model Selection for Regression Lcalized Mdel Selectin fr Regressin Yuhng Yang Schl f Statistics University f Minnesta Church Street S.E. Minneaplis, MN 5555 May 7, 007 Abstract Research n mdel/prcedure selectin has fcused n selecting

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison Jurnal f Physics: Cnference Series OPEN ACCESS Aerdynamic Separability in Tip Speed Rati and Separability in Wind Speed- a Cmparisn T cite this article: M L Gala Sants et al 14 J. Phys.: Cnf. Ser. 555

More information

Optimization Programming Problems For Control And Management Of Bacterial Disease With Two Stage Growth/Spread Among Plants

Optimization Programming Problems For Control And Management Of Bacterial Disease With Two Stage Growth/Spread Among Plants Internatinal Jurnal f Engineering Science Inventin ISSN (Online): 9 67, ISSN (Print): 9 676 www.ijesi.rg Vlume 5 Issue 8 ugust 06 PP.0-07 Optimizatin Prgramming Prblems Fr Cntrl nd Management Of Bacterial

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

BASD HIGH SCHOOL FORMAL LAB REPORT

BASD HIGH SCHOOL FORMAL LAB REPORT BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large

More information

Video Encoder Control

Video Encoder Control Vide Encder Cntrl Thmas Wiegand Digital Image Cmmunicatin 1 / 41 Outline Intrductin Encder Cntrl using Lagrange multipliers Lagrangian ptimizatin Lagrangian bit allcatin Lagrangian Optimizatin in Hybrid

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

V. Balakrishnan and S. Boyd. (To Appear in Systems and Control Letters, 1992) Abstract

V. Balakrishnan and S. Boyd. (To Appear in Systems and Control Letters, 1992) Abstract On Cmputing the WrstCase Peak Gain f Linear Systems V Balakrishnan and S Byd (T Appear in Systems and Cntrl Letters, 99) Abstract Based n the bunds due t Dyle and Byd, we present simple upper and lwer

More information

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax .7.4: Direct frequency dmain circuit analysis Revisin: August 9, 00 5 E Main Suite D Pullman, WA 9963 (509) 334 6306 ice and Fax Overview n chapter.7., we determined the steadystate respnse f electrical

More information

Application of ILIUM to the estimation of the T eff [Fe/H] pair from BP/RP

Application of ILIUM to the estimation of the T eff [Fe/H] pair from BP/RP Applicatin f ILIUM t the estimatin f the T eff [Fe/H] pair frm BP/RP prepared by: apprved by: reference: issue: 1 revisin: 1 date: 2009-02-10 status: Issued Cryn A.L. Bailer-Jnes Max Planck Institute fr

More information

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES 1 SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES Wlfgang HÄRDLE Ruslan MORO Center fr Applied Statistics and Ecnmics (CASE), Humbldt-Universität zu Berlin Mtivatin 2 Applicatins in Medicine estimatin f

More information

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y=

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y= Intrductin t Vectrs I 21 Intrductin t Vectrs I 22 I. Determine the hrizntal and vertical cmpnents f the resultant vectr by cunting n the grid. X= y= J. Draw a mangle with hrizntal and vertical cmpnents

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

Document for ENES5 meeting

Document for ENES5 meeting HARMONISATION OF EXPOSURE SCENARIO SHORT TITLES Dcument fr ENES5 meeting Paper jintly prepared by ECHA Cefic DUCC ESCOM ES Shrt Titles Grup 13 Nvember 2013 OBJECTIVES FOR ENES5 The bjective f this dcument

More information

Thermodynamics and Equilibrium

Thermodynamics and Equilibrium Thermdynamics and Equilibrium Thermdynamics Thermdynamics is the study f the relatinship between heat and ther frms f energy in a chemical r physical prcess. We intrduced the thermdynamic prperty f enthalpy,

More information

Physics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018

Physics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018 Michael Faraday lived in the Lndn area frm 1791 t 1867. He was 29 years ld when Hand Oersted, in 1820, accidentally discvered that electric current creates magnetic field. Thrugh empirical bservatin and

More information

Chapter 3 Kinematics in Two Dimensions; Vectors

Chapter 3 Kinematics in Two Dimensions; Vectors Chapter 3 Kinematics in Tw Dimensins; Vectrs Vectrs and Scalars Additin f Vectrs Graphical Methds (One and Tw- Dimensin) Multiplicatin f a Vectr b a Scalar Subtractin f Vectrs Graphical Methds Adding Vectrs

More information

Broadcast Program Generation for Unordered Queries with Data Replication

Broadcast Program Generation for Unordered Queries with Data Replication Bradcast Prgram Generatin fr Unrdered Queries with Data Replicatin Jiun-Lng Huang and Ming-Syan Chen Department f Electrical Engineering Natinal Taiwan University Taipei, Taiwan, ROC E-mail: jlhuang@arbr.ee.ntu.edu.tw,

More information

WRITING THE REPORT. Organizing the report. Title Page. Table of Contents

WRITING THE REPORT. Organizing the report. Title Page. Table of Contents WRITING THE REPORT Organizing the reprt Mst reprts shuld be rganized in the fllwing manner. Smetime there is a valid reasn t include extra chapters in within the bdy f the reprt. 1. Title page 2. Executive

More information

A New Approach to Increase Parallelism for Dependent Loops

A New Approach to Increase Parallelism for Dependent Loops A New Apprach t Increase Parallelism fr Dependent Lps Yeng-Sheng Chen, Department f Electrical Engineering, Natinal Taiwan University, Taipei 06, Taiwan, Tsang-Ming Jiang, Arctic Regin Supercmputing Center,

More information