Bounds on the Generalization Performance of Kernel Machines Ensembles

Theodoros Evgeniou, Luis Perez-Breva, Massimiliano Pontil, Tomaso Poggio
Center for Biological and Computational Learning, MIT, 45 Carleton Street E25-201, Cambridge, MA

Abstract

We study the problem of learning using combinations of machines. In particular, we present new theoretical bounds on the generalization performance of voting ensembles of kernel machines. Special cases considered are bagging and support vector machines. We present experimental results supporting the theoretical bounds, and describe characteristics of kernel machine ensembles suggested by the experimental findings. We also show how such ensembles can be used for fast training with very large datasets.

1. Introduction

Two major recent advances in learning theory are support vector machines (SVM) (Vapnik, 1998) and ensemble methods such as boosting and bagging (Breiman, 1996; Schapire et al., 1998). Distribution-independent bounds on the generalization performance of these two techniques have been suggested recently (Shawe-Taylor & Cristianini, 1998; Bartlett, 1998; Schapire et al., 1998), and similarities between these bounds in terms of a geometric quantity known as the margin have been proposed. More recently, bounds on the generalization performance of SVM based on cross-validation have been derived (Vapnik, 1998; Chapelle & Vapnik, 1999). These bounds also depend on geometric quantities other than the margin (such as the radius of the smallest sphere containing the support vectors).

In this paper we study the generalization performance of ensembles of general kernel machines using cross-validation arguments. The kernel machines considered are learning machines of the form

    f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i),                                                         (1)

where (x_i, y_i), i = 1, ..., l, are the training points and K is a kernel function, for example a Gaussian - see (Vapnik, 1998; Wahba, 1990) for a number of kernels. The coefficients \alpha_i are learned by solving the following optimization problem:

    \max_{\alpha} \sum_{i=1}^{l} S(\alpha_i) - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K_{ij}
    subject to:  0 \leq \alpha_i \leq C                                                               (2)

where S(\cdot) is a cost function, C a constant, and we have defined K_{ij} = K(x_i, x_j). SVM are a particular case of these machines, obtained for S(\alpha_i) = \alpha_i. For SVM, points for which \alpha_i \neq 0 are called support vectors. Notice that the bias term (the threshold b in the general case of machines f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b) is incorporated in the kernel K.

We study the cross-validation error of general voting ensembles of kernel machines (2). Particular cases are bagging of kernel machines, each trained on a different subsample of the initial training set, and voting of kernel machines each using a different kernel (or different subsets of features/components of the initial input features). We first present bounds on the generalization error of such ensembles, and then discuss experiments where the derived bounds are used for model selection. We also show experimental results demonstrating that a validation set can be used for model selection for kernel machines and their ensembles, without having to decrease the training set size in order to create a validation set. Finally, we show how such ensembles can be used for fast training with very large data sets.
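As an illustration of the machine form (1), the following minimal sketch (not part of the paper) shows that a trained SVM with a Gaussian kernel can be written as f(x) = \sum_i \alpha_i K(x, x_i) + b, with the \alpha_i nonzero only on the support vectors. It assumes scikit-learn's SVC and an arbitrary synthetic dataset; C and \sigma are placeholder values.

```python
# Minimal sketch: reconstruct f(x) from the dual solution of a Gaussian-kernel SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

sigma = 2.0
gamma = 1.0 / (2.0 * sigma ** 2)          # Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
svm = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(X, y)

def gaussian_kernel(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# f(x) = sum_i alpha_i K(x, x_i) + b; scikit-learn stores y_i * alpha_i for the
# support vectors in dual_coef_ and b in intercept_.
K = gaussian_kernel(X, svm.support_vectors_)
f_manual = K @ svm.dual_coef_[0] + svm.intercept_[0]

# Should agree with the library's own evaluation of the machine.
assert np.allclose(f_manual, svm.decision_function(X), atol=1e-6)
print("max |f_manual - decision_function| =",
      np.abs(f_manual - svm.decision_function(X)).max())
```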

2. Generalization Performance of Kernel Machines Ensembles

The theoretical results of the paper are based on the cross-validation (or leave-one-out) error. The cross-validation procedure consists of removing one point at a time from the training set, training a machine on the remaining points, and then testing on the removed one. The number of errors counted throughout this process, L((x_1, y_1), ..., (x_l, y_l)), is called the cross-validation error. It is known that this quantity provides an estimate of the generalization performance of a machine (Wahba, 1990; Vapnik, 1998). In particular, the expectation of the generalization error of a machine trained using l points is bounded by the expectation of the cross-validation error of a machine trained on l+1 points (Luntz and Brailovsky theorem (Vapnik, 1998)).

We begin with some known results on the cross-validation error of kernel machines. The following theorem is from (Jaakkola & Haussler, 1998):

Theorem 2.1  The cross-validation error of a kernel machine (2) is upper bounded as:

    L((x_1, y_1), ..., (x_l, y_l)) \leq \sum_{i=1}^{l} \theta(\alpha_i K(x_i, x_i) - y_i f(x_i))        (3)

where \theta is the Heaviside function, and f is the optimal function found by solving the maximization problem (2).

In the particular case of SVM where the data are separable, (3) can be bounded by geometric quantities, namely (Vapnik, 1998):

    \sum_{i=1}^{l} \theta(\alpha_i K(x_i, x_i) - y_i f(x_i)) \leq \frac{D_{sv}^2}{\rho^2}               (4)

where D_{sv} is the radius of the smallest sphere in the feature space induced by kernel K (Wahba, 1990; Vapnik, 1998), centered at the origin, containing the support vectors, and \rho is the margin (\rho^2 = 1 / \|f\|_K^2) of the SVM. Using this result, the following theorem is a direct application of the Luntz and Brailovsky theorem (Vapnik, 1998):

Theorem 2.2  The average generalization error of an SVM (with zero threshold b, and in the separable case) trained on l points is upper bounded by

    \frac{1}{l+1} E\left( \frac{D_{sv}^2(l)}{\rho^2(l)} \right),

where the expectation E is taken with respect to the probability of a training set of size l.

Notice that this result shows that the performance of an SVM does not depend only on the margin, but also on other geometric quantities, namely the radius D_{sv}. In the non-separable case, it can be shown (the proof is similar to that of corollary 2.2 below) that equation (4) can be written as:

    \sum_{i=1}^{l} \theta(\alpha_i K(x_i, x_i) - y_i f(x_i)) \leq EE_1 + \frac{D_{sv}^2}{\rho^2}         (5)

where EE_1 is the hard margin empirical error of the SVM (the number of training points with y f(x) < 1).

We now extend these results to the case of ensembles of kernel machines. We consider the general case where each of the machines in the ensemble uses a different kernel. Let T be the number of machines, and let K_t be the kernel used by machine t. Notice that, as a special case, appropriate choices of K_t lead to machines that may use different subsets of the original features. Let f^{(t)}(x) be the optimal solution of machine t (real-valued), and \alpha_i^{(t)} the optimal weight that machine t assigns to point (x_i, y_i) (after solving problem (2)). We consider ensembles that are linear combinations of the individual machines. In particular, the separating surface of the ensemble is:

    F(x) = \sum_{t=1}^{T} c_t f^{(t)}(x)                                                                (6)

and the classification is done by taking the sign of this function. The coefficients c_t are not learned (i.e. c_t = 1/T), and \sum_{t=1}^{T} c_t = 1 (for scaling reasons), c_t > 0. All parameters (the C's and the kernels) are fixed before training. In the particular case of bagging, the subsampling of the training data should be deterministic. By this we mean that when the bounds are used for model (parameter) selection, the same subsample sets of the data need to be used for each model. These subsamples, however, are still random ones. We believe that the results presented below also hold (with minor modifications) in the general case where the subsampling is always random.

We now consider the cross-validation error of such ensembles.
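The following sketch (an illustration, not the paper's code) evaluates the right-hand side of the theorem 2.1 bound for a single SVM. It assumes scikit-learn's SVC, labels in {-1, +1}, and a Gaussian kernel (so K(x, x) = 1); the theorem is stated for machines without a bias term, and here the trained bias is simply kept inside f(x_i) as an approximation.

```python
# Minimal sketch of the theorem 2.1 leave-one-out bound:
#     LOO <= sum_i theta(alpha_i K(x_i, x_i) - y_i f(x_i))
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y01 = make_classification(n_samples=200, n_features=10, random_state=0)
y = 2 * y01 - 1                                  # map labels to {-1, +1}

gamma = 0.1
svm = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(X, y)

f = svm.decision_function(X)                     # f(x_i) on every training point
alpha = np.zeros(len(X))
alpha[svm.support_] = np.abs(svm.dual_coef_[0])  # alpha_i (nonzero only for support vectors)
K_ii = np.ones(len(X))                           # Gaussian kernel: K(x, x) = 1

loo_bound = np.sum(alpha * K_ii - y * f >= 0)    # theta(0) taken as 1
print("Theorem 2.1 leave-one-out bound:", loo_bound, "errors out of", len(X))
```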
Theorem 2.3  The cross-validation error L((x_1, y_1), ..., (x_l, y_l)) of a kernel machine ensemble is upper bounded by:

    \sum_{i=1}^{l} \theta\left( \sum_{t=1}^{T} c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) \right)     (7)

The proof of this theorem is based on the following lemma, shown in Vapnik (1998) and in Jaakkola and Haussler (1998):

Lemma 2.1  Let \alpha_i be the coefficient of the solution f(x) of machine (2) corresponding to point (x_i, y_i), \alpha_i \neq 0. Let f^i(x) be the solution of machine (2) found when point (x_i, y_i) is removed from the training set. Then:

    y_i f^i(x_i) \geq y_i f(x_i) - \alpha_i K(x_i, x_i).

Using lemma 2.1 we can now prove theorem 2.3.

Proof of theorem 2.3:  Let F^i(x) = \sum_{t=1}^{T} c_t f^{(t)}_i(x) be the final machine trained with all the initial training data except (x_i, y_i). Lemma 2.1 gives that

    y_i F^i(x_i) = \sum_{t=1}^{T} c_t y_i f^{(t)}_i(x_i) \geq \sum_{t=1}^{T} c_t \left( y_i f^{(t)}(x_i) - \alpha_i^{(t)} K_t(x_i, x_i) \right) = y_i F(x_i) - \sum_{t=1}^{T} c_t \alpha_i^{(t)} K_t(x_i, x_i),

therefore

    \theta(-y_i F^i(x_i)) \leq \theta\left( \sum_{t=1}^{T} c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) \right),

so the leave-one-out error \sum_{i=1}^{l} \theta(-y_i F^i(x_i)) is not more than \sum_{i=1}^{l} \theta( \sum_{t=1}^{T} c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) ), which proves the theorem.  \Box

Notice that the bound has the same form as bound (3): for each point (x_i, y_i) we only need to take into account its corresponding parameters \alpha_i^{(t)} and remove their effects from the value of F(x_i).

The cross-validation error can also be bounded using geometric quantities. To this purpose we introduce one more parameter that we call the ensemble margin (in contrast to the margin of a single SVM). For each point (x_i, y_i) we define its ensemble margin to be simply y_i F(x_i). This is exactly the definition of margin in (Schapire et al., 1998). For any given \delta > 0 we define EE_\delta to be the number of training points with ensemble margin < \delta (the empirical error with margin \delta), and N_\delta the set of the remaining training points - the ones with ensemble margin \geq \delta. Finally, we denote by D_t(\delta) the radius of the smallest sphere in the feature space induced by kernel K_t, centered at the origin, containing the points of machine t with \alpha_i^{(t)} \neq 0 and ensemble margin larger than \delta (in the case of SVM, these are the support vectors of machine t with ensemble margin larger than \delta).

A simple consequence of theorem 2.3 and of the inequality K_t(x_i, x_i) \leq D_t^2(\delta) for points x_i with \alpha_i^{(t)} \neq 0 and ensemble margin y_i F(x_i) \geq \delta is the following:

Corollary 2.1  For any \delta > 0 the cross-validation error L((x_1, y_1), ..., (x_l, y_l)) of a kernel machine ensemble is upper bounded by:

    EE_\delta + \frac{1}{\delta} \sum_{t=1}^{T} c_t D_t^2(\delta) \left( \sum_{i \in N_\delta} \alpha_i^{(t)} \right)                        (8)

Proof:  For each training point (x_i, y_i) with ensemble margin y_i F(x_i) < \delta we upper bound \theta( \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) ) by 1 (this is a trivial bound). For the remaining points (the points in N_\delta) we show that:

    \theta\left( \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) \right) \leq \frac{1}{\delta} \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i).

If \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) < 0, then \theta(\cdot) = 0 \leq \frac{1}{\delta} \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i). On the other hand, if \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) \geq 0, then \theta(\cdot) = 1, while \frac{1}{\delta} \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) \geq \frac{1}{\delta} y_i F(x_i) \geq 1. So in both cases the inequality holds. Therefore:

    \sum_{i=1}^{l} \theta\left( \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) - y_i F(x_i) \right) \leq EE_\delta + \frac{1}{\delta} \sum_{i \in N_\delta} \sum_t c_t \alpha_i^{(t)} K_t(x_i, x_i) \leq EE_\delta + \frac{1}{\delta} \sum_{t=1}^{T} c_t D_t^2(\delta) \left( \sum_{i \in N_\delta} \alpha_i^{(t)} \right),

which proves the corollary.  \Box

Notice that equation (8) holds for any \delta > 0, so the best bound is obtained for the minimum of the right hand side with respect to \delta > 0. Using the Luntz and Brailovsky theorem, theorems 2.3 and 2.1 provide bounds on the generalization performance of general kernel machine ensembles like that of theorem 2.2.
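The theorem 2.3 bound can be computed directly from the trained machines of a bagged ensemble. The following sketch (an illustration under the same assumptions as the earlier ones: scikit-learn's SVC, labels in {-1, +1}, Gaussian kernel so K_t(x_i, x_i) = 1, the bias kept inside each f^{(t)} as an approximation) evaluates it for T SVMs trained on 40% random subsamples with fixed c_t = 1/T.

```python
# Minimal sketch of the ensemble leave-one-out bound of theorem 2.3.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y01 = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y01 - 1
l, T, gamma = len(X), 30, 0.1
rng = np.random.default_rng(0)

F = np.zeros(l)            # ensemble output F(x_i) on the training points
penalty = np.zeros(l)      # sum_t c_t alpha_i^(t) K_t(x_i, x_i)
c_t = 1.0 / T

for t in range(T):
    idx = rng.choice(l, size=int(0.4 * l), replace=False)    # 40% subsample, as in section 3
    svm = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(X[idx], y[idx])
    F += c_t * svm.decision_function(X)
    alpha = np.zeros(l)
    alpha[idx[svm.support_]] = np.abs(svm.dual_coef_[0])      # alpha_i^(t), mapped to full-set indices
    penalty += c_t * alpha                                    # K_t(x_i, x_i) = 1 for the Gaussian kernel

loo_bound = np.sum(penalty - y * F >= 0)                      # theta(0) taken as 1
print("Theorem 2.3 bound:", loo_bound, "out of", l, "training points")
```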

We now consider the particular case of SVM ensembles. In this case, choosing for example \delta = 1, (8) becomes:

Corollary 2.2  The leave-one-out error of an ensemble of SVMs is upper bounded by:

    L((x_1, y_1), ..., (x_l, y_l)) \leq EE_1 + \sum_{t=1}^{T} c_t \frac{D_t^2}{\rho_t^2}                 (9)

where EE_1 is the margin empirical error with ensemble margin 1, D_t is the radius of the smallest sphere centered at the origin, in the feature space induced by kernel K_t, containing the support vectors of machine t, and \rho_t is the margin of SVM t. This is because clearly D_t \geq D_t(\delta) for any \delta, and \sum_{i \in N_\delta} \alpha_i^{(t)} \leq \sum_{i=1}^{l} \alpha_i^{(t)} = \frac{1}{\rho_t^2} (see (Vapnik, 1998) for a proof of this equality).

A number of remarks can be made from equation (9). First, notice that the generalization performance of the SVM ensemble now depends on the average (a convex combination of) D^2/\rho^2 of the individual machines. In some cases this may be smaller than the D^2/\rho^2 of a single SVM. For example, suppose we train many SVMs on different subsamples of the training points and we want to compare such an ensemble with a single SVM using all the points. If all SVMs (the single one, as well as the individual ones of the ensemble) use most of their training points as support vectors, then clearly the D^2 of each SVM in the ensemble is smaller than that of the single SVM. Moreover, the margin of each SVM in the ensemble is expected to be larger than that of the single SVM using all the points. So the average D^2/\rho^2 in this case is expected to be smaller than that of the single SVM. Another case where an ensemble of SVMs may be better than a single SVM is the one where there are outliers among the training data: if the individual SVMs are trained on subsamples of the training data, some of the machines may have smaller D^2/\rho^2 because they do not use some of the outliers. In general it is not clear when ensembles of kernel machines are better than single machines. The bounds in this section may provide some insight into this question.

Notice also how the ensemble margin \delta plays a role in the generalization performance of kernel machine ensembles. This margin is also shown to be important for boosting (Schapire et al., 1998). Finally, notice that all the results discussed hold for the case where there is no bias (threshold b), or where the bias is included in the kernel (as discussed in the introduction). In the experiments discussed below we also use the results for cases where the bias is not regularized, which is common in practice. It may be possible to use recent theoretical results (Chapelle & Vapnik, 1999) on the leave-one-out bounds of SVM when the bias b is taken into account in order to study the generalization performance of kernel machine ensembles with the bias b.

3. Experiments

To test how tight the presented bounds are, we conducted a number of experiments using data sets from UCI,^1 as well as the US Postal Service (USPS) data set (LeCun et al., 1989). We show results for some of the sets in Figures 1-2. For each data set we split the overall set into training and testing sets (the sizes are shown in the figures) in 50 different (random) ways, and for each split:

1. We trained one SVM with b = 0 using all the training data, computed the leave-one-out bound given by theorem 2.1, and then computed the test performance using the test set.

2. We repeated (1), this time with b \neq 0.

3. We trained 30 SVMs with b = 0, each using a random subsample of size 40% of the training data (bagging), computed the leave-one-out bound given by theorem 2.3 using c_t = 1/30, and then computed the test performance using the test set.

4. We repeated (3), this time with b \neq 0.

We then averaged the test performances and the leave-one-out bounds over the 50 training-testing splits, and computed the standard deviations. All machines were trained using a Gaussian kernel, and we repeated the procedure for a number of different \sigma's of the Gaussian and for a fixed C (shown in the figures). We show the averages and standard deviations of the results in the figures.
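The sketch below (an illustration only, using a synthetic dataset in place of the UCI sets and a single train/test split instead of 50) mirrors the structure of this protocol: for each \sigma of the Gaussian kernel, it compares a single SVM trained on all the data with a voting ensemble of 30 SVMs, each trained on a 40% random subsample, with fixed c_t = 1/30.

```python
# Minimal sketch of the single-SVM vs. bagging comparison of section 3.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.default_rng(0)
T, C = 30, 1.0

for sigma in [0.5, 1.0, 2.0, 4.0, 8.0]:
    gamma = 1.0 / (2.0 * sigma ** 2)
    single = SVC(C=C, kernel="rbf", gamma=gamma).fit(Xtr, ytr)

    # Voting ensemble: average of the real-valued outputs, classified by its sign.
    F = np.zeros(len(Xte))
    for t in range(T):
        idx = rng.choice(len(Xtr), size=int(0.4 * len(Xtr)), replace=False)
        svm = SVC(C=C, kernel="rbf", gamma=gamma).fit(Xtr[idx], ytr[idx])
        F += svm.decision_function(Xte) / T

    err_single = np.mean(single.predict(Xte) != yte)
    err_bag = np.mean((F > 0).astype(int) != yte)
    print(f"log(sigma)={np.log(sigma):+.2f}  single SVM error={err_single:.3f}  "
          f"bagging error={err_bag:.3f}")
```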
In all figures we use the following notation: top left: bagging with b = 0; top right: single SVM with b = 0; bottom left: bagging with b \neq 0; and bottom right: single SVM with b \neq 0. In all plots the solid line is the mean test performance and the dashed line is the error bound computed using the leave-one-out theorems (theorems 2.1 and 2.3). The dotted line is the validation set error discussed below. For simplicity, only one error bar (standard deviation over the 50 training-testing splits) is shown (the others were similar). The cost parameter C used is given in each of the figures. The horizontal axis is the natural logarithm of the \sigma of the Gaussian kernel used, while the vertical axis is the error.

An interesting observation is that the bounds are always tighter for bagging than they are for a single SVM. This is an interesting experimental finding for which we do not have a theoretical explanation. It may be because the generalization performance of a machine is related to the expected leave-one-out error of the machine (Vapnik, 1998), and by combining many machines, each using a different (random) subset of the training data, we better approximate the expected leave-one-out error than when we compute the leave-one-out error of a single machine only. This finding can practically justify the use of ensembles of machines for model selection: parameter selection using the leave-one-out bounds presented in this paper is easier for ensembles of machines than it is for single machines.

^1 mlearn/mlrepository.html

Figure 1. Breast cancer data: see text for description.

Another interesting observation is that the bounds seem to work similarly in the case where the bias b is not 0. In this case, as before, the bounds are tighter for ensembles of machines than they are for single machines.

In all our experiments we found that the bounds presented here (equations (3) and (7)) get looser when the parameter C used during training is large. A representative example is shown in figure 3. This result can be understood by looking at the leave-one-out bound for a single SVM (equation (3)). Let (x_i, y_i) be a support vector for which y_i f(x_i) < 1. It is known (Vapnik, 1998) that for these support vectors the coefficient \alpha_i is C. If C is such that C K(x_i, x_i) > 1 (for the Gaussian kernel this reduces to C > 1, as K(x, x) = 1), then clearly \theta(C K(x_i, x_i) - y_i f(x_i)) = 1. In this case the bound in equation (3) effectively counts all support vectors with margin less than one (plus some of the ones on the margin, y f(x) = 1). This means that for large C (in the case of Gaussian kernels, for example, for any C > 1), the bounds of this paper are effectively similar to (not larger than) another known leave-one-out bound for SVMs, namely one that uses the number of all support vectors to bound generalization performance (Vapnik, 1998). So our experimental results effectively show that the number of support vectors does not provide a good estimate of the generalization performance of SVMs and their ensembles.

Figure 2. USPS data: see text for description.

4. Validation Set for Model Selection

Instead of using bounds on the generalization performance of learning machines like the ones discussed above, an alternative approach for model selection is to use a validation set to choose the parameters of the machines. We consider first the simple case where we have N machines and we choose the best one based on the error they make on a fixed validation set of size V. This can be thought of as a special case where we consider as our hypothesis space the set of the N machines, and we then train by simply picking the machine with the smallest empirical error (in this case this is the validation error). It is known that if VE_i is the validation error of machine i and TE_i is its true test error, then for all N machines simultaneously the following bound holds with probability 1 - \eta (Devroye et al., 1996; Vapnik, 1998):

    TE_i \leq VE_i + \sqrt{ \frac{\log N - \log \frac{\eta}{4}}{V} }                                     (10)

So how accurately we pick the best machine using the validation set depends, as expected, on the number of machines N and on the size V of the validation set. The bound suggests that a validation set can be used to accurately estimate the generalization performance of a relatively small number of machines (i.e. a small number of parameter values examined), as is often done in practice.
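The following sketch (illustrative numbers only; the validation errors below are made up) shows how the best of N machines is picked by validation error and how large the uniform deviation term of bound (10) is for given N, V, and \eta.

```python
# Minimal sketch of model selection with a validation set and of the deviation
# term sqrt((log N - log(eta/4)) / V) in bound (10).
import numpy as np

val_errors = {0.5: 0.21, 1.0: 0.17, 2.0: 0.15, 4.0: 0.16, 8.0: 0.19}  # sigma -> validation error
N = len(val_errors)          # number of machines (parameter values) compared
V = 300                      # validation set size
eta = 0.05                   # bound holds with probability 1 - eta

best_sigma = min(val_errors, key=val_errors.get)
deviation = np.sqrt((np.log(N) - np.log(eta / 4.0)) / V)

print(f"selected sigma = {best_sigma}, validation error = {val_errors[best_sigma]:.3f}")
print(f"with probability {1 - eta:.2f}, each true test error is within "
      f"{deviation:.3f} of its validation error")
```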

Figure 3. USPS data: using a large C (C = 50). In this case the bounds do not work; see text for an explanation.

We used this observation for parameter selection for SVMs and for their ensembles. Experimentally, we followed a slightly different procedure from what is suggested by bound (10): for each machine (that is, for each \sigma of the Gaussian kernel in our case, both for a single SVM and for an ensemble of machines) we split the training set (for each training-testing split of the overall data set, as described above) into a smaller training set and a validation set (70% and 30% respectively). We trained each machine using the new, smaller training set, and measured the performance of the machine on the validation set. Unlike what bound (10) suggests, instead of comparing the validation performance with the generalization performance of the machines trained on the smaller training set (which is the case for which bound (10) holds), we compared the validation performance with the test performance of the machine trained using all of the initial (larger) training set. This way we did not have to use fewer points for training the machines, which is a typical drawback of using a validation set, and we could compare the validation performance with the leave-one-out bounds and the test performance of the exact same machines we used in the previous section.

We show the results of these experiments in figures 1-2: see the dotted lines in the plots. We observe that although the validation error is that of a machine trained on a smaller training set, it still provides a very good estimate of the test performance of the machines trained on the whole training set. In all cases, including the case of C > 1 for which the leave-one-out bounds discussed above did not work well, the validation set error provided a very good estimate of the test performance of the machines.

Figure 4. When the coefficients of the second layer are learned using a linear SVM, the system is less sensitive to changes of the \sigma of the Gaussian kernel used by the individual machines of the ensemble. The solid line is one SVM, the dotted line is an ensemble of 30 SVMs with fixed c_t = 1/30, and the dashed line is an ensemble of 30 SVMs with the coefficients c_t learned. The horizontal axis shows the natural logarithm of the \sigma of the Gaussian kernel. The data used is the heart data set. The threshold b is non-zero for this experiment.

5. Other Ensembles

The ensemble kernel machines (6) considered so far are voting combinations where the coefficients c_t in (6) of the linear combination of the machines are fixed. We now consider the case where these coefficients are also learned from the training subsets. In particular, we consider the following architecture:

a. A number T of kernel machines is trained as before (for example using different training data, or different parameters).

b. The T outputs (real-valued in our experiments, but they could also be thresholded - binary) of the machines at each of the training points are computed.

c. A linear machine (i.e. a linear SVM) is trained using as inputs the outputs of the T machines on the training data, and as labels the original training labels. The solution is used as the coefficients c_t of the linear combination of the T machines.

Notice that for this type of machine the leave-one-out bound of theorem 2.3 does not hold, since the theorem assumes fixed coefficients c_t. A validation set can still be used for model selection for these machines. On the other hand, an important characteristic of this type of ensemble is that, independently of what kernels/parameters each of the individual machines of the ensemble uses, the second-layer machine (which finds the coefficients c_t) always uses a linear kernel.
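A minimal sketch of steps (a)-(c) above is given below; it is an illustration only (scikit-learn's SVC and LinearSVC, an arbitrary synthetic dataset and placeholder parameters), not the paper's implementation.

```python
# Minimal sketch of the two-layer (adaptive combination) architecture of section 5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.default_rng(0)
T, gamma = 30, 0.1

# (a) Train T kernel machines on different subsamples of the training data.
machines = []
for t in range(T):
    idx = rng.choice(len(Xtr), size=int(0.4 * len(Xtr)), replace=False)
    machines.append(SVC(C=1.0, kernel="rbf", gamma=gamma).fit(Xtr[idx], ytr[idx]))

# (b) Compute the T real-valued outputs at each training (and test) point.
Ztr = np.column_stack([m.decision_function(Xtr) for m in machines])
Zte = np.column_stack([m.decision_function(Xte) for m in machines])

# (c) A linear machine trained on these outputs gives the combination coefficients c_t.
combiner = LinearSVC(C=1.0, max_iter=10000).fit(Ztr, ytr)
c = combiner.coef_[0]

err = np.mean(combiner.predict(Zte) != yte)
print("learned coefficients c_t (first 5):", np.round(c[:5], 3))
print("test error of the adaptive combination:", round(err, 3))
```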

This may imply that the overall architecture is not very sensitive to the kernel/parameters of the machines of the ensemble. We tested this hypothesis experimentally by comparing how the test performance of this type of machine changes with the \sigma of the Gaussian kernel used by the individual machines of the ensemble, and compared the behavior with that of single machines and of ensembles of machines with fixed c_t. In figure 5 we show two examples. In our experiments, for all data sets except one, learning the coefficients c_t of the combination of the machines using a linear machine (we used a linear SVM) made the overall machine less sensitive to changes of the parameters of the individual machines (the \sigma of the Gaussian kernel). This can be a practically useful characteristic of the architecture outlined in this section: for example, the kernel parameters of the machines of the ensemble need not be tuned accurately.

6. Ensembles versus Single Machines

So far we concentrated on the theoretical and experimental characteristics of ensembles of kernel machines. We now discuss how ensembles compare with single machines. Table 1 shows the test performance of one SVM compared with that of an ensemble of 30 SVMs combined with c_t = 1/30 and an ensemble of 30 SVMs combined using a linear SVM, for some UCI data sets (characteristic results). For the tables of this section we use, for convenience, the following notation: VCC stands for Voting Combinations of Classifiers, meaning that the coefficients c_t of the combination of the machines are fixed; ACC stands for Adaptive Combinations of Classifiers, meaning that the coefficients c_t of the combination of the machines are learned (adapted). We only consider SVMs and ensembles of SVMs with the threshold b. The table shows mean test errors and standard deviations for the best parameters of the machines (\sigma's of the Gaussians and parameter C, decided using the validation set performance in this case), hence different from figures 1-2, which were for a given C.

Table 1. Average errors and standard deviations (percentages) of the best machines (best \sigma of the Gaussian kernel and best C), chosen according to the validation set performances. The performances of the machines are about the same. VCC and ACC use 30 SVM classifiers.

    Data Set    SVM         VCC        ACC
    Breast      25.5 ±        ±          ± 4
    thyroid      5.1 ±        ±          ± 2.7
    diabetes    23   ±        ±          ± 1.8
    heart       15.4 ±        ±          ± 3.2

As the results show, the best SVM and the best ensembles we found have about the same test performance. Therefore, with appropriate tuning of the parameters of the machines, combining SVMs does not lead to a performance improvement compared to a single SVM. Although the best SVM and the best ensemble (that is, after accurate parameter tuning) perform similarly, an important difference of the ensembles compared to a single machine is that, in the case of bagging, the training of the ensemble consists of a large number of (parallelizable) small-training-set kernel machines. This implies that one can obtain performance similar to that of a single machine by training many faster machines using smaller training sets. This can be an important practical advantage of ensembles of machines, especially in the case of large data sets. Table 2 compares the test performance of a single SVM with that of an ensemble of SVMs each trained with as little as 1% of the initial training set (for one data set).

Table 2. Comparison between the error rates of a single SVM and the error rates of VCC and ACC of 100 SVMs for different percentages of subsampled data. The last data set is from (Osuna et al., 1997).

    Data Set    VCC 5%      VCC 1%     SVM
    Diabetes                  ± 1.6
    Thyroid                   ± 2.5
    Faces
For fixed c_t the performance decreases only slightly in all cases (thyroid, which we show, was the only data set in our experiments for which the change was significant in the case of VCC), while for the architecture of section 5 the performance does not decrease even with 1% training data: this is because the linear machine used to learn the coefficients c_t uses all the training data. Even in this last case the overall machine can still be faster than a single machine, since the second-layer learning machine is a linear one, and fast training methods exist for the particular case of linear machines (Platt, 1998). Finally, it may be the case that ensembles of machines perform better for some problems in the presence of outliers (as discussed in section 2), or, if the ensemble consists of machines that use different kernels and/or different input features, in the presence of irrelevant features. The leave-one-out bounds presented in this paper may be used for identifying these cases and for better understanding how bagging and general ensemble methods work (Breiman, 1996; Schapire et al., 1998).
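The fast-training idea described above can be sketched as follows (an illustration only: joblib is assumed for parallelism, and the dataset, subsample fraction, and parameters are arbitrary placeholders, not the paper's setup).

```python
# Minimal sketch: train many SVMs on small subsamples in parallel and vote them
# with fixed c_t = 1/T, instead of training one SVM on the full training set.
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
T, frac, gamma = 100, 0.05, 0.05
rng = np.random.default_rng(0)
subsamples = [rng.choice(len(Xtr), size=int(frac * len(Xtr)), replace=False) for _ in range(T)]

def train_one(idx):
    # Each machine sees only a small subsample, so each individual fit is fast.
    return SVC(C=1.0, kernel="rbf", gamma=gamma).fit(Xtr[idx], ytr[idx])

machines = Parallel(n_jobs=-1)(delayed(train_one)(idx) for idx in subsamples)

# Voting combination with fixed coefficients c_t = 1/T.
F = np.mean([m.decision_function(Xte) for m in machines], axis=0)
err_vcc = np.mean((F > 0).astype(int) != yte)
print(f"VCC of {T} SVMs on {int(100 * frac)}% subsamples: test error = {err_vcc:.3f}")
```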

7. Conclusions

We presented theoretical bounds on the generalization error of ensembles of kernel machines. Our results apply to the quite general case where each of the machines in the ensemble is trained on different subsets of the training data and/or uses different kernels or input features. Experimental results supporting our theoretical findings have been presented. A number of observations have been made from the experiments. We summarize some of them below:

1. The leave-one-out bounds for ensembles of machines have a form similar to that for single machines. In the particular case of SVMs, the bounds are based on an average geometric quantity of the individual SVMs of the ensemble (average margin and average radius of the sphere containing the support vectors).

2. The leave-one-out bounds presented are experimentally found to be tighter than the equivalent ones for single machines.

3. For SVMs, the leave-one-out bounds based on the number of support vectors are experimentally found not to be tight.

4. Experimentally we found that a validation set can be used for accurate model selection without having to decrease the size of the training set in order to create a validation set.

5. With accurate parameter tuning (model selection), single SVMs and ensembles of SVMs perform similarly.

6. Ensembles of machines for which the coefficients of the combination are also learned from the data are less sensitive to changes of parameters (i.e. the kernel) than single machines are.

7. Fast (parallel) training without significant loss of performance relative to single whole-large-training-set machines can be achieved using ensembles of machines.

A number of questions and research directions are open. An important theoretical question is how the bounds and experiments presented in this paper are related to Breiman's (1996) bias-variance analysis of bagging. For example, the fact that single SVMs perform about the same as ensembles of SVMs may be because SVMs are stable machines, i.e., they have small variance (Breiman, 1996). Experiments on the bias-variance decomposition of SVMs and kernel machines may lead to further understanding of these machines. The theoretical results we presented here approach the problem of learning with ensembles of machines from a different perspective - that of studying the leave-one-out geometric bounds of these machines. On the practical side, further experiments using very large data sets are needed to support our experimental finding that ensembles of machines can be used for fast training without significant loss in performance. Finally, other theoretical questions are how to extend the bounds of section 2 to the type of machines discussed in section 5, and how to use more recent leave-one-out bounds for SVM (Chapelle & Vapnik, 1999) to better characterize the performance of ensembles of machines.

References

Bartlett, P. (1998). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44.

Breiman, L. (1996). Bagging predictors. Machine Learning, 26.

Chapelle, O., & Vapnik, V. (1999). Model selection for support vector machines. Proceedings of the Twelfth Conference on Neural Information Processing Systems.

Devroye, L., Györfi, L., & Lugosi, G. (1996). A probabilistic theory of pattern recognition. New York, NY: Springer.

Jaakkola, T., & Haussler, D. (1998). Exploiting generative models in discriminative classifiers. Proceedings of the Eleventh Conference on Neural Information Processing Systems.

LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. J. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1.
Osuna, E., Freund, R., & Girosi, F. (1997). Support vector machines: Training and applications. A.I. Memo 1602, AI Laboratory, Massachusetts Institute of Technology, Cambridge, MA.

Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization. In C. Burges & B. Scholkopf (Eds.), Advances in kernel methods - support vector learning. Cambridge, MA: MIT Press.

Schapire, R., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26.

Shawe-Taylor, J., & Cristianini, N. (1998). Robust bounds on generalization from the margin distribution (Technical Report NC2-TR). NeuroCOLT2, Royal Holloway, University of London, UK.

Vapnik, V. (1998). Statistical learning theory. New York, NY: Wiley.

Wahba, G. (1990). Splines models for observational data. Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia.


princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

1 Definition of Rademacher Complexity

1 Definition of Rademacher Complexity COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #9 Scrbe: Josh Chen March 5, 2013 We ve spent the past few classes provng bounds on the generalzaton error of PAClearnng algorths for the

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Learning with Tensor Representation

Learning with Tensor Representation Report No. UIUCDCS-R-2006-276 UILU-ENG-2006-748 Learnng wth Tensor Representaton by Deng Ca, Xaofe He, and Jawe Han Aprl 2006 Learnng wth Tensor Representaton Deng Ca Xaofe He Jawe Han Department of Computer

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Hidden Markov Models

Hidden Markov Models Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information