Robust and Efficient Boosting Method using the Conditional Risk


Zhi Xiao, Zhe Luo, Bo Zhong, and Xin Dang, Member, IEEE

(Z. Xiao is with the Department of Information Management, Chongqing University, China; e-mail: xiaozhi@cqu.edu.cn. Z. Luo is with the Bank of China, 151 National Road, Qingxiu District, Nanning, China; e-mail: shulfang 1988@126.com. B. Zhong is with the Department of Statistics and Actuarial Science, Chongqing University, China; e-mail: zhongbo@cqu.edu.cn. X. Dang is the corresponding author; she is with the Department of Mathematics, University of Mississippi, University, MS 38677, USA; e-mail: xdang@olemiss.edu.)

Abstract—Well known for its simplicity and effectiveness in classification, AdaBoost nevertheless suffers from overfitting when class-conditional distributions have significant overlap, and it is very sensitive to noise in the labels. This article tackles both limitations simultaneously by optimizing a modified loss function (i.e., the conditional risk). The proposed approach has two advantages. (1) It directly takes label uncertainty into account through an associated label confidence. (2) It introduces a trustworthiness measure on training samples via the Bayes risk rule, and hence the resulting classifier tends to have finite-sample performance superior to that of the original AdaBoost when there is a large overlap between the class-conditional distributions. Theoretical properties of the proposed method are investigated. Extensive experimental results on synthetic data and real-world data sets from the UCI machine learning repository are provided. The empirical study shows the high competitiveness of the proposed method in prediction accuracy and robustness when compared with the original AdaBoost and several existing robust AdaBoost algorithms.

Index Terms—AdaBoosting, classification, conditional risk, exponential loss, label noise, overfitting, robustness.

I. INTRODUCTION

For classification, AdaBoost is well known as a simple but effective boosting algorithm that constructs a strong classifier by gradually combining weak learners [46], [12], [31]. Its improvement in classification accuracy comes from adaptively sampling instances for each base classifier during training, more specifically from its re-weighting mechanism: it emphasizes instances that were previously misclassified and decreases the importance of those that have already been adequately trained. This adaptive scheme, however, causes overfitting on noisy data or on data from overlapping class distributions [9], [25], [43]. The problem stems from the uncertainty of the observed labels. Classification is usually very challenging when classes overlap, and reliable labels are both expensive and difficult to obtain [11]; in some applications (such as biomedical data), perfect training labels are almost impossible to obtain. Hence, making AdaBoost noise-robust while avoiding overfitting is an important task. The aim of this paper is to construct a modified AdaBoost classification algorithm with a new perspective for tackling these problems.

A. Related Work

Modifications of AdaBoost for dealing with noisy data can be grouped into three strategic categories. The first introduces robust loss functions as new criteria to be minimized in place of the original exponential loss. The second focuses on modifying the re-weighting rule across iterations in order to reduce or eliminate the effect of noisy data or outliers in the training set. The third suggests more modest ways of combining the weak learners, taking advantage of the base classifiers in other ways.
LogitBoost [13] is an outstanding example of the first strategic category. It uses the negative binomial log-likelihood loss function, which puts relatively less influence on instances with large negative margins (the margin is generally defined as yf(x); a negative margin implies a misclassification of the instance) than the exponential loss does, so LogitBoost is less affected by contaminated data [15]. Based on concepts from robust statistics, Kanamori et al. [19] studied loss functions for robust boosting and proposed a transformation of loss functions in order to construct boosting algorithms that are more robust against outliers; their usefulness has been confirmed empirically, although the loss function they used was derived without considering efficiency. Onoda [26] proposed a set of algorithms that incorporate a normalization term into the original objective function to prevent overfitting. Sun et al. [35] and Sun et al. [36] modified AdaBoost using regularization methods. The approaches in this first category differ mainly in the loss functions and optimization techniques they use; in the pursuit of robustness, it is sometimes hard to balance the complexity of a loss function against its computational cost.

In general, modifying the loss function leads to a new re-weighting rule for AdaBoost, but some heuristic algorithms directly rebuild the weight-updating scheme to avoid skewed distributions of examples in the training set. For instance, Domingo and Watanabe [10] proposed MadaBoost, which bounds the weight assigned to every sample by its initial probability. Zhang et al. [49] introduced a parameter into the weight-updating formula to reduce weight changes during training. Servedio [32] provided a new boosting algorithm, SmoothBoost, which produces only smooth distributions of weights yet still enables a large margin in the final hypothesis. Utkin and Zhuk [40] took a minimax (pessimistic) approach to search for the optimal weights at each iteration in order to avoid outliers being heavily sampled in the next iteration.

Since the ensemble classifier in AdaBoost predicts a new instance by weighted majority voting among the weak learners, a classifier that achieves high training accuracy will strongly influence the prediction because of its large coefficient. This can have a detrimental effect on the generalization error, especially when the training set itself is corrupted [30], [1]. With this in mind, the third strategy seeks a better way to combine the weak learners. Schapire and Singer [30] improved boosting in an extended framework where each weak hypothesis produces not only classifications but also confidence scores, in order to smooth the predictions. Another method, Modest AdaBoost [42], decreases the contributions of the base learners in a modest way and forces them to work only in their own domain.

The algorithms described above mainly focus on some robustifying principle but do not consider specific information in the training samples. Many other studies [37], [18], [16] introduced the noise level into the loss function and extended some of the methods mentioned above. Nevertheless, most of these algorithms do not change the fact that misclassified samples are weighted more than in the previous stage, even though the increment is smaller than in AdaBoost; mislabeled data may therefore still hurt the final decision and cause overfitting.

In recent studies, many researchers have turned to instance-based methods to make AdaBoost robust against label noise or outliers: the reliability or usefulness of each sample is evaluated by statistical methods and this information is taken into account. Cao et al. [6] suggested a noise-detection based loss function that teaches AdaBoost to classify each sample into the mostly agreed class rather than using its observed label. Gao and Gao [14] set the weight of suspicious samples in each iteration to zero, eliminating their effect on AdaBoost. Essentially, these two methods apply dynamic correcting and deleting techniques during training. In [43], the boosting algorithm works directly on a reduced training set from which confusing samples have been removed. Zhang and Zhang [48] considered a local boosting algorithm whose re-weighting rule and combination of multiple classifiers use more local information about the training instances.

For handling label noise, it is natural to first delete or correct suspicious instances and then take the remaining good samples as prototypes for the learning task. This idea is not specific to AdaBoost but applies to general methods in many fields (e.g., [39]). Some approaches aim to construct a good noise purification mechanism within the framework of different methods, such as ensemble methods [41], [4], [5], KNN and its variants [29], [22], [17], and so on. Data preprocessing is in some cases a necessary step to improve the quality of prediction models [28]. However, some correct samples carrying valuable information may be discarded while some noisy samples are retained, or new noisy samples may even be introduced; this is the limitation of correcting and deleting techniques. To overcome this weakness, Rebbapragada and Brodley [27] used the confidence in the observed label as the weight of each instance during training and provided a novel framework for mitigating class noise. They showed empirically that this confidence-weighting approach can outperform the discarding approach, but the new method was only applied to the tree-based C4.5 classifier, and the confidence-labeling technique they used falls short of being a desirable label correction method.
In [45] and [50], the probability that an instance comes from class 1 is estimated and used as a soft label for the instance.

B. An overview of the proposed approach

Inspired by instance-based methods and the construction of robust algorithms, we propose a novel boosting algorithm based on label confidence, called CB-AdaBoost. The observed label of each instance is treated as uncertain: not only the correctness but also the degree of correctness of the label is evaluated according to a certain criterion before training. We introduce the confidence of each instance into the exponential loss function. With this modification, the misclassified and correctly classified exponential losses are averaged with weights given by their corresponding probabilities, represented by the correctness-certainty parameter. In this way the algorithm treats instances differently according to their confidence and thus moderately controls the training intensity for each observation. The modified loss function is in fact the conditional risk, or inner risk, which is quite different from an asymmetric loss or a fuzzy loss. Our method makes a smooth transition between full acceptance and full rejection of a sample label, thereby achieving robustness and efficiency at the same time. In addition, our label-confidence based learning has no threshold parameter, whereas correcting and deleting techniques must define a confidence level below which suspect instances are relabeled or discarded. We derive theoretical results and also provide empirical evidence of the superior performance of the proposed CB-AdaBoost.

The contributions of this paper are as follows.
- A new loss function. We consider the conditional risk, so that label uncertainty can be dealt with directly through the concept of label confidence. This new loss function also leads to consideration of the sign of the Bayes risk rule at each sample point at the initialization of the procedure.
- A simple modification of the adaptive boosting algorithm. Based on the new exponential loss function, AdaBoost has a simple explicit optimization solution at each iteration.
- Theoretical and empirical justification of the efficiency and robustness of the proposed method. Consistency of CB-AdaBoost is studied.
- Broad adaptivity. The proposed CB-AdaBoost is suitable both for noisy data and for class-overlapping data.

C. Outline of the paper

The remainder of the paper is organized as follows. Section II reviews the original AdaBoost. In Section III we propose the new algorithm and discuss in detail the assignment of label confidence, the loss function, and the algorithm's ability of adaptive learning in the label-confidence framework.

Section IV is devoted to a study of the consistency property. In Section V we illustrate how the proposed algorithm works and investigate its performance through empirical studies on both synthetic and real-world data sets. Finally, the paper concludes with some remarks in Section VI. A proof of consistency is provided in the Appendix.

II. REVIEW OF THE ADABOOST ALGORITHM

For binary classification, the main idea of AdaBoost is to produce a strong classifier by combining weak learners. This is achieved through an optimization that minimizes the exponential loss criterion over the training set. Let L = {(x_i, z_i)}_{i=1}^n denote a given training set of n independent observations, where x_i = (x_{i1}, x_{i2}, ..., x_{ip})^T in R^p and z_i in {1, -1} are the input attributes and the class label of the i-th instance, respectively. The pseudo-code of AdaBoost is given in Algorithm 1 below.

Algorithm 1: AdaBoost Algorithm
Input: L = {(x_i, z_i)}_{i=1}^n and the maximum number of base classifiers M.
Initialize: for each i, w_i^{(1)} = 1/n and D_i^{(1)} = w_i^{(1)}/S_1, where S_1 = sum_{i=1}^n w_i^{(1)} is the normalization factor.
For m = 1 To M
1. Draw instances from L with replacement according to the distribution D^{(m)} to form a training set L_m;
2. Train the base learning algorithm on L_m and obtain a weak hypothesis h_m;
3. Compute $\varepsilon_m = \sum_{i: h_m(x_i)\neq z_i} D_i^{(m)}$;
4. Let $\beta_m = \tfrac{1}{2}\ln\big(\tfrac{1-\varepsilon_m}{\varepsilon_m}\big)$; if beta_m < 0, set M = m-1 and abort the loop;
5. Update $w_i^{(m+1)} = w_i^{(m)} e^{-z_i\beta_m h_m(x_i)}$ and $D_i^{(m+1)} = w_i^{(m+1)}/S_{m+1}$ for each i, where $S_{m+1} = \sum_{i=1}^n w_i^{(m+1)}$.
End For
Output: $\operatorname{sgn}\big(\sum_{m=1}^{M}\beta_m h_m(x)\big)$.

In the AdaBoost algorithm, the current classifier h_m is induced on the weighted sampled data and the resulting weighted error epsilon_m is computed; the individual weight of each observation is then updated for the next iteration. AdaBoost is designed for clean training data, that is, each label z_i is the true label of x_i. In this framework, any instance that was previously misclassified has a higher probability of being sampled in the next stage, so the next classifier focuses more on those misclassified instances and the final ensemble classifier achieves high accuracy. For mislabeled data, however, instances that are misclassified with respect to their true labels may be weighted less, while instances that are in fact correctly classified but carry wrong observed labels are weighted more than they should be. This causes the next training set L_{m+1} to be seriously corrupted, and the mislabeled data eventually hurt the performance of the ensemble classifier. Some modification is therefore needed to make AdaBoost insensitive to class noise.

III. LABEL-CONFIDENCE BASED BOOSTING ALGORITHM

A. Label confidence

In the class-noise problem, the observed label y associated with x may be incorrect owing to some random mechanism; in the class-overlapping problem, the label y associated with x is a realization of a random label from some distribution. To deal with both problems we treat the true label Z as random. Let y (either 1 or -1) be the observed label associated with x. We define a parameter gamma as the probability of the label being correct, that is, gamma = P(Z = y | x) and P(Z = -y | x) = 1 - gamma, with gamma in [0, 1]. The quantity |gamma - (1 - gamma)| = (2*gamma - 1) sgn(2*gamma - 1) measures the trustworthiness of the label y, and sgn(2*gamma - 1) = +1 or -1 represents confidence toward the correctness or wrongness of the label. Thus we can use sgn(2*gamma - 1) y as the trusted label, with confidence level |2*gamma - 1|.
For example, for gamma = 1 we have 2*gamma - 1 = 1 and sgn(2*gamma - 1) = 1, meaning that we are 100% confident that the label y is correct, while for gamma = 0 we have 2*gamma - 1 = -1 and sgn(2*gamma - 1) = -1, meaning 100% certainty that y is wrong, so that -y should be fully trusted. A label with gamma = 0.5 is the most unsure, or fuzzy, case, with zero confidence. It is easy to see that the trusted label sgn(2*gamma - 1) y is exactly the Bayes rule: let eta(x) = P(Z = 1 | x); the Bayes rule is sgn(2*eta(x) - 1), which equals sgn(2*gamma - 1) y for both y = 1 and y = -1.

For the given training data L = {(x_1, y_1), ..., (x_n, y_n)}, let the parameter vector gamma = (gamma_1, gamma_2, ..., gamma_n) represent the probabilities of the labels being correct; that is, gamma_i can be regarded as the confidence that the sample x_i is correctly labeled as y_i. In the next subsections we first introduce the modified loss function based on a given gamma, then propose the confidence-based adaptive boosting method (CB-AdaBoost), and at the end of the section we discuss the estimation of gamma.

B. Conditional-risk loss function

Given a clean training set with correct labels z_i, the original AdaBoost minimizes the empirical exponential risk
$$\widehat{\mathrm{risk}}(f) = \frac{1}{n}\sum_{i=1}^{n}\exp(-z_i f(x_i)) \qquad \text{(III.1)}$$
over all linear combinations of base classifiers in the given space H, assuming that an exhaustive weak learner returns the best weak hypothesis in every round [13], [31]. With class-noise data the true label z_i is unknown; we only observe y_i associated with x_i. Under our assumption, given x_i, the probability that Z_i equals y_i is gamma_i. It is therefore natural to consider the following empirical risk:
$$\hat R = \frac{1}{n}\sum_{i=1}^{n}\big[\gamma_i \exp(-y_i f(x_i)) + (1-\gamma_i)\exp(y_i f(x_i))\big]. \qquad \text{(III.2)}$$
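As a concrete aside, the following minimal NumPy sketch (ours, not the authors' code; the function name is our own) evaluates the empirical conditional risk (III.2) for a given score function. With every gamma_i set to 1 it reduces to the ordinary empirical exponential risk (III.1) minimized by AdaBoost.

```python
import numpy as np

def empirical_conditional_risk(y, fx, gamma):
    """Empirical conditional (inner) exponential risk of Eq. (III.2).

    y     : array of observed labels in {-1, +1}
    fx    : array of ensemble scores f(x_i)
    gamma : array of label confidences gamma_i = P(Z_i = y_i | x_i)
    """
    y, fx, gamma = map(np.asarray, (y, fx, gamma))
    return np.mean(gamma * np.exp(-y * fx) + (1.0 - gamma) * np.exp(y * fx))

# With gamma_i = 1 for every instance this equals the usual empirical
# exponential risk (III.1).
```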

In (III.2) the observed label y_i is treated as a fuzzy label with correctness confidence gamma_i. In other words, we consider the modified exponential loss function
$$L_\gamma(y, f(x)) = \gamma \exp(-y f(x)) + (1-\gamma)\exp(y f(x)), \qquad \text{(III.3)}$$
which has a straightforward interpretation: the label y associated with x is trusted with confidence gamma and corrected to -y with confidence 1 - gamma. It is easy to check that the loss (III.3) satisfies $L_\gamma(y, f(x)) = E_{Z\mid x}\exp(-Zf(x))$, which is the inner risk defined in [33]. It is called the inner risk because the true exponential risk is
$$\mathrm{risk}(f) = E\exp(-Zf(X)) \qquad \text{(III.4)}$$
$$= E_X E_{Z\mid X}\big[\exp(-Zf(X))\big] = E_X L_\gamma(y, f(X)) \qquad \text{(III.5)}$$
for y = +1 or -1. From this perspective, we minimize the empirical inner risk of (III.5), while the original AdaBoost minimizes the empirical risk of (III.4). Steinwart and Christmann [33] showed in their Lemma 3.4 that the risk can be achieved by minimizing the inner risks, where the expectation is taken with respect to the marginal distribution of x, in contrast to (III.4), where the expectation is taken with respect to the joint distribution of (x, z). Clearly, under class overlap and label noise the empirical inner risk (III.2) has an advantage over (III.1).

In [2], (III.3) is called the conditional psi-risk, with psi being the exponential loss function. A classification-calibration condition on the conditional risk is provided there to ensure a pointwise form of Fisher consistency for classification: if the condition is satisfied, the 0-1 loss can be surrogated by the convex psi loss in order to make the minimization computationally efficient. The exponential loss is classification-calibrated. Our proposed method uses a different empirical estimator of the exponential risk; its consistency follows from the consistency of AdaBoosting [3] together with consistent estimation of gamma. More details are presented in Section IV.

The loss (III.3) is closely related to the asymmetric losses used in the literature (e.g., [44], [24]), but the motivation and goal of the two losses are quite different. An asymmetric loss treats the two classes unequally: the two types of misclassification incur different costs, and the costs or weights need not sum to 1. In an asymmetric loss the ratio of the two costs, usually a constant parameter, measures the degree of asymmetry, whereas in (III.3) it is a function of x. Also, (III.3) takes a linear combination of the exponential loss at y and -y, while an asymmetric loss takes only one of them. Indeed, gamma in (III.3) is the posterior probability used in [38] for the support vector machine technique. The similarity is that both use the sign of the Bayes rule as the trusted label; however, we also include the magnitude |2*gamma - 1| in our loss function and associate the trusted label with confidence |2*gamma - 1|, while in [38] the confidence is always 1. The idea of label confidence is also closely related to the fuzzy labels used in fuzzy support vector machines [21]; the difference is that a fuzzy label only assigns an importance weight to the observed label without considering its correctness. Next, we derive the proposed method based on the modified exponential loss function.

C. Derivation of our algorithm

Consider an additive model
$$f_M(x) = \sum_{m=1}^{M}\beta_m h_m(x), \qquad \text{(III.6)}$$
where h_m(x) in {-1, 1} is the weak classifier of the m-th iteration, beta_m is its coefficient, and f_M(x) is the ensemble classifier. Our goal is to learn the ensemble classifier by a forward stage-wise estimation procedure, fitting the additive model so as to minimize the modified loss function. Consider the update from f_{m-1}(x) to f_m(x) = f_{m-1}(x) + beta_m h_m(x) obtained by minimizing (III.2).
This is the optimization problem of finding the solutions h_m and beta_m, that is,
$$(\beta_m, h_m) = \arg\min_{\beta, h}\sum_{i=1}^{n}\big[\gamma_i \exp(-y_i f_m(x_i)) + (1-\gamma_i)\exp(y_i f_m(x_i))\big] = \arg\min_{\beta, h}\sum_{i=1}^{n}\big[w_{i1}^{(m)}\exp(-y_i\beta h(x_i)) + w_{i2}^{(m)}\exp(y_i\beta h(x_i))\big], \qquad \text{(III.7)}$$
where $w_{i1}^{(m)} = \gamma_i e^{-y_i f_{m-1}(x_i)}$ and $w_{i2}^{(m)} = (1-\gamma_i) e^{y_i f_{m-1}(x_i)}$ do not depend on h_m and beta_m. As we show next, h_m and beta_m can be derived separately in two steps.

We first optimize the weak hypothesis h_m. The summation in (III.7) can be written as
$$\sum_{i=1}^{n}\big[w_{i1}^{(m)} e^{-y_i\beta h(x_i)} + w_{i2}^{(m)} e^{y_i\beta h(x_i)}\big] = \sum_{i: h(x_i)=y_i}\big[w_{i1}^{(m)} e^{-\beta} + w_{i2}^{(m)} e^{\beta}\big] + \sum_{i: h(x_i)\neq y_i}\big[w_{i1}^{(m)} e^{\beta} + w_{i2}^{(m)} e^{-\beta}\big] = \sum_{i=1}^{n}\big[w_{i1}^{(m)} e^{-\beta} + w_{i2}^{(m)} e^{\beta}\big] + (e^{\beta} - e^{-\beta})\sum_{i: h(x_i)\neq y_i}\big[w_{i1}^{(m)} - w_{i2}^{(m)}\big].$$
Therefore, for any given value of beta > 0, (III.7) is equivalent to the minimization
$$h_m = \arg\min_h \sum_{i=1}^{n}\big[w_{i1}^{(m)} - w_{i2}^{(m)}\big]\, I\{h(x_i)\neq y_i\}. \qquad \text{(III.8)}$$
It is worth mentioning that the term $(w_{i1}^{(m)} - w_{i2}^{(m)})$ may be negative, so it cannot be interpreted directly as the weight of the instance (x_i, y_i) in the training set. According to the analytical solution of h_m, the base classifier is expected to correctly predict (x_i, y_i) when $w_{i1}^{(m)} \ge w_{i2}^{(m)}$ and to misclassify (x_i, y_i) otherwise.

This is equivalent to solving
$$\min_h \sum_{i=1}^{n}\big|w_{i1}^{(m)} - w_{i2}^{(m)}\big|\, I\big\{h(x_i)\neq \operatorname{sgn}\big([w_{i1}^{(m)} - w_{i2}^{(m)}]y_i\big)\big\}. \qquad \text{(III.9)}$$
In other words, h_m is the hypothesis that minimizes the prediction error over the set $\{(x_i, \operatorname{sgn}([w_{i1}^{(m)} - w_{i2}^{(m)}]y_i))\}_{i=1}^{n}$ with each instance weighted by $|w_{i1}^{(m)} - w_{i2}^{(m)}|$. In each iteration we therefore treat $\operatorname{sgn}([w_{i1}^{(m)} - w_{i2}^{(m)}]y_i)$ as the label of x_i and $|w_{i1}^{(m)} - w_{i2}^{(m)}|$ as its importance. This provides the theoretical justification for the sampling scheme in our proposed algorithm, given below.

Next we optimize beta_m. With h_m fixed, beta_m minimizes
$$\sum_{i: h_m(x_i)=y_i}\big[w_{i1}^{(m)} e^{-\beta} + w_{i2}^{(m)} e^{\beta}\big] + \sum_{i: h_m(x_i)\neq y_i}\big[w_{i1}^{(m)} e^{\beta} + w_{i2}^{(m)} e^{-\beta}\big]. \qquad \text{(III.10)}$$
Setting the derivative of (III.10) with respect to beta to zero gives
$$\beta_m = \frac{1}{2}\ln\frac{\sum_{i: h_m(x_i)=y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)\neq y_i} w_{i2}^{(m)}}{\sum_{i: h_m(x_i)\neq y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)=y_i} w_{i2}^{(m)}}. \qquad \text{(III.11)}$$
Note that the condition
$$\sum_{i: h_m(x_i)=y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)\neq y_i} w_{i2}^{(m)} > \sum_{i: h_m(x_i)\neq y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)=y_i} w_{i2}^{(m)} \qquad \text{(III.12)}$$
must hold for beta_m to be positive. The approximation at the m-th iteration is then updated as f_m(x) = f_{m-1}(x) + beta_m h_m(x), which leads to the following updates of w_{i1} and w_{i2}:
$$w_{i1}^{(m+1)} = w_{i1}^{(m)} e^{-y_i\beta_m h_m(x_i)}, \qquad w_{i2}^{(m+1)} = w_{i2}^{(m)} e^{y_i\beta_m h_m(x_i)}. \qquad \text{(III.13)}$$
Repeating the procedure above gives the iterative process for all rounds m >= 2, until m = M or condition (III.12) fails. The initial values are $w_{i1}^{(1)} = \gamma_i$ and $w_{i2}^{(1)} = 1-\gamma_i$. The procedure is summarized in the pseudocode of Algorithm 2; a minimal Python sketch follows the algorithm.

Algorithm 2: CB-AdaBoost Algorithm
Input: L = {(x_i, y_i)}_{i=1}^n, gamma = (gamma_i)_{i=1}^n and M.
Initialize: for each i, $w_{i1}^{(1)} = \gamma_i$, $w_{i2}^{(1)} = 1-\gamma_i$, $D_i^{(1)} = |w_{i1}^{(1)} - w_{i2}^{(1)}|/S_1$, where $S_1 = \sum_{i=1}^{n}|w_{i1}^{(1)} - w_{i2}^{(1)}|$.
For m = 1 To M
1. Relabel all instances of L to compose the data set $L^* = \{(x_i, y_i^*)\}_{i=1}^{n}$, where $y_i^* = \operatorname{sgn}[(w_{i1}^{(m)} - w_{i2}^{(m)})y_i]$;
2. Draw instances from L* with replacement according to the distribution D^{(m)} to compose a training set L_m;
3. Train the base learning algorithm on L_m and obtain a weak hypothesis h_m;
4. Compute beta_m according to (III.11); if beta_m < 0, set M = m-1 and abort the loop;
5. Update $w_{i1}^{(m+1)} = w_{i1}^{(m)} e^{-y_i\beta_m h_m(x_i)}$, $w_{i2}^{(m+1)} = w_{i2}^{(m)} e^{y_i\beta_m h_m(x_i)}$ and $D_i^{(m+1)} = |w_{i1}^{(m+1)} - w_{i2}^{(m+1)}|/S_{m+1}$ for each i, where $S_{m+1} = \sum_{i=1}^{n}|w_{i1}^{(m+1)} - w_{i2}^{(m+1)}|$.
End For
Output: $\operatorname{sgn}\big(\sum_{m=1}^{M}\beta_m h_m(x)\big)$.
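The following is a minimal Python sketch of Algorithm 2 (our illustration, not the authors' implementation). It replaces the resampling of step 2 with an equivalent weighted fit of a decision stump via scikit-learn, and all function and variable names are our own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cb_adaboost_fit(X, y, gamma, M=200):
    """Sketch of CB-AdaBoost (Algorithm 2).

    X     : (n, p) feature matrix
    y     : (n,) observed labels in {-1, +1}
    gamma : (n,) label confidences gamma_i = P(Z_i = y_i | x_i)
    Returns a list of (beta_m, stump_m) pairs.
    """
    y = np.asarray(y, dtype=float)
    w1 = np.asarray(gamma, dtype=float).copy()   # w_{i1}^{(1)} = gamma_i
    w2 = 1.0 - w1                                # w_{i2}^{(1)} = 1 - gamma_i
    ensemble = []
    for m in range(M):
        diff = w1 - w2
        y_star = np.sign(diff) * y               # trusted labels y_i^*
        y_star[y_star == 0] = y[y_star == 0]     # break ties toward y_i
        D = np.abs(diff)
        D = D / D.sum()                          # sampling distribution D^{(m)}
        # Weighted fit on (x_i, y_i^*) instead of resampling from D^{(m)}.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y_star, sample_weight=D)
        h = stump.predict(X)
        # beta_m from Eq. (III.11): sums split by agreement with the observed y_i.
        agree = (h == y)
        num = w1[agree].sum() + w2[~agree].sum()
        den = w1[~agree].sum() + w2[agree].sum()
        if num <= den:                           # condition (III.12) fails
            break                                # early stop
        beta = 0.5 * np.log(num / den)
        ensemble.append((beta, stump))
        # Weight updates of Eq. (III.13).
        w1 *= np.exp(-y * beta * h)
        w2 *= np.exp(y * beta * h)
    return ensemble

def cb_adaboost_predict(ensemble, X):
    score = sum(beta * stump.predict(X) for beta, stump in ensemble)
    return np.sign(score)
```

The confidences gamma can come from either assignment method of Section III-E; setting them all to 1 recovers the behavior of the original AdaBoost with a weighted-fit base learner.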
D. Class noise mitigation

In this subsection we study the effect of label confidence and investigate the adaptive ability of CB-AdaBoost to mitigate overfitting and class noise, looking at its re-weighting procedure and its classifier combination rule.

First, the initialization of the distribution reveals the different initial emphases on training instances in Algorithm 1 and Algorithm 2. As discussed earlier, |gamma_i - (1 - gamma_i)| = |2*gamma_i - 1| represents the label certainty of x_i, and it is used as the initial weight in Algorithm 2. The conditional-risk type of loss function leads to this initialization and to a weighting strategy that distinguishes instances by their own confidence. Consequently, instances with high certainty receive priority in training. This makes sense, as such instances are usually the ones identifiable from a statistical standpoint and are therefore more valuable for classification. By contrast, Algorithm 1 treats every instance equally at the beginning, without considering the reliability of the samples.

Second, Algorithm 2 uses y_i^* = sgn(2*gamma_i - 1) y_i as the label of x_i. Under mislabeling or class overlap this design makes sense, because sgn(2*gamma_i - 1) represents the confidence toward the correctness or wrongness of the label y_i: if sgn(2*gamma_i - 1) = 1, y_i should be trusted with confidence |2*gamma_i - 1|, whereas if sgn(2*gamma_i - 1) = -1, -y_i should be trusted with confidence |2*gamma_i - 1|. The original AdaBoost trusts the label y_i completely, which is inappropriate under mislabeling and class overlap. As shown before, the trusted label y_i^* in CB-AdaBoost has the same sign as the Bayes rule at the sample point x_i. Intuitively, our method uses more information at the initialization.

Third, we take a detailed look at the weight-updating formulas in Algorithm 2 and obtain the following results on the first re-weighting step. We say that an instance x_i is misclassified at the m-th iteration if $h_m(x_i)\neq y_i^*$, where $y_i^* = \operatorname{sgn}[(w_{i1}^{(m)} - w_{i2}^{(m)})y_i]$; otherwise it is correctly classified.

Proposition 1. A misclassified instance receives a larger weight in the next iteration.

Proof. There are two types of misclassification: either $h_m(x_i)\neq y_i$ with $w_{i1}^{(m)} > w_{i2}^{(m)}$, or $h_m(x_i) = y_i$ with $w_{i1}^{(m)} < w_{i2}^{(m)}$. In the first case,
$$\big|w_{i1}^{(m+1)} - w_{i2}^{(m+1)}\big| = w_{i1}^{(m)} e^{\beta_m} - w_{i2}^{(m)} e^{-\beta_m} > w_{i1}^{(m)} - w_{i2}^{(m)} = \big|w_{i1}^{(m)} - w_{i2}^{(m)}\big|,$$
while in the second case,
$$\big|w_{i1}^{(m+1)} - w_{i2}^{(m+1)}\big| = w_{i2}^{(m)} e^{\beta_m} - w_{i1}^{(m)} e^{-\beta_m} > w_{i2}^{(m)} - w_{i1}^{(m)} = \big|w_{i1}^{(m)} - w_{i2}^{(m)}\big|,$$
since beta_m > 0. In both cases the weight increases.

Proposition 2. If an instance is correctly classified and its certainty is high enough that $\max\{w_{i1}^{(m)}, w_{i2}^{(m)}\} > e^{\beta_m}\min\{w_{i1}^{(m)}, w_{i2}^{(m)}\}$, then it receives a smaller weight in the next iteration.

Proof. We check the two cases. For the case $w_{i1}^{(m)} > w_{i2}^{(m)}$ and $h_m(x_i) = y_i$: if $w_{i1}^{(m)} > e^{\beta_m} w_{i2}^{(m)}$, a direct computation gives $|w_{i1}^{(m+1)} - w_{i2}^{(m+1)}| = |w_{i1}^{(m)} e^{-\beta_m} - w_{i2}^{(m)} e^{\beta_m}| < w_{i1}^{(m)} - w_{i2}^{(m)}$. For the case $w_{i2}^{(m)} > w_{i1}^{(m)}$ and $h_m(x_i) = -y_i$: if $w_{i2}^{(m)} > e^{\beta_m} w_{i1}^{(m)}$, then similarly $|w_{i1}^{(m+1)} - w_{i2}^{(m+1)}| = |w_{i2}^{(m)} e^{-\beta_m} - w_{i1}^{(m)} e^{\beta_m}| < w_{i2}^{(m)} - w_{i1}^{(m)}$. In both cases the weight decreases.

Propositions 1 and 2 show that at this first important stage CB-AdaBoost inherits the adaptive learning ability of AdaBoost, with the distinction that it adjusts the distribution of instances according to the current classification with respect to the commonly agreed information; moreover, the degree of adjustment is governed by the confidence of each sample. For subsequent iterations one can picture the resampling process as follows: the weights of instances with high confidence stay at a high level until most of them have been sufficiently learned; after that, their proportion decreases rapidly while the proportion of instances with low confidence increases gradually. When uncertain instances make up most of the training set, the training process becomes difficult to continue. On the other hand, once a new classifier is no better than a random guess, an early stop of the iterative process is possible, because condition (III.12) no longer holds in that case. Thus the proposed method effectively prevents the ensemble classifier from overfitting.

Fourth, let us scrutinize the classifier combination rule.

Proposition 3. In the framework of Algorithm 2, define epsilon_m as the error rate of h_m over its training set L_m at the m-th iteration, that is, $\varepsilon_m = \sum_{i: h_m(x_i)\neq y_i^*} |w_{i1}^{(m)} - w_{i2}^{(m)}|/S_m$. Then $\beta_m < \tfrac{1}{2}\ln\big(\tfrac{1-\varepsilon_m}{\varepsilon_m}\big)$.

Proof. beta_m admits the equivalent representation
$$\beta_m = \frac{1}{2}\ln\frac{\sum_{i: h_m(x_i)=y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)\neq y_i} w_{i2}^{(m)}}{\sum_{i: h_m(x_i)\neq y_i} w_{i1}^{(m)} + \sum_{i: h_m(x_i)=y_i} w_{i2}^{(m)}} = \frac{1}{2}\ln\frac{\sum_{i: h_m(x_i)=y_i^*}\big|w_{i1}^{(m)} - w_{i2}^{(m)}\big| + c}{\sum_{i: h_m(x_i)\neq y_i^*}\big|w_{i1}^{(m)} - w_{i2}^{(m)}\big| + c},$$
where $c = \sum_{i: w_{i1}^{(m)} < w_{i2}^{(m)}} w_{i1}^{(m)} + \sum_{i: w_{i1}^{(m)} > w_{i2}^{(m)}} w_{i2}^{(m)} \ge 0$. With condition (III.12) satisfied, we obtain $\sum_{i: h_m(x_i)=y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}| > \sum_{i: h_m(x_i)\neq y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}|$, which implies
$$\frac{1-\varepsilon_m}{\varepsilon_m} = \frac{\sum_{i: h_m(x_i)=y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}|}{\sum_{i: h_m(x_i)\neq y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}|} > \frac{\sum_{i: h_m(x_i)=y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}| + c}{\sum_{i: h_m(x_i)\neq y_i^*}|w_{i1}^{(m)} - w_{i2}^{(m)}| + c}.$$
This completes the proof of Proposition 3.

It turns out that the beta_m calculated in our modified algorithm does not use the full value of the odds ratio for each hypothesis; it is smaller than the coefficient computed in AdaBoost, so our algorithm combines base classifiers and updates instance weights modestly. This effectively avoids the situation in which hypotheses dominated by substantial classification noise are exaggerated by large coefficients in the final classifier. Having studied the CB-AdaBoost algorithm in detail and compared its advantages with the original, we next discuss the remaining issue of how to estimate the label confidence.

E. Assignment of label confidence

In most cases it is difficult to track the data collection process and identify where corruption is most likely to occur, so we evaluate the confidence of labels from the statistical characteristics of the data itself. In this regard, [27] suggested a pair-wise expectation maximization method (PWEM) to compute label confidence, and Cao et al. [6] applied KNN to detect suspicious examples. However, a direct application of these methods may not be efficient for data sets whose noise level is high. We believe that a cleaner data set leads to a better confidence estimation.
Therefore, before confidence assignment, a noise filter is introduced to eliminate very suspicious instances so that more reliable statistical characteristics can be extracted from the remaining data. First, the noise filter scans the original data set: using a similarity measure between instances to find a neighborhood of each instance, one computes the agreement rate of its label among its neighbors, and instances with an agreement rate below a certain threshold are eliminated. This process can be repeated several times, since some suspect instances may only be exposed later, once their neighborhoods change. A minimal sketch of such a filter is given below.
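The sketch below is our illustration of the repeated neighborhood-agreement filter, not the authors' code; the use of Euclidean k-nearest neighbors and the parameter names are assumptions, with default thresholds matching those used in our experiments (0.07 increased by 0.07 over three rounds).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_agreement_filter(X, y, k=5, start=0.07, step=0.07, rounds=3):
    """Repeatedly remove instances whose label agreement rate among their
    k nearest neighbors (within the currently retained data) falls below a
    threshold that grows by `step` in each round.  Returns a boolean mask
    of retained instances."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    for r in range(rounds):
        thr = start + r * step
        idx_keep = np.flatnonzero(keep)
        Xk, yk = X[idx_keep], y[idx_keep]
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(yk))).fit(Xk)
        _, nbrs = nn.kneighbors(Xk)              # column 0 is the point itself
        agree = (yk[nbrs[:, 1:]] == yk[:, None]).mean(axis=1)
        keep[idx_keep[agree < thr]] = False      # drop low-agreement instances
    return keep
```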

In our experiments the threshold is set to 0.07 at the beginning and increased by 0.07 in each subsequent round; the process is repeated three times, so the final cut-off value for the agreement rate is 0.21. In this way the sample size does not decrease much and the distributional information of the sample is kept relatively intact.

[TABLE I. Average and standard deviation of the confidences for clean and mislabeled samples in the two data sets (Normal and Sine, n = 50 and n = 500) under different noise levels; the numeric entries are not recoverable from the source.]

Once a filtered data set, denoted L_red, is obtained, two methods can be used to compute the label confidence. If the noise level epsilon of the training labels is known or can be estimated, the frequency of observations with label y can be written as
$$P(Y=y) = P(Y=y, Z=y) + P(Y=y, Z=-y) = (1-\varepsilon)P(Z=y) + \varepsilon P(Z=-y),$$
where $\varepsilon = P(Y=y\mid Z=-y) = P(Y=-y\mid Z=y)$. This representation reflects the two sources of observations carrying the label y: correctly labeled instances whose true class is y, and mislabeled instances whose true class is -y. Then $P(Z=y) = (P(Y=y)-\varepsilon)/(1-2\varepsilon)$, and by the Bayes formula the confidence is assessed as
$$\gamma = P(Z=y\mid x) = \frac{P(Z=y)\,f(x\mid Z=y)}{f(x)} = \frac{P(Z=y)\,f(x\mid Z=y)}{f(x\mid Z=y)P(Z=y) + f(x\mid Z=-y)P(Z=-y)} = \frac{(P(Y=y)-\varepsilon)\,f(x\mid Z=y)}{(P(Y=y)-\varepsilon)\,f(x\mid Z=y) + (P(Y=-y)-\varepsilon)\,f(x\mid Z=-y)}.$$
With the form of the conditional distributions known, f(x | Z = y) and f(x | Z = -y) can be estimated on L_red, while P(Y = y) is set directly to the sample proportion of class y in L.

The second method does not require knowledge of the noise level. KNN is used again to assign a confidence to each label: based on L_red, the label agreement rate of each instance among its nearest neighbors acts as its confidence, so the confidence of an example (x, y) in L is computed as
$$P(Z=y\mid x) = \frac{1}{K}\sum_{x_j\in N(x)} I(y_j = y), \qquad \text{(III.14)}$$
where N(x) denotes the set of K nearest neighbors of x taken from L_red. In our experiments K = 5 is used. In the simulations of Section V we evaluate the quality of the confidences assigned by these two methods; in practice, however, the Bayesian method is usually infeasible since the noise level is unknown. A sketch of the KNN assignment is given below.
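The following minimal sketch of the KNN confidence assignment (III.14) is our illustration, not the authors' code; the resulting values can be supplied as the gamma input of Algorithm 2 (e.g., to the cb_adaboost_fit sketch above).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_confidence(X_red, y_red, X, y, K=5):
    """Eq. (III.14): gamma_i is the fraction of the K nearest neighbors of
    x_i within the filtered set L_red that carry the same label as y_i.

    (X_red, y_red) : filtered data set L_red
    (X, y)         : full training set L
    """
    nn = NearestNeighbors(n_neighbors=K).fit(np.asarray(X_red, dtype=float))
    _, nbrs = nn.kneighbors(np.asarray(X, dtype=float))
    y_red, y = np.asarray(y_red), np.asarray(y)
    return (y_red[nbrs] == y[:, None]).mean(axis=1)
```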
F. Relationship to previous work

Note that our modified algorithm reduces to AdaBoost if the confidence of every label is set to one. The greater the confidence of each instance, the less CB-AdaBoost differs from AdaBoost in the weight updating, the base classifiers, and their coefficients over successive iterations.

Rebbapragada and Brodley [27] proposed instance weighting via confidence in order to mitigate class noise; they attempted to assign confidences to instance labels such that incorrect labels receive lower confidence. We share a similar view of noisy data, but instance weighting via confidence by itself is a discarding technique rather than a correcting technique: a low confidence amounts to an attempt to eliminate the example, while a high confidence amounts to keeping it. By contrast, our algorithm considers both the correctly labeled and the mislabeled probability of an instance. The loss function $L_\gamma(y, f(x)) = \gamma e^{-yf(x)} + (1-\gamma)e^{yf(x)}$ therefore expresses the attitude toward an instance: retain its observed label with weight gamma and correct it with weight 1 - gamma. In other words, our algorithm can be viewed as a composite of the discarding and correcting techniques.

For the same reason, our algorithm differs from those proposed in [14] and [6]. In their discussions, heuristic algorithms were suggested to delete or revise suspicious examples during the iterations in order to improve the accuracy of AdaBoost on mislabeled data. In our algorithm, suspicious labels are similarly revised, but as a consequence of minimizing the modified loss function (III.2): the trusted label at each sample point is the sign of the Bayes rule and is associated with a confidence level.

Other closely related work includes [45] and [50]. Both consider the confidence of x_i to be p_i = P(Z = 1 | x_i), whereas our approach takes advantage of the observed label y_i by considering gamma_i = P(Z = y_i | x_i): we evaluate the confidence of the observed label y_i, while they assess the confidence of the positive label +1.

In [50] the initial weight 2p_i - 1 is very similar to our choice, but our re-weighting and classifier combination rules are different; [45] has a combination rule similar to ours, but the initial weights are different.

IV. CONSISTENCY OF CB-ADABOOSTING

In this section we study the consistency of the proposed CB-AdaBoosting method with label confidences estimated by the KNN approach. Several authors have shown that the original and modified versions of AdaBoost are consistent. For example, Zhang and Yu [47] considered a general boosting with a step-size restriction, Lugosi and Vayatis [23] proved the consistency of regularized boosting methods, and Bartlett and Traskin [3] studied the stopping rule of the traditional AdaBoost that guarantees its consistency. Our algorithm uses the same exponential loss function, only with a different empirical version of the exponential risk. This enables us to adopt the stopping strategy of [3], together with a consistency result for the nearest neighbor method ([34], [8]), to show that the proposed CB-AdaBoost is Bayes-risk consistent.

We use notation similar to [3]. Let (X, Z) be a pair of random values in R^p x {-1, 1} with joint distribution P_{X,Z} and marginal distribution P_X of X. The training sample L_n = {(x_1, y_1), ..., (x_n, y_n)} is available and has the same distribution as (X, Z); the mislabel problem can be treated as the case where P_{X,Z} is a contamination distribution. CB-AdaBoost produces a classifier g_n = sgn(f_n): R^p -> {-1, 1} based on this sample L_n. The misclassification probability is L(g_n) = P(g_n(X) != Z | L_n). Our goal is to prove that L(g_n) approaches the Bayes risk
$$L^* = \inf_f L(f) = E\big(\min(\eta(X), 1-\eta(X))\big)$$
as n -> infinity, where the infimum is taken over all measurable classifiers and eta(x) = P(Z = 1 | x) is the conditional probability.

Assume that H, the class of base classifiers, has a finite VC dimension. The proposed CB-AdaBoost finds a linear combination f_n of classifiers in H that minimizes
$$R_{n,k_n}(f) = \frac{1}{n}\sum_{i=1}^{n}\big[\hat\gamma_i \exp(-y_i f(x_i)) + (1-\hat\gamma_i)\exp(y_i f(x_i))\big],$$
where $\hat\gamma_i$ is a k_n-nearest-neighbor estimator of $\gamma_i = P(Z = y_i\mid x_i)$, namely
$$\hat\gamma_i = \frac{1}{k_n}\sum_{x_j\in N(x_i)} I(y_j = y_i),$$
with N(x_i) the set of k_n nearest neighbors of x_i. We also denote
$$R_n(f) = \frac{1}{n}\sum_{i=1}^{n}\big[\gamma_i \exp(-y_i f(x_i)) + (1-\gamma_i)\exp(y_i f(x_i))\big]$$
and the true exponential risk $R(f) = E_X E_{Z\mid X}\exp(-Zf(X)) = E\exp(-Zf(X))$. We first prove that CB-AdaBoost is consistent with respect to the exponential risk; then, by [2], its 0-1 risk also approaches the Bayes risk L*, since the exponential loss is classification-calibrated.

We denote the convex hull of H scaled by lambda >= 0 as
$$\mathcal{F}_\lambda = \Big\{f : f = \sum_{i=1}^{n}\beta_i h_i,\ n\in\mathbb{N}\cup\{0\},\ \beta_i\ge 0,\ \sum_{i=1}^{n}\beta_i = \lambda,\ h_i\in H\Big\},$$
and the set of t-combinations, t in N, of functions in H as
$$\mathcal{F}^t = \Big\{f : f = \sum_{i=1}^{t}\beta_i h_i,\ \beta_i\in\mathbb{R},\ h_i\in H\Big\}.$$
Define the truncation $\pi_l(x) = x\, I(x\in[-l,l]) + l\,\operatorname{sgn}(x)\, I(|x|>l)$, where I(.) is the indicator function. The set of truncated functions is $\pi_l\circ\mathcal{F} = \{\tilde f : \tilde f = \pi_l(f),\ f\in\mathcal{F}\}$, and the set of classifiers based on a class F is $g\circ\mathcal{F} = \{\tilde f : \tilde f = g(f),\ f\in\mathcal{F}\}$. Based on the stopping strategy of [3] and the universal consistency of nearest-neighbor function estimates [8], we have the following proposition.

Proposition 4. Assume that V = d_VC(H) < infinity and that H is dense in the sense that $\lim_{\lambda\to\infty}\inf_{f\in\mathcal{F}_\lambda} R(f) = R^*$. Further assume k_n -> infinity, k_n/n -> 0 and t_n = n^{1-a} for some a in (0, 1). Then CB-AdaBoost stopped at step t_n returns a sequence of classifiers that almost surely satisfies $L(g(f_{t_n}))\to L^*$.

The proposition states the strong consistency of the proposed CB-AdaBoost method if it is stopped at t_n = n^{1-a} and the number of neighbors used for estimating the label confidence satisfies k_n -> infinity with k_n/n -> 0. A proof of Proposition 4 is given in the Appendix.
V. EXPERIMENTS

To begin, we run three experiments to investigate the performance of the proposed algorithm on synthetic data. The first examines the quality of the assigned label confidences, since they have a great impact on the effectiveness of the proposed method. The second explores the advantages of the proposed algorithm over other commonly used methods for dealing with noisy data. The third demonstrates the significant differences in instance weights between the proposed algorithm and the original AdaBoost. We generate random samples from two scenarios with increasing levels of label noise; a minimal sketch of the generators follows the descriptions.

Normal: the two classes are sampled from the bivariate normal distributions N((0, 0)^T, I) and N((2, 2)^T, I), respectively.

Sine: random vectors x_i = (x_{i1}, x_{i2})^T uniformly distributed on [-3, 3] x [-3, 3] are simulated, and their labels are assigned according to the conditional probability $P(z_i = y\mid x_i) = e^{y g(x_i)}/(e^{y g(x_i)} + e^{-y g(x_i)})$, where y in {1, -1} and $g(x_i) = (x_{i2} - 3\sin x_{i1})/2$.
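The sketch below is our illustration of the two generators and the label-flipping mechanism; equal class priors in the Normal scenario and the assignment of the N((2, 2)^T, I) component to class +1 are assumptions not fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_normal(n):
    """Two bivariate normal classes: N((0,0), I) vs. N((2,2), I)."""
    z = rng.choice([-1, 1], size=n)                       # assumed equal priors
    X = rng.standard_normal((n, 2)) + np.where(z[:, None] == 1, 2.0, 0.0)
    return X, z

def make_sine(n):
    """Uniform features on [-3,3]^2 with P(z=1|x) = e^g / (e^g + e^{-g})."""
    X = rng.uniform(-3, 3, size=(n, 2))
    g = (X[:, 1] - 3 * np.sin(X[:, 0])) / 2
    p_pos = np.exp(g) / (np.exp(g) + np.exp(-g))
    z = np.where(rng.random(n) < p_pos, 1, -1)
    return X, z

def flip_labels(z, noise_level):
    """Introduce label noise by reversing a random fraction of the labels."""
    y = z.copy()
    flip = rng.random(len(z)) < noise_level
    y[flip] = -y[flip]
    return y
```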

[Fig. 1. Testing errors of each method (AdaBoost, CORR, DISC, CB-AdaBoost, Stump) under different noise levels (0%, 10%, 20% and 30%) as the number of iterations increases.]

Data sets consist of 50 or 500 training observations together with a separate set of testing instances. We introduce mislabeled data by randomly choosing training instances and reversing their labels. We then carry out experiments on real data sets from the UCI repository [20]: seventeen data sets of different sizes and with different numbers of input variables are used to compare the proposed algorithm with several existing robust boosting methods. The number of iterations M is set to 200 for all ensemble classifiers. The base classifier used in AdaBoost and CB-AdaBoost is the classification stump, the simplest one-level decision tree.

A. Assessing the quality of label confidence

The label confidence of clean instances is expected to be high, while for mislabeled instances it should be low. In this experiment we examine the two assignment methods introduced in Section III-E by assessing the quality of their label confidences. We use the Bayesian method on the Normal data, where the noise level is known to be 0%, 10% and 20%, respectively, and the KNN method on the Sine data. The number of nearest neighbors K used in KNN is selected from the range 3 to 15 and is set to 5 as a balance between accuracy and computational efficiency. Table I reports the average and standard deviation of the confidences for clean and mislabeled samples, computed over 30 repetitions. As expected, there is a clear separation in confidence between the two types of samples: clean labels receive, on average, a markedly higher confidence than corrupted ones. For example, under 10% contamination of the normal data, the difference is substantial both for n = 500 and for the small size n = 50 (see Table I). As the noise level increases, the difference in label confidence between clean and mislabeled data becomes smaller. This phenomenon, also mentioned in [27], is understandable, because certainty decreases in highly noisy data and the assignment methods become more conservative than under low noise.

B. Comparisons with discarding and correcting methods

We compare the efficiency of label-confidence based learning with the discarding and correcting techniques. For the latter two, a threshold on confidence is pre-specified to define suspect samples. We consider four types of classifiers: 1) AdaBoost; 2) AdaBoost trained on the data with suspect samples discarded (DISC); 3) AdaBoost trained on the original training set but with suspect labels corrected (CORR); and 4) CB-AdaBoost.

[TABLE II. Average and standard deviation of testing errors of each method under different noise levels; the discarding and correcting methods use 0.20, 0.50 and 0.80 as confidence thresholds, and the smallest errors are shown in bold. The numeric entries are not recoverable from the source.]

[Fig. 2. Average weights of different types of instances during the learning process in the original AdaBoost and CB-AdaBoost; the left panel shows the weights of mislabeled and clean-labeled instances, the right panel the weights of groups with high and low label confidence.]

We repeat the procedure 30 times and record the test errors of the four classifiers. Fig. 1 illustrates how the average test error changes with the number of iterations for the different classifiers based on a training set of size 50, with the threshold set to 0.5 for the DISC and CORR methods. AdaBoost greatly improves the prediction accuracy of the stump on clean data, but its boosting ability is limited when the training set is corrupted; at high noise levels it performs even worse than a single stump, which demonstrates that AdaBoost is indeed very sensitive to noise. It also suffers from overfitting at the 0% noise level when the number of iterations becomes large. With the pre-processing techniques (CORR or DISC), AdaBoost does well at the beginning, but its accuracy decreases as a large number of base learners accumulate. Compared with these methods, the proposed algorithm shows better performance on clean data and better robustness against noise; moreover, it tactically avoids overfitting by ceasing the learning process at an early iteration (as early as 40).

Table II provides the test errors of the correcting and discarding methods under thresholds 0.2, 0.5 and 0.8, denoted DISC20, DISC50 and DISC80 and CORR20, CORR50 and CORR80, respectively. CB-AdaBoost's performance is superior in 15 out of 16 cases; the only exception is the Normal data under 30% noise with n = 500, where CORR50 and DISC50 perform better. The advantage of CB-AdaBoost over the others is more pronounced for small sample sizes than for large ones. Neither the correcting nor the discarding method performs uniformly better at any single threshold, which makes it difficult to choose a reasonable confidence threshold in practice. It is also worth mentioning that CB-AdaBoost uniformly outperforms AdaBoost even in the case without mislabeling, because there is overlap between the two classes and because the proposed loss function accounts for the true risk, helping the classifier achieve a test error close to the theoretical minimum, namely the Bayes error.

C. Reweighting

This experiment illustrates the re-weighting differences between the original AdaBoost and the proposed method. Fig. 2 plots how the average weights of different groups of instances change as the number of iterations increases. First we consider two groups, mislabeled instances and clean-labeled instances, whose mean weights are plotted in the left panel of Fig. 2. As learning proceeds, the mean weight of the noisy data in AdaBoost (mis-AdaBoost, the top red curve) rises rapidly and stays at a level much higher than in CB-AdaBoost (mis-CB, the middle red curve); if the iterations are not stopped in time, the weak classifiers trained on heavily weighted noisy data become unreliable. By contrast, the proposed method does not place too much weight on noisy examples. The right panel of Fig. 2 shows groups divided by certainty degree (higher than 0.7 or not). The plot clearly demonstrates the features of the weighting rule in CB-AdaBoost: instances with high certainty receive larger initial weights and their average weights decline once they have been fully trained, whereas the average weights of the others increase and remain high until the iterations stop. This adaptive ability is not present in AdaBoost.

D. Real data sets

In addition, we conducted experiments on 17 real data sets from the UCI repository [20]. Since we focus on the two-class problem, the classes of several multi-class data sets are merged into two classes. If the class variable is nominal, class 1 is treated as the positive class and the remaining classes as the negative class; if the class variable is ordinal, we merge classes with similar properties. For example, in the Cardiotocography data the Suspect and Pathologic classes are combined into the positive class and Normal is the negative class. For the Urban Land Cover data set we combine the training and test sets, and any instances with missing values are removed. Table III summarizes the main characteristics of all data sets. For each data set, half of the instances are randomly selected as the training set and the remainder is used for testing; 10%, 20% and 30% mislabeling is introduced into the training data by randomly choosing training instances and reversing their labels. For comparison we consider another boosting method, LogitBoost, in addition to two modified AdaBoost algorithms, MadaBoost [10] and beta-Boosting [49] (the original paper did not name the method; we call it beta-Boosting after the beta parameter added to the algorithm, as suggested by a reviewer), all of which are robust against noisy data. The procedure is repeated 30 times, and the average of the 30 test errors of each classifier is taken as the measure of its performance.

According to Table IV, CB-AdaBoost performs better than the original AdaBoost in all cases except one, the Musk data under 10% noise, and it greatly improves the accuracy of the stump (i.e., the base classifier). beta-Boosting, MadaBoost and LogitBoost show robustness to mislabeled data: they outperform AdaBoost in most cases, with LogitBoost achieving a lower test error than the other two. However, like AdaBoost, they suffer from overfitting because their weight distributions do not allow the iterations to stop.
This problem is overcome by CB-AdaBoost; as a result, the win-lose counts of the proposed algorithm against the three robust algorithms are 42-9 and 48-3 for the first two, respectively (see Table IV). We conducted the sign test based on counts of wins, losses and ties [7] in order to quantify the significance of the proposed method; Table V lists the frequency and the significance level with which CB-AdaBoost beats each of the other algorithms on the 17 data sets at each noise level. This demonstrates the effectiveness and advantages of CB-AdaBoost in handling mislabeled data.

VI. CONCLUSION

In this paper we have provided a label-confidence based boosting method that is largely immune to label noise and overfitting. Through the assignment of confidences, the proposed algorithm distinguishes between clean and contaminated instances, and the confidence values represent different levels of judgment about label reliability. Guided by the confident instances, CB-AdaBoost minimizes the loss over the training set under the conditional risk function. Moreover, explicit solutions for the weak learners and their coefficients at each stage are easily obtained and applied in practice. In comparison with common noise-handling techniques and other robust algorithms, CB-AdaBoost does a better job of tackling class overlap and mislabeling.

The proposed method has some limitations. Its computational complexity is O(n^2 d), where n is the sample size and d the dimension: the KNN evaluation of label confidence costs O(n^2 d), while the remaining part of CB-AdaBoost is O(n^{2-a} d) with a in (0, 1). Collectively this yields an overall complexity of O(n^2 d), which may be prohibitive for large-scale applications. As currently formulated, the proposed method also cannot directly handle categorical or symbolic features; a similarity metric on such features would have to be introduced to define neighbors for label-confidence assignment.

Continuation of this work could take several directions. A general optimization framework based on the conditional risk deserves deeper understanding and further development. In the current work KNN is used to estimate the confidence of each instance; theoretically, the number of neighbors must grow to infinity at a rate slower than the sample size to ensure strong consistency of the KNN estimator.

[TABLE III. Summary of the 17 data sets (Breast-Cancer, Wpbc, Wdbc, Pima, Aust, Heart, Glass, Seeds, Ecoli, Wine, Haberman, Vehicle, Banknote, Cardiotocography, Waveform, Urban Land Cover, Musk): numbers of instances, input variables and original classes; the numeric entries are not recoverable from the source.]

In practice, however, a small number of neighbors seems to be sufficient, and perhaps a proof of consistency exists without these conditions on k_n. It will be interesting to study the impact of the parameter k and to discuss a proper selection of the number of neighbors in practice; for example, cross-validation for choosing k deserves further investigation. In fact, the problem of designing a good criterion for confidence assignment is still open, and other methods are needed to produce high-quality confidences, especially when categorical features are involved. CB-AdaBoost outperforms AdaBoost on class-overlapping problems, so it is promising to extend CB-AdaBoost to multi-class classification and to other applications such as image or object recognition.

VII. APPENDIX

Proof of Proposition 4. Let $\{\tilde f_n\}_{n=1}^{\infty}$ be a sequence of reference functions such that $R(\tilde f_n)\to R^*$. We shall prove that there exist non-negative sequences $t_n\to\infty$, $\xi_n\to\infty$ and $k_n\to\infty$ with $k_n/n\to 0$ such that the following conditions are satisfied.

Uniform convergence of t_n-combinations:
$$\sup_{f\in\pi_{\xi_n}\circ\mathcal{F}^{t_n}}\big|R(f) - R_n(f)\big|\ \xrightarrow{a.s.}\ 0; \qquad \text{(VII.1)}$$
Empirical convergence for the sequence $\{\tilde f_n\}$:
$$\big|R_n(\tilde f_n) - R(\tilde f_n)\big|\ \xrightarrow{a.s.}\ 0; \qquad \text{(VII.2)}$$
Convergence of the KNN estimates:
$$\big|R_{n,k_n}(\tilde f_n) - R_n(\tilde f_n)\big|\ \xrightarrow{a.s.}\ 0; \qquad \text{(VII.3)}$$
Algorithmic convergence of t_n-combinations:
$$R_{n,k_n}(f_{t_n}) - R_{n,k_n}(\tilde f_n)\ \xrightarrow{a.s.}\ 0. \qquad \text{(VII.4)}$$

Since R_n(f) is an empirical exponential risk, a proof of (VII.1) follows exactly the lines of Lemma 4 in [3], with Lipschitz constant $L_\xi = (e^{\xi} - e^{-\xi})/(2\xi)$ and $M_\xi = e^{\xi}$. Then for any delta > 0, with probability at least 1 - delta,
$$\sup_{f\in\pi_{\xi}\circ\mathcal{F}^{t}}\big|R(f) - R_n(f)\big| \le c\,\xi L_\xi\sqrt{\frac{(V+1)(t+1)\log_2[2(t+1)/\ln 2]}{n}} + M_\xi\sqrt{\frac{\ln(1/\delta)}{2n}}, \qquad \text{(VII.5)}$$
where $V = d_{VC}(H)$ and $c = 24\int_0^1\sqrt{\ln(8e/\epsilon)}\,d\epsilon$. We can take $t = n^{1-a}$ and $\xi = \kappa\ln n$ with kappa > 0, a in (0, 1) and 2*kappa - a < 0, so that the right side of inequality (VII.5) converges to 0 while $\sum_{n=1}^{\infty}\delta_n < \infty$; an application of the Borel-Cantelli lemma then ensures the almost sure convergence in (VII.1).

Applying Theorem 8 of [3], we obtain the result (VII.4), in which the reference sequence satisfies $\tilde f_n\in\mathcal{F}_{\lambda_n}$ with $\lambda_n = \kappa_1\ln n$, $\kappa_1\in(0, 1/2)$.

(VII.2) can be proved by Hoeffding's inequality if the range of $\tilde f_n$ is restricted to the interval $[-\lambda_n, \lambda_n]$: that is,
$$P\big(\big|R_n(\tilde f_n) - R(\tilde f_n)\big| \ge \epsilon_n\big) \le \exp(-2n\epsilon_n^2/M_{\lambda_n}^2) := \delta_n,$$
where $M_{\lambda_n} = e^{\lambda_n} - e^{-\lambda_n}$. Let $\lambda_n = \kappa_1\ln n$ with $\kappa_1\in(0, 1/2)$. Letting $\epsilon_n\to 0$, we still have $\sum_{n=1}^{\infty}\delta_n < \infty$, and hence the almost sure convergence in (VII.2) holds.

By the result of Theorem 1 in [8], for each KNN estimate $\hat\gamma_i$ with $k_n\to\infty$ and $k_n/n\to 0$ we have
$$P\big(2|\hat\gamma_i - \gamma_i| > \epsilon_n\big) \le \exp\big[-n\epsilon_n^2/(8N_p^2)\big],$$
where the constant $N_p$ is the minimal number of cones centered at the origin, of angle pi/6, that cover R^p. Then, with $\tilde f_n$ restricted to $[-\lambda_n, \lambda_n]$, we have
$$P\big(\big|R_{n,k_n}(\tilde f_n) - R_n(\tilde f_n)\big| > \epsilon_n\big) < \exp\big[-n\epsilon_n^2/(2M_{\lambda_n}^2 N_p^2) + \ln n\big] := \delta_n.$$
Again, the choice $\lambda_n = \kappa_1\ln n$ with $\kappa_1\in(0, 1/2)$ guarantees $\sum\delta_n < \infty$ when $\epsilon_n = o(1)$, and hence (VII.3) holds.

Now we are ready to prove Proposition 4. For almost every outcome omega of the probability space we can define sequences $\epsilon_{n,i}(\omega)\to 0$, i = 1, ..., 5, such that for almost all omega the following inequalities hold.
$$R(\pi_{\xi_n}(f_{t_n})) \le R_n(\pi_{\xi_n}(f_{t_n})) + \epsilon_{n,1}(\omega) \quad \text{by (VII.1)}$$
$$\le R_{n,k_n}(\pi_{\xi_n}(f_{t_n})) + \bar\epsilon_{n,2}(\omega) \qquad \text{(VII.6)}$$
$$\le R_{n,k_n}(f_{t_n}) + e^{-\xi_n} + \bar\epsilon_{n,2}(\omega) \qquad \text{(VII.7)}$$
$$\le R_{n,k_n}(\tilde f_n) + e^{-\xi_n} + \bar\epsilon_{n,3}(\omega) \quad \text{by (VII.4)}$$
$$\le R_n(\tilde f_n) + e^{-\xi_n} + \bar\epsilon_{n,4}(\omega) \quad \text{by (VII.3)}$$
$$\le R(\tilde f_n) + e^{-\xi_n} + \bar\epsilon_{n,5}(\omega) \quad \text{by (VII.2)} \qquad \text{(VII.8)}$$
where $\bar\epsilon_{n,k}(\omega) = \sum_{j=1}^{k}\epsilon_{n,j}(\omega)$. Inequality (VII.6) follows similarly to (VII.3) with $\xi_n = \kappa_1\ln n$, where $\kappa_1\in(0, 1/2)$. Inequality (VII.7) follows from the facts that $e^{-\pi_{\xi_n}(x)} < e^{-x} + e^{-\xi_n}$ and $e^{\pi_{\xi_n}(x)} < e^{x} + e^{-\xi_n}$.

Then with $t_n = n^{1-a}$, $\xi_n = \kappa\ln n$ (a > 0, kappa > 0, 2*kappa < a) and (VII.8), and by the choice of the sequence $\{\tilde f_n\}\subset\mathcal{F}_{\lambda_n}$ with $\lambda_n = \kappa_1\ln n$, $\kappa_1\in(0, 1/2)$, we have $R(\tilde f_n)\to R^*$ and therefore $R(\pi_{\xi_n}(f_{t_n}))\to R^*$ almost surely. By Theorem 3 of [2], $L(g(\pi_{\xi_n}(f_{t_n})))\to L^*$ almost surely. Since for $\xi_n > 0$ we have $g(\pi_{\xi_n}(f_{t_n})) = g(f_{t_n})$, it follows that $L(g(f_{t_n}))\to L^*$ almost surely. Hence the proposed CB-AdaBoosting procedure is consistent if stopped after t_n steps.

ACKNOWLEDGMENT

Zhi Xiao and Bo Zhong are supported by the China National Science Foundation (NSF). The authors would like to thank Yixin Chen for discussing the conditional risk.

REFERENCES

[1] H. Allende-Cid et al., Robust alternating AdaBoost, in: Progress in Pattern Recognition, Image Analysis and Applications, Springer, Berlin Heidelberg, 2007.
[2] P.L. Bartlett, M. Jordan and J.D. McAuliffe, Convexity, classification, and risk bounds, J. Amer. Stat. Assoc. 101 (2006).
[3] P.L. Bartlett and M. Traskin, AdaBoost is consistent, J. Mach. Learn. Res. 8 (2007).
[4] C.E. Brodley and M.A. Friedl, Identifying and eliminating mislabeled training instances, in: AAAI/IAAI, 1 (1996).
[5] C.E. Brodley and M.A. Friedl, Identifying mislabeled training data, J. Artif. Intell. Res. 11 (1999).
[6] J. Cao, S. Kwong and R. Wang, A noise-detection based AdaBoost algorithm for mislabeled data, Pattern Recogn. 45(12) (2012).
[7] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006).
[8] L. Devroye, L. Györfi, A. Krzyżak and G. Lugosi, On the strong universal consistency of nearest neighbor regression function estimates, Ann. Stat. 22(3) (1994).
[9] T. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Mach. Learn. 40(2) (2000).
[10] C. Domingo and O. Watanabe, MadaBoost: a modification of AdaBoost, in: COLT (2000).
[11] B. Frénay and A. Kabán, A comprehensive introduction to label noise, in: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2014, Bruges, Belgium.
[12] Y. Freund and R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the Thirteenth International Conference on Machine Learning (ICML), 1996.
[13] J. Friedman, T. Hastie and R. Tibshirani, Additive logistic regression: a statistical view of boosting, Ann. Stat. 28(2) (2000).
[14] Y. Gao and F. Gao, Edited AdaBoost by weighted kNN, Neurocomputing 73 (2010).
[15] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
[16] K. Hayashi, A simple extension of boosting for asymmetric mislabeled data, Stat. Probabil. Lett. 82(2) (2012).
[17] Y. Jiang and Z. Zhou, Editing training data for kNN classifiers with neural network ensemble, in: Advances in Neural Networks, ISNN 2004, Springer, Berlin Heidelberg, 2004.
[18] T. Kanamori, T. Takenouchi and S. Eguchi, The most robust loss function for boosting, in: Neural Information Processing, ICONIP 2004, Springer, Berlin Heidelberg, 2004.
[19] T. Kanamori, T. Takenouchi and S. Eguchi, Robust loss functions for boosting, Neural Comput. 19(8) (2007).
[20] M. Lichman, UCI Machine Learning Repository, 2013, Irvine, CA: University of California, School of Information and Computer Science.
[21] C. Lin and S. Wang, Fuzzy support vector machines, IEEE Trans. Neural Netw. 13(2) (2002).
[22] H. Liu and S. Zhang, Noisy data elimination using mutual k-nearest neighbor for classification mining, J. Syst. Software 85(5) (2012).
[23] G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods, Ann. Stat. 32(1) (2004).
[24] H. Masnadi-Shirazi and N. Vasconcelos, Cost-sensitive boosting, IEEE Trans. Pattern Anal. Mach. Intell. 33(2) (2011).
[25] P. Melville, N. Shah, L. Mihalkova and R. Mooney, Experiments on ensembles with missing and noisy data, in: Proc. of the Fifth International Workshop on Multiple Classifier Systems (2004).
[26] T. Onoda, Overfitting of boosting and regularized boosting algorithms, Electron. Comm. Jpn. (9) (2007).
[27] U. Rebbapragada and C. Brodley, Class noise mitigation through instance weighting, in: Lecture Notes in Computer Science, ECML, Springer, Berlin Heidelberg, 2007.
[28] J.A. Sáez, J. Luengo and F. Herrera, Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification, Pattern Recogn. 46(1) (2013).
[29] J.S. Sánchez, R. Barandela, A.I. Marqués et al., Analysis of new techniques to obtain quality training sets, Pattern Recogn. Lett. 24(7) (2003).
[30] R.E. Schapire and Y. Singer, Improved boosting algorithms using confidence-rated predictions, Mach. Learn. 37(3) (1999).
[31] R.E. Schapire and Y. Freund, Boosting: Foundations and Algorithms, The MIT Press.
[32] R.A. Servedio, Smooth boosting and learning with malicious noise, J. Mach. Learn. Res. 4 (2003).
[33] I. Steinwart and A. Christmann, Support Vector Machines, Springer, New York.
[34] C.J. Stone, Consistent nonparametric regression, Ann. Stat. 5(4) (1977).
[35] Y. Sun, J. Li and W. Hager, Two new regularized AdaBoost algorithms, in: Proc. ICMLA (2004).
[36] Y. Sun, S. Todorovic and J. Li, Reducing the overfitting of AdaBoost by controlling its data distribution skewness, Int. J. Pattern Recogn. 20(7) (2006).
[37] T. Takenouchi and S. Eguchi, Robustifying AdaBoost by adding the naive error rate, Neural Comput. 16(4) (2004).
[38] Q. Tao, G. Wu, F. Wang and J. Wang, Posterior probability support vector machines for unbalanced data, IEEE Trans. Neural Netw. 16(6) (2015).
[39] J. Thongkam, G. Xu, Y. Zhang and F. Huang, Toward breast cancer survivability prediction models through improving training space, Expert Syst. Appl. 36 (2009).
[40] L. Utkin and Y. Zhuk, Robust boosting classification models with local sets of probability distributions, Knowl.-Based Syst. 61 (2014).
[41] S. Verbaeten and A. Van Assche, Ensemble methods for noise elimination in classification problems, in: Multiple Classifier Systems, Springer, Berlin Heidelberg, 2003.
[42] A. Vezhnevets and V. Vezhnevets, Modest AdaBoost: teaching AdaBoost to generalize better, GraphiCon, Novosibirsk Akademgorodok, Russia.
[43] A. Vezhnevets and O. Barinova, Avoiding boosting overfitting by removing confusing samples, in: Lecture Notes in Computer Science, ECML, Springer, Berlin Heidelberg, 2007.
[44] P. Wang, C.H. Shen, N. Barnes and H. Zheng, Fast and robust object detection using asymmetric totally corrective boosting, IEEE Trans. Neural Netw. Learn. Syst. 23(1) (2012).
[45] W. Wang, Y. Wang, F. Chen and A. Sowmya, A weakly supervised approach for object detection based on soft-label boosting, IEEE Workshop on Applications of Computer Vision, 2013.
[46] Y. Freund and R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55(1) (1997).
[47] T. Zhang and B. Yu, Boosting with early stopping: convergence and consistency, Ann. Stat. 33(4) (2005).
[48] C.X. Zhang and J.S. Zhang, A local boosting algorithm for solving classification problems, Comput. Stat. Data An. 52(4) (2008).
[49] C.X. Zhang, J.S. Zhang and G.Y. Zhang, An efficient modified boosting method for solving classification problems, J. Comput. Appl. Math. 214(2) (2008).
Frémont, Soft label based sem-supervsed boostng for classfcaton and object recognton, In: 13th Internatonal Conference on Control, Automaton, Robotcs and Vson (ICARCV), 2014, Sngapore.

Zhi Xiao is a professor and the chair of the Information Management Department at Chongqing University. His research interests include operational optimization, statistics, forecasting, information intelligence analysis and data mining. Recently he has focused on soft sets and interdisciplinary big data analysis. Dr. Xiao has been the Principal Investigator of 50 funding projects. He has published 5 textbooks and more than 100 scientific papers in journals including Knowledge-Based Systems, Expert Systems with Applications, Applied Mathematical Modelling, Journal of Computational and Applied Mathematics, Computers & Mathematics with Applications, etc. Professor Xiao serves as Vice Executive Director of the China Information Economics Association, as Executive Officer of the National Statistical Society of China and as Vice President of the Chongqing Statistical Society.

Zhe Luo received the Master degree in Probability and Mathematical Statistics from Chongqing University. Currently, he is an assistant manager of the Bank of China at Nanning. His research focuses on statistical decision, pattern recognition, cluster analysis and Monte Carlo simulations.

Bo Zhong is a professor of the Statistics and Actuarial Science Department at Chongqing University. She is the director of the Graduate Mathematics Courses Program. Her research specialties are soft sets, soft computation, rough sets, statistical learning and reliability analysis in power systems. Dr. Zhong leads 30 funding projects, including 10 from national funding agencies. She has published 70 scientific papers in journals including Expert Systems with Applications and Knowledge-Based Systems, etc. She is also the author of 8 textbooks.

Xin Dang (M'17) received the PhD degree in Statistics from the University of Texas at Dallas. Currently she is an associate professor of the Department of Mathematics at the University of Mississippi. Her research interests include robust and nonparametric statistics, statistical and numerical computing, and multivariate data analysis. In particular, she has focused on data depth and applications, bioinformatics, machine learning, and robust procedure computation. Dr. Dang is a member of the Institute of Mathematical Statistics, the American Statistical Association, the International Chinese Statistical Association, the International Neural Network Society and the IEEE. For further information, see home.olemss.edu/~xdang/.
