A Smoothed Boosting Algorithm Using Probabilistic Output Codes


Rong Jin, Dept. of Computer Science and Engineering, Michigan State University, MI 48824, USA
Jian Zhang, School of Computer Science, Carnegie Mellon University, PA 15213, USA

Abstract

AdaBoost.OC has been shown to be an effective method for boosting weak binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. In this paper, we propose a new boosting algorithm that improves the AdaBoost.OC algorithm in two aspects: 1) it introduces a smoothing mechanism into the boosting procedure to alleviate the potential overfitting problem of the AdaBoost algorithm, and 2) it introduces a probabilistic coding scheme to generate binary codes for multiple classes such that training errors can be reduced efficiently. Empirical studies with seven UCI datasets indicate that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning.

1. Introduction

AdaBoost has been shown to be an effective method for improving the classification accuracy of weak classifiers (Freund & Schapire, 1996; Freund & Schapire, 1995; Schapire, 1997; Grove & Schuurmans, 1998; Bauer & Kohavi, 1999; Ratsch et al., 1999; Ratsch et al., 2000; Schapire, 1999; Lebanon & Lafferty, 2001; Jin et al., 2003). It works by repeatedly running a weak learner on training examples sampled from the original training pool, and combining multiple instances of the weak learner into a single composite classifier.

Appearing in Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005. Copyright 2005 by the author(s)/owner(s).

AdaBoost.OC (Schapire, 1997) extends the AdaBoost algorithm to multi-class learning by combining it with the method of Error-Correcting Output Codes (ECOC) (Dietterich & Bakiri, 1995). It reduces a multi-class learning problem to a set of binary-class learning problems by encoding each class into a binary vector. A binary classifier is learned for every bit in the binary representation of the classes, and the binary classifiers are linearly combined to make predictions for test examples. Empirical studies have shown that AdaBoost.OC is effective in boosting binary classifiers for multi-class learning. Despite its success, there are two problems with AdaBoost.OC that have not been well addressed in the past:

The overfitting problem. Previous studies have shown that the AdaBoost algorithm tends to overfit training examples when they are noisy (Ratsch et al., 1999; Dietterich, 2000; Jin et al., 2003; Jiang, 2001). Since AdaBoost.OC is based on the AdaBoost algorithm, we expect it to suffer from the same overfitting problem. Although there have been many studies on alleviating the overfitting problem of AdaBoost (Ratsch et al., 1999; Schapire, 1999; Ratsch et al., 2000; Lebanon & Lafferty, 2001; Jin et al., 2003; Ratsch et al., 1998), no such work has been done for its multi-class version (i.e., AdaBoost.OC).

Effective coding schemes. AdaBoost.OC employs the ECOC method to reduce a multi-class learning problem to a sequence of binary classification problems. Hence, the choice of coding scheme can have a significant impact on the performance of AdaBoost.OC. In (Schapire, 1997), the author examined several coding schemes for AdaBoost.OC, but none of them provided noticeable improvement over a simple random code.

In this paper, we present a new boosting algorithm that explicitly addresses the above two problems:

To alleviate the overfitting problem, a smoothing mechanism is introduced into the boosting algorithm. In particular, a bounded weight is assigned to each individual example during every iteration. This is in contrast to AdaBoost.OC, where the weights assigned to individual examples are unbounded, so noisy data points can be overly emphasized during the iterations in which they receive extremely large weights.

To convert a multi-class learning problem into a set of binary classification problems, a probabilistic coding scheme is introduced to generate binary codes for multiple classes. Compared to a simple random code, the new coding scheme is advantageous in that it automatically adapts to the performance of the basis classifiers and is therefore more efficient in reducing training errors. Unlike deterministic coding schemes (Schapire, 1997), whose success depends heavily on their assumptions about the basis classifiers, the probabilistic coding scheme is robust even when its assumption about the basis classifiers is violated.

The rest of this paper is arranged as follows: Section 2 reviews related work; Section 3 describes the smoothed boosting algorithm for multi-class learning and the probabilistic output code; Section 4 presents the experimental results; Section 5 concludes the paper with future work.

2. Related Work

The most relevant work is the AdaBoost.OC algorithm (Schapire, 1997), which combines the AdaBoost algorithm with the ECOC method for multi-class learning. Figure 1 describes the detailed steps of the AdaBoost.OC algorithm. It iteratively refines a multi-class classifier H(x). At the t-th iteration, a coding function $f_t(y)$ is first generated to map each class label y into the binary set $\{-1, +1\}$. Based on the coding function $f_t(y)$, a binary classifier $h_t(x)$ is trained over examples weighted by the distribution $D_t(i)$. The learned classifier $h_t(x)$ is then linearly combined with the binary classifiers learned in previous iterations to make predictions for test examples. The combination parameter $\alpha_t$ is computed based on the classification error $\epsilon_t$. Finally, the weight distribution $D_t(i, y)$ is updated using the combination parameter $\alpha_t$.

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y$
Initialize $D_1(i, y) = [y \neq y_i] / (m(|Y| - 1))$
For $t = 1, \ldots, T$:
1. Generate a coding function $f_t(y): Y \to \{-1, +1\}$
2. Let $U_t = \sum_i \sum_{y \in Y} D_t(i, y)[f_t(y) \neq f_t(y_i)]$, and $D_t(i) = \sum_{y \in Y} D_t(i, y)[f_t(y) \neq f_t(y_i)] / U_t$
3. Train a weak classifier $h_t(x): X \to \{-1, +1\}$ on the examples weighted by $D_t$
4. Let $\epsilon_t = \sum_i D_t(i)[f_t(y_i) \neq h_t(x_i)]$
5. Let $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
6. Update the weight distribution
   $D_{t+1}(i, y) = D_t(i, y)\exp\big(\alpha_t([f_t(y_i) \neq h_t(x_i)] + [f_t(y) = h_t(x_i)])\big) / Z_{t+1}$
   where $Z_{t+1}$ is a normalization factor (chosen so that $D_{t+1}(i, y)$ sums to 1).
Output: $H(x) = \arg\max_{y \in Y} \sum_{t=1}^{T} \alpha_t [h_t(x) = f_t(y)]$

Figure 1. Description of the AdaBoost.OC algorithm.

One problem with AdaBoost.OC is that it can overfit the training data. Note that the weight $D_t(i, y)$ in Figure 1 can be arbitrarily close to 1. Given a noisy data point, it may be misclassified multiple times. As a result, its weight can become considerably larger than the weights of the other data points, which can lead AdaBoost.OC to overfit the training examples. Another problem with AdaBoost.OC is how to determine the coding function $f_t(y)$ in each iteration.
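For concreteness, the following minimal sketch implements the loop of Figure 1 with a simple random coding function. It is an illustration, not the implementation used in the experiments: the names `adaboost_oc` and `fit_weak` are placeholders, and any weighted binary learner (for example, the decision stump sketched in Section 4) can be supplied as `fit_weak`.

```python
import numpy as np

def adaboost_oc(X, y, n_classes, T, fit_weak, rng=np.random.default_rng(0)):
    """Sketch of the AdaBoost.OC loop of Figure 1 with a random output code.

    X: (m, d) array; y: (m,) integer labels in {0, ..., n_classes - 1}.
    fit_weak(X, targets, weights) must return a callable mapping X -> {-1, +1}.
    """
    m = len(y)
    # D_1(i, y): uniform over the wrong labels of every example.
    D = (np.arange(n_classes)[None, :] != y[:, None]).astype(float)
    D /= m * (n_classes - 1)
    ensemble = []
    for _ in range(T):
        f = rng.choice([-1, 1], size=n_classes)            # coding function f_t
        while np.all(f == f[0]):                            # avoid a constant code
            f = rng.choice([-1, 1], size=n_classes)
        mismatch = f[None, :] != f[y][:, None]              # [f_t(y) != f_t(y_i)]
        U = (D * mismatch).sum()
        Di = (D * mismatch).sum(axis=1) / U                 # per-example weights D_t(i)
        h = fit_weak(X, f[y], Di)                           # weak classifier h_t
        pred = h(X)
        eps = float(np.clip(Di[pred != f[y]].sum(), 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 - eps) / eps)
        # D_{t+1}(i, y) ~ D_t(i, y) * exp(alpha * ([f(y_i) != h(x_i)] + [f(y) = h(x_i)]))
        D = D * np.exp(alpha * ((pred != f[y]).astype(float)[:, None]
                                + (f[None, :] == pred[:, None]).astype(float)))
        D /= D.sum()
        ensemble.append((alpha, h, f))

    def predict(Xnew):
        votes = np.zeros((len(Xnew), n_classes))
        for alpha, h, f in ensemble:
            votes += alpha * (h(Xnew)[:, None] == f[None, :])
        return votes.argmax(axis=1)
    return predict
```

The returned `predict` function realizes the output rule $H(x) = \arg\max_y \sum_t \alpha_t [h_t(x) = f_t(y)]$.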
In (Schapire, 1997), the author proved that the training error of AdaBoost.OC is upper bounded by $\prod_t \sqrt{1 - 4(\gamma_t U_t)^2}$, where $\gamma_t = 1/2 - \epsilon_t$ and the expressions for $U_t$ and $\epsilon_t$ are given in Figure 1. Based on this analysis, the author suggested that a good coding function $f_t(y)$ should maximize $U_t$. However, this analysis ignores the fact that both factors $U_t$ and $\gamma_t$ are influenced by the choice of the coding function $f_t(y)$. As a result, a coding function $f_t(y)$ that maximizes $U_t$ can result in a small value of $\gamma_t$, which may lead to a loose upper bound on the training error.

This work is also related to previous studies on preventing AdaBoost from overfitting the training examples. In the past, many approaches have been proposed, including the smoothing approaches (Schapire, 1999; Jin et al., 2003), the regularization approaches (Ratsch et al., 1999; Lebanon & Lafferty, 2001), and the maximum margin approaches (Grove & Schuurmans, 1998; Ratsch et al., 2000). However, most of them target binary classification problems, and may not be the most effective solution for multi-class learning.

3. A Smoothed Boosting Algorithm Using the Probabilistic Output Code

In this section, we first describe a smoothed boosting algorithm that combines multiple binary classifiers for multi-class learning, followed by a description of a probabilistic coding scheme that automatically generates coding functions during the iterations of boosting.

3.1. A Smoothed Boosting Algorithm for Multi-class Learning

Similar to AdaBoost.OC, the proposed smoothed boosting algorithm applies an iterative procedure to boost binary classifiers for multi-class learning. In each iteration, a new coding function is first generated to map the multiple classes into a binary set. Then, a binary classifier is learned based on the given coding function. The final multi-class classifier is a linear combination of the binary classifiers learned over the iterations.

Let the coding functions for the first T iterations be denoted by $f^T(y) = (f_1(y), \ldots, f_T(y))$, where each coding function $f_t(y): Y \to \{-1, +1\}$. Let $g^T(x) = (\alpha_1 h_1(x), \ldots, \alpha_T h_T(x))$ be the weighted classifiers obtained in the first T iterations, where $h_t(x): X \to \{-1, +1\}$ and $\alpha_t$ is the weight of $h_t(x)$. Using this notation, the combined classifier for the first T iterations can be written as:

$$H_T(x) = \arg\max_{y \in Y} f^T(y) \cdot g^T(x) \quad (1)$$

Then, the training error at the T-th iteration, $\mathrm{err}_T$, is written as:

$$\mathrm{err}_T = \frac{1}{m} \sum_{i=1}^{m} [H_T(x_i) \neq y_i] = \frac{1}{m} \sum_{i=1}^{m} I\Big( \max_{y \neq y_i} (f^T(y) - f^T(y_i)) \cdot g^T(x_i) \Big)$$

where $I(x)$ is an indicator function that outputs 1 when the input x is positive and zero otherwise. To facilitate the computation, we upper bound the training error by the following expression:

$$\mathrm{err}_T \leq \frac{1+\lambda}{m} \sum_{i=1}^{m} \frac{\sum_{y \neq y_i} e^{f^T(y) \cdot g^T(x_i)}}{e^{f^T(y_i) \cdot g^T(x_i)} + \lambda \sum_{y \neq y_i} e^{f^T(y) \cdot g^T(x_i)}} \quad (2)$$

where $\lambda$ is a smoothing parameter that prevents the overfitting of training examples. Notice that the contribution of each training example to the upper bound in (2) is bounded by the factor $1 + 1/\lambda$. As will be shown later, AdaBoost.OC corresponds to the case $\lambda \to 0$, which leads to an unbounded contribution from each training example.

As an iterative procedure, the goal of our boosting algorithm is to learn the classifier $h_{T+1}(x)$, the combination constant $\alpha_{T+1}$, and the coding function $f_{T+1}(y)$ at the (T+1)-th iteration, given the $H_T(x)$ learned in the first T iterations. To this end, we rewrite the training error bound at the (T+1)-th iteration as:

$$\mathrm{err}_{T+1} \leq \frac{1+\lambda}{m} \sum_{i=1}^{m} \frac{\sum_{y \neq y_i} \exp(f^{T+1}(y) \cdot g^{T+1}(x_i))}{\exp(f^{T+1}(y_i) \cdot g^{T+1}(x_i)) + \lambda \sum_{y \neq y_i} \exp(f^{T+1}(y) \cdot g^{T+1}(x_i))}$$
$$= \frac{1+\lambda}{m} \sum_{i=1}^{m} \frac{\sum_{y \neq y_i} \exp(f^{T}(y) \cdot g^{T}(x_i) + f(y) g(x_i))}{\exp(f^{T}(y_i) \cdot g^{T}(x_i) + f(y_i) g(x_i)) + \lambda \sum_{y \neq y_i} \exp(f^{T}(y) \cdot g^{T}(x_i) + f(y) g(x_i))} \quad (3)$$

In the above, we drop the index T+1 and simply write $f_{T+1}(y)$ and $g_{T+1}(x)$ as $f(y)$ and $g(x)$, respectively. Define

$$\mu(y \mid x_i) = \frac{\exp(f^T(y) \cdot g^T(x_i))}{\exp(f^T(y_i) \cdot g^T(x_i)) + \lambda \sum_{y' \neq y_i} \exp(f^T(y') \cdot g^T(x_i))} \quad (4)$$

Then, the training error bound in (3) can be rewritten as:

$$\mathrm{err}_{T+1} \leq \frac{1+\lambda}{m} \sum_{i=1}^{m} \frac{\sum_{y \neq y_i} \mu(y \mid x_i) e^{f(y) g(x_i)}}{\mu(y_i \mid x_i) e^{f(y_i) g(x_i)} + \lambda \sum_{y \neq y_i} \mu(y \mid x_i) e^{f(y) g(x_i)}}$$

Using the inequality $\frac{1}{px + qy} \leq \frac{p}{x} + \frac{q}{y}$ for $p + q = 1$ and $x, y, p, q > 0$, and noting that $\mu(y_i \mid x_i) + \lambda \sum_{y \neq y_i} \mu(y \mid x_i) = 1$, the above expression is relaxed as (recall that $g(x_i) = \alpha h(x_i)$):

$$\mathrm{err}_{T+1} \leq \frac{\lambda+1}{m} \sum_{i=1}^{m} \lambda \Big(\sum_{y \neq y_i} \mu(y \mid x_i)\Big)^2 + \frac{\lambda+1}{m} \sum_{i=1}^{m} \mu(y_i \mid x_i) \sum_{y \neq y_i} \mu(y \mid x_i) e^{\alpha h(x_i)(f(y) - f(y_i))} \quad (5)$$

For convenience of presentation, we drop the first term in the above expression, since it is independent of $f(y)$, $\alpha$, and $h(x)$.
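As a quick numerical aside on the bounded-contribution property of (2), the following sketch (with made-up scores, not part of the derivation) computes one example's term in the smoothed bound and compares it to the cap $1 + 1/\lambda$.

```python
import numpy as np

def smoothed_term(scores, true_class, lam):
    """One example's contribution to the bound (2); scores[y] = f^T(y) . g^T(x_i)."""
    e = np.exp(scores - scores.max())            # shifting does not change the ratio
    wrong = np.delete(e, true_class).sum()
    return (1 + lam) * wrong / (e[true_class] + lam * wrong)

lam = 0.1
print(smoothed_term(np.array([-50.0, 10.0, 12.0]), 0, lam))  # badly misclassified example
print(1 + 1 / lam)                                           # cap: 11 for lambda = 0.1
```

However badly an example is scored, its term approaches but never exceeds $1 + 1/\lambda$, whereas with $\lambda = 0$ the same term grows without bound.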

Using the convexity of the exponential function, the bound in (5) can be further simplified as:

$$\mathrm{err}_{T+1} \leq \frac{1+\lambda}{2m} \sum_{i=1}^{m} \Big[ \mu(y_i \mid x_i) e^{-2\alpha h(x_i) f(y_i)} \sum_{y \neq y_i} \mu(y \mid x_i)(1 - f(y) f(y_i)) + \mu(y_i \mid x_i) \sum_{y \neq y_i} \mu(y \mid x_i)(1 + f(y) f(y_i)) \Big] \quad (6)$$

Define the weight distributions $D_{T+1}(i, y)$ and $D_{T+1}(i)$ as:

$$D_{T+1}(i, y) = \frac{\mu(y_i \mid x_i)\, \mu(y \mid x_i)}{Z_{T+1}} \quad (7)$$

$$D_{T+1}(i) = \frac{\sum_{y \in Y} D_{T+1}(i, y)[f(y_i) \neq f(y)]}{U_{T+1}} \quad (8)$$

where $Z_{T+1}$ and $U_{T+1}$ are normalization factors. Then, up to a constant factor that does not affect the minimization, Eqn. (6) can be simplified as:

$$\mathrm{err}_{T+1} \leq Z_{T+1} U_{T+1} \sum_{i=1}^{m} e^{-2\alpha h(x_i) f(y_i)} D_{T+1}(i) + Z_{T+1} \sum_{i=1}^{m} \sum_{y \neq y_i} D_{T+1}(i, y)(1 + f(y) f(y_i)) \quad (9)$$

To minimize the upper bound in the above expression, we train the classifier $h(x)$ on the training examples weighted by $D_{T+1}(i)$. Furthermore, given a coding function $f(y)$ and a binary classification function $h(x)$, the combination weight $\alpha$ that minimizes the upper bound in Eqn. (9) is:

$$\alpha = \frac{1}{4} \ln\left(\frac{1 - \epsilon_{T+1}}{\epsilon_{T+1}}\right) \quad (10)$$

where

$$\epsilon_{T+1} = \sum_{i=1}^{m} D_{T+1}(i)[f(y_i) \neq h(x_i)] \quad (11)$$

Note that when $\lambda = 0$, the weight distribution $D_{T+1}(i, y)$ in (7) becomes the same expression as the one in Figure 1. This is because when $\lambda = 0$ we have $\mu(y \mid x_i) = \exp((f^T(y) - f^T(y_i)) \cdot g^T(x_i))$, and therefore

$$D_{T+1}(i, y) = \frac{1}{Z_{T+1}} \mu(y_i \mid x_i)\, \mu(y \mid x_i) = \frac{D_T(i, y)}{Z_{T+1}} e^{2\alpha_T([f_T(y) = h_T(x_i)] + [f_T(y_i) \neq h_T(x_i)])}$$

Thus, the proposed boosting algorithm can be viewed as a generalized version of the AdaBoost.OC algorithm. We refer to the proposed smoothed boosting algorithm as SmoothBoost. The detailed steps of the SmoothBoost algorithm are summarized in Figure 2.

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y$
Initialize $\mu_1(y \mid x_i) = 1 / (1 + \lambda(|Y| - 1))$
For $t = 1, \ldots, T$:
1. Compute $D_t(i, y) = \mu_t(y_i \mid x_i)\, \mu_t(y \mid x_i) / Z_t$, where $Z_t$ is a normalization factor.
2. Generate a coding function $f_t(y): Y \to \{-1, +1\}$
3. Let $U_t = \sum_i \sum_{y \in Y} D_t(i, y)[f_t(y) \neq f_t(y_i)]$
4. Let $D_t(i) = \sum_{y \in Y} D_t(i, y)[f_t(y) \neq f_t(y_i)] / U_t$
5. Train a weak classifier $h_t(x): X \to \{-1, +1\}$ on the examples weighted by $D_t(i)$
6. Let $\epsilon_t = \sum_i D_t(i)[f_t(y_i) \neq h_t(x_i)]$
7. Let $\alpha_t = \frac{1}{4} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
8. Update $\mu_{t+1}(y \mid x_i)$ as
   $$\mu_{t+1}(y \mid x_i) = \frac{\mu_t(y \mid x_i) \exp(\alpha_t f_t(y) h_t(x_i))}{\mu_t(y_i \mid x_i) e^{\alpha_t f_t(y_i) h_t(x_i)} + \lambda \sum_{y' \neq y_i} \mu_t(y' \mid x_i) e^{\alpha_t f_t(y') h_t(x_i)}}$$
Output: $H(x) = \arg\max_{y \in Y} \sum_{t=1}^{T} \alpha_t [h_t(x) = f_t(y)]$

Figure 2. Description of the SmoothBoost algorithm.
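The only bookkeeping that differs from Figure 1 is the computation of $D_t(i, y)$ from $\mu_t$ and the smoothed update of $\mu_t$ itself (steps 1 and 8 of Figure 2). A minimal sketch of those two steps follows, under the same conventions as the earlier AdaBoost.OC sketch; the helper names `smooth_weights`, `update_mu`, and `init_mu` are our own.

```python
import numpy as np

def smooth_weights(mu, y):
    """Step 1 of Figure 2: D_t(i, y) = mu_t(y_i|x_i) * mu_t(y|x_i) / Z_t."""
    m = len(y)
    D = mu[np.arange(m), y][:, None] * mu
    return D / D.sum()                      # Z_t makes D a joint distribution

def update_mu(mu, y, alpha, f, pred, lam):
    """Step 8 of Figure 2: mu_{t+1}(y|x_i) proportional to
    mu_t(y|x_i) * exp(alpha * f_t(y) * h_t(x_i)), renormalized so that
    mu(y_i|x_i) + lam * sum_{y' != y_i} mu(y'|x_i) = 1 for every example i."""
    m = len(y)
    upd = mu * np.exp(alpha * f[None, :] * pred[:, None])
    true_part = upd[np.arange(m), y]        # term for the true label y_i
    other = upd.sum(axis=1) - true_part     # sum over the competing labels
    return upd / (true_part + lam * other)[:, None]

def init_mu(m, n_classes, lam):
    """mu_1(y|x_i) = 1 / (1 + lam * (|Y| - 1)), i.e. Eq. (4) with g^0 = 0."""
    return np.full((m, n_classes), 1.0 / (1.0 + lam * (n_classes - 1)))
```

With $\lambda = 0$ the denominator reduces to the true-label term alone, which recovers the AdaBoost.OC weights as noted above; with $\lambda > 0$ every denominator retains a contribution from the competing labels, which is the mechanism that keeps the influence of any single noisy example bounded.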

3.2. The Probabilistic Output Code

The difficulty in determining the coding functions arises from the fact that $h(x)$ and $f(y)$ are coupled with each other through the term $\exp(\alpha h(x_i)(f(y) - f(y_i)))$ in Eqn. (5). Thus, to identify the optimal coding function $f(y)$, we need to make certain assumptions about the classifier $h(x)$. In this paper, we assume $h(x)$ to be a random classifier, which outputs $-1$ with probability $\rho$ and $+1$ with probability $1 - \rho$. Given the symmetry between $+1$ and $-1$, we assume $0 \leq \rho \leq 1/2$. Then, the upper bound in (5) is simplified as:

$$\mathrm{err}_{T+1} \leq \frac{1+\lambda}{m} \sum_{i=1}^{m} \mu(y_i \mid x_i) \Big[ \rho \sum_{y \neq y_i} \mu(y \mid x_i) e^{-\alpha(f(y) - f(y_i))} + (1 - \rho) \sum_{y \neq y_i} \mu(y \mid x_i) e^{-\alpha(f(y_i) - f(y))} \Big]$$

Using the inequality $e^{-\alpha x} \leq e^{\alpha} + e^{-\alpha} + 1 - \alpha x$ for $-1 \leq x \leq 1$, the above expression can be relaxed as:

$$\mathrm{err}_{T+1} \leq \frac{1+\lambda}{m} \sum_{i=1}^{m} \sum_{y \neq y_i} \mu(y_i \mid x_i)\mu(y \mid x_i)\big(e^{2\alpha} + e^{-2\alpha} + 1\big) + \alpha(1 - 2\rho)\frac{1+\lambda}{m} \sum_{i=1}^{m} \sum_{y \neq y_i} \mu(y_i \mid x_i)\mu(y \mid x_i)(f(y) - f(y_i)) \quad (12)$$

Thus, a coding function $f(y)$ that effectively reduces the upper bound in the above equation should minimize the following expression:

$$\Omega(f) = \frac{1 - 2\rho}{m} \sum_{i=1}^{m} \sum_{y \neq y_i} \mu(y_i \mid x_i)\mu(y \mid x_i)(f(y) - f(y_i)) = -(1 - 2\rho) \sum_{y \in Y} f(y)\,\varphi(y)$$

where

$$\varphi(y) = \frac{1}{m} \sum_{i=1}^{m} \mu(y_i \mid x_i) \Big( \delta(y, y_i) \sum_{y' \neq y_i} \mu(y' \mid x_i) - [y \neq y_i]\,\mu(y \mid x_i) \Big)$$

and $\delta(y, y')$ outputs 1 when $y = y'$ and zero otherwise. Hence, the optimal solution is:

$$f(y) = \mathrm{sign}(\varphi(y)) \quad (13)$$

We refer to the above expression as the Deterministic Output Code. One problem with this approach is that the bound in Eqn. (12) is valid only when the output of the classifier $h(x)$ follows a Bernoulli distribution with parameter $\rho$. When this is not the case, the deterministic code is no longer optimal. To relieve the dependence of the optimal coding function on this assumption about the basis classifier, we consider a probabilistic coding scheme. In particular, we view $-\Omega(f)$ as a log-likelihood function for $f(y)$, i.e.,

$$\log \Pr(f) \propto (1 - 2\rho) \sum_{y \in Y} f(y)\,\varphi(y)$$

Then, $\Pr(f(y) = 1)$ is written as:

$$\Pr(f(y) = 1) = \frac{1}{1 + \exp(-\gamma\,\varphi(y))} \quad (14)$$

where $\gamma \propto 1 - 2\rho$. To generate a coding function $f(y)$, we sample the value of $f(y)$ for each class according to the distribution in (14). We refer to this method as the Probabilistic Output Code. The parameter $\gamma$ is determined by cross validation, with 20% of the training data used for validation.
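A small sketch of how a coding function could be sampled from (14) is given below, using the per-example distributions $\mu$ from the current SmoothBoost iteration. It follows the expression for $\varphi(y)$ above; the name `probabilistic_code` and the fixed value of `gamma` are illustrative only (in the paper $\gamma$ is chosen by cross validation).

```python
import numpy as np

def probabilistic_code(mu, y, gamma, rng=np.random.default_rng(0)):
    """Sample one coding function f: {0,...,K-1} -> {-1,+1} from Eq. (14).

    mu[i, c] = mu(y = c | x_i) from the current iteration, y[i] = true label
    of example i, gamma plays the role of (1 - 2*rho)."""
    m, K = mu.shape
    mu_true = mu[np.arange(m), y]                   # mu(y_i | x_i)
    other_mass = mu.sum(axis=1) - mu_true           # sum over the wrong labels
    phi = np.zeros(K)
    for c in range(K):
        on_true = (y == c)
        phi[c] = np.where(on_true, mu_true * other_mass,
                          -mu_true * mu[:, c]).sum() / m
    p_plus = 1.0 / (1.0 + np.exp(-gamma * phi))     # Pr(f(c) = +1), Eq. (14)
    return np.where(rng.random(K) < p_plus, 1, -1)

# The deterministic output code of Eq. (13) is simply the mode of this
# distribution, f(c) = sign(phi(c)).
```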
4. Experiment

In this experiment, we examine the effectiveness of the SmoothBoost algorithm and the effectiveness of the probabilistic output code separately. To evaluate the performance of SmoothBoost for multi-class learning, we compare it to three baseline algorithms that also combine binary classification models for multi-class learning: 1) the ECOC method that uses a random code, 2) the AdaBoost.OC algorithm in (Schapire, 1997), and 3) the one-against-all approach that uses the regularized boosting algorithm (Jin et al., 2003) as its binary classifier. We refer to these three baseline models as ECOC, AdaBoost.OC, and BinBoost.Reg. To evaluate the proposed coding scheme, we compare it to a random code. We also compare the probabilistic output code to the deterministic one, to see whether the probabilistic output code is more robust and effective for multi-class learning.

4.1. Experimental Design

Seven multi-class datasets from the UCI machine learning repository are used. The statistics of these datasets are listed in Table 1. In order to examine the robustness of the proposed algorithms, we also conducted experiments with noisy data, generated by randomly selecting 20% of the training data and assigning them incorrect labels. A decision stump is used as the basis classifier throughout the experiments. All boosting algorithms generate 50 binary classifiers and combine them to make predictions. The choice of 50 iterations is based on previous studies of AdaBoost (e.g., (Dietterich, 2000)). For each dataset, 60% of the data are randomly selected for training and the rest is used for testing. Each experiment is repeated 10 times, and the average classification error together with its variance is used as the evaluation metric.

4.2. Experiment (I): Effectiveness of SmoothBoost

We tested the SmoothBoost algorithm on the UCI datasets; a random code is used for the proposed boosting algorithm. To see the effectiveness of SmoothBoost for multi-class learning, we also evaluate the performance of the three baseline methods on the same datasets. Table 2 summarizes the classification errors of the four methods.
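For reference, a weighted decision stump of the kind described above can serve as the `fit_weak` learner in the earlier sketches, and the 20% label-noise setting amounts to re-labelling a random subset of the training examples. Both snippets below are illustrative reconstructions of this setup, not the authors' code.

```python
import numpy as np

def fit_stump(X, targets, weights):
    """Weighted decision stump h(x) in {-1, +1}: a brute-force sketch of the kind
    of basis classifier used in the experiments."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):                      # candidate feature
        for thr in np.unique(X[:, j]):               # candidate threshold
            raw = np.where(X[:, j] > thr, 1, -1)
            for sign in (1, -1):                     # candidate polarity
                err = weights[sign * raw != targets].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return lambda Z, j=j, thr=thr, sign=sign: sign * np.where(Z[:, j] > thr, 1, -1)

def corrupt_labels(y, n_classes, frac=0.2, rng=np.random.default_rng(0)):
    """Re-label a random `frac` of the training examples (the 20% noise setting)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y
```

With these two pieces, the earlier `adaboost_oc` sketch can be run end to end on a small dataset by passing `fit_stump` as `fit_weak`.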

Table 1. Properties of the test datasets (number of instances, classes, and features for ecoli, wine, pendigit, iris, glass, vehicle, and yeast).

Table 2. Classification errors (%) on the original UCI datasets for ECOC, AdaBoost.OC, BinBoost.Reg, and SmoothBoost.

First, compared to the ECOC method, we see that all three boosting algorithms are effective in reducing the classification errors. For example, on the dataset ecoli, the classification error is reduced from 14.4% to around 3% by all three boosting algorithms. Second, we see that for most datasets, SmoothBoost is more effective than the other two boosting methods in improving the classification accuracy for multi-class learning. For instance, on the dataset vehicle, SmoothBoost reduces the classification error to 21.4%, while the classification errors of AdaBoost.OC and BinBoost.Reg are over 26%.

In order to see the robustness of SmoothBoost with regard to training noise, we tested both the SmoothBoost algorithm and the three baseline methods on the corrupted datasets in which 20% of the training data are incorrectly labeled. The classification errors of the four methods are displayed in Table 3. First, comparing Table 3 to Table 2, we observe a substantial degradation in the classification errors of the ECOC method. The most noticeable cases are the datasets wine and pendigit, whose classification errors increase from 20.3% and 10.5% to 27.2% and 15.6%, respectively. Second, compared to the ECOC method, AdaBoost.OC experiences even more severe degradation in its classification errors when the training data are noisy. For example, on the datasets ecoli and pendigit, the classification errors of AdaBoost.OC increase dramatically from 3.5% and 2.7% to 12.8% and 13.9%, respectively. Third, compared to AdaBoost.OC, both BinBoost.Reg and SmoothBoost are more robust to training noise: for all datasets, they suffer less degradation in classification error than AdaBoost.OC. This is because both BinBoost.Reg and SmoothBoost employ a mechanism to alleviate the overfitting problem, while AdaBoost.OC does not. Finally, comparing the SmoothBoost algorithm to BinBoost.Reg, we find that SmoothBoost is more resilient to training noise. For example, on the dataset pendigit with 20% training noise, the classification error of SmoothBoost is only 6.4%, while the classification error of BinBoost.Reg is over 12%. Although BinBoost.Reg employs the regularized boosting algorithm as its binary classifier, it is less effective than SmoothBoost because the regularized boosting algorithm is designed for binary classification, not for multi-class learning. Based on the above observations, we conclude that SmoothBoost is effective for multi-class learning and is robust to training noise.

4.3. Experiment (II): Effectiveness of the Proposed Coding Schemes

In this experiment, we first examine the effectiveness of the probabilistic output code, followed by a comparison between the probabilistic output code and the deterministic code. All experiments discussed in this subsection use the SmoothBoost algorithm.

4.3.1. Effectiveness of the Probabilistic Output Code

In this subsection, we examine the efficiency and the robustness of the probabilistic output code by comparing it to a random code. To test the efficiency of the probabilistic output code, we run each experiment twice with different numbers of iterations: a long run with 50 iterations, and an intermediate run with only 20 iterations. The classification errors of the probabilistic code and the random code for both runs are summarized in Table 4. We see that both coding schemes achieve similar performance when the number of iterations is 50. The difference becomes noticeable when the number of iterations is limited to 20: for the datasets wine, glass, and vehicle, the classification errors of the probabilistic output code are noticeably lower than those of the random code.

Table 3. Classification errors (%) on the corrupted UCI datasets with 20% training noise, for ECOC, AdaBoost.OC, BinBoost.Reg, and SmoothBoost.

Table 4. Comparison of a random code with the probabilistic output code on the original UCI datasets, after 50 and after 20 boosting iterations.

This indicates that the probabilistic output code is more efficient than a random code in reducing the classification errors.

To see the robustness of the probabilistic code with regard to training noise, we test it on the corrupted UCI datasets in which 20% of the training data are labeled incorrectly. The classification errors of both coding schemes are listed in Table 5. Overall, the probabilistic output code performs slightly better than the random code. The improvement is more noticeable for the datasets ecoli and iris: the probabilistic output code outperforms the random code 6.1% vs. 9.7% on ecoli and 5.5% vs. 8.0% on iris. This result indicates that the probabilistic output code is more resilient to training noise than the random code.

4.3.2. The Probabilistic Output Code vs. the Deterministic Output Code

The classification results of both the probabilistic output code and the deterministic output code on the original UCI datasets are summarized in Table 6. We clearly see that the deterministic output code performs substantially worse than the probabilistic output code. The most noticeable cases are the datasets iris and glass: the classification errors of the deterministic output code on these two datasets are 11.5% and 38.4%, while the classification errors of the probabilistic output code are only 4.2% and 11.5%. This result indicates that the probabilistic output code is more effective for multi-class learning than the deterministic output code. As discussed earlier, the deterministic output code depends heavily on the assumption about the basis classifier. In contrast, the probabilistic output code is able to alleviate its dependence on this assumption by generating a coding function probabilistically.

5. Conclusion

In this paper, we propose a new boosting algorithm that is able to boost binary classifiers for multi-class learning problems. It improves the AdaBoost.OC algorithm in two aspects: it introduces a smoothing mechanism into the boosting algorithm to alleviate the overfitting problem (the resulting algorithm is named SmoothBoost), and it introduces a probabilistic coding scheme to reduce the training errors of multi-class learning efficiently. A set of extensive experiments has been conducted to evaluate the effectiveness and the robustness of the proposed boosting algorithm and coding schemes. Our empirical studies show that the proposed SmoothBoost algorithm performs better than the AdaBoost.OC algorithm on both clean data and noisy data. They also show that the proposed probabilistic output code is more effective and more robust for multi-class learning than a random code.

Table 5. Comparison of a random code with the probabilistic output code on the corrupted UCI datasets with 20% of the training data incorrectly labeled.

Table 6. Comparison of the probabilistic output code with the deterministic output code on the original UCI datasets.

In the future, we plan to extend this work to multi-label classification, where each data point can be assigned multiple labels.

References

Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36.

Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40.

Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2.

Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. Proceedings of the European Conference on Computational Learning Theory.

Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning.

Grove, A., & Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. Proceedings of the Fifteenth National Conference on Artificial Intelligence.

Jiang, W. (2001). Some theoretical aspects of boosting in the presence of noisy data. Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001).

Jin, R., Liu, Y., Si, L., Carbonell, J., & Hauptmann, A. G. (2003). A new boosting algorithm using input-dependent regularizer. Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003).

Lebanon, G., & Lafferty, J. (2001). Boosting and maximum likelihood for exponential models. Advances in Neural Information Processing Systems (NIPS).

Ratsch, G., Onoda, T., & Muller, K. (1998). An improvement of AdaBoost to avoid overfitting. Advances in Neural Information Processing Systems.

Ratsch, G., Onoda, T., & Muller, K. (1999). Regularizing AdaBoost. Advances in Neural Information Processing Systems 11.

Ratsch, G., Onoda, T., & Muller, K. (2000). Soft margins for AdaBoost. Machine Learning, 42.

Schapire, R. (1999). Theoretical views of boosting and applications. Proceedings of the Tenth International Conference on Algorithmic Learning Theory.

Schapire, R. E. (1997). Using output codes to boost multiclass learning problems. Proceedings of the Fourteenth International Conference on Machine Learning.


More information

Generalized Queries on Probabilistic Context-Free Grammars

Generalized Queries on Probabilistic Context-Free Grammars IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 1, JANUARY 1998 1 Generalized Queries on Probabilistic Context-Free Graars David V. Pynadath and Michael P. Wellan Abstract

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Convolutional Codes. Lecture Notes 8: Trellis Codes. Example: K=3,M=2, rate 1/2 code. Figure 95: Convolutional Encoder

Convolutional Codes. Lecture Notes 8: Trellis Codes. Example: K=3,M=2, rate 1/2 code. Figure 95: Convolutional Encoder Convolutional Codes Lecture Notes 8: Trellis Codes In this lecture we discuss construction of signals via a trellis. That is, signals are constructed by labeling the branches of an infinite trellis with

More information

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA.

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA. Proceedings of the ASME International Manufacturing Science and Engineering Conference MSEC June -8,, Notre Dae, Indiana, USA MSEC-7 DISSIMILARIY MEASURES FOR ICA-BASED SOURCE NUMBER ESIMAION Wei Cheng,

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

IAENG International Journal of Computer Science, 42:2, IJCS_42_2_06. Approximation Capabilities of Interpretable Fuzzy Inference Systems

IAENG International Journal of Computer Science, 42:2, IJCS_42_2_06. Approximation Capabilities of Interpretable Fuzzy Inference Systems IAENG International Journal of Coputer Science, 4:, IJCS_4 6 Approxiation Capabilities of Interpretable Fuzzy Inference Systes Hirofui Miyajia, Noritaka Shigei, and Hiroi Miyajia 3 Abstract Many studies

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information