Regularization and Averaging of the Selective Naïve Bayes classifier

Marc Boullé

Abstract — The Naïve Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a new regularization technique to select the most probable subset of variables and propose a new model averaging method. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a Naïve Bayes with "soft variable selection". Extensive experimental results show that the averaged regularized classifier outperforms the initial Selective Naïve Bayes classifier.

I. INTRODUCTION

The Naïve Bayes modeling approach is based on the assumption that the variables are independent within each output label, and simply relies on the estimation of univariate conditional probabilities. The evaluation of the probabilities for numeric variables has already been discussed in the literature [10, 18, 22]. Experiments demonstrate that even a simple Equal Width discretization with 10 bins brings superior performance compared to the assumption of a Gaussian distribution. Using the Bayes optimal MODL discretization method [4] to estimate the conditional probabilities has proved to be very efficient in detecting irrelevant variables [5]. Similar improvements can be achieved in the case of categorical variables, using the Bayes optimal value grouping method presented in [2] and extended in [3].

The naïve independence assumption can harm the performance when violated. In order to better deal with highly correlated variables, the Selective Naïve Bayes approach [17] uses a greedy forward search to select the variables. The accuracy is evaluated directly on the training set, and variables are selected as long as they do not degrade the accuracy. One problem with this approach is that it does not prevent the selection of irrelevant variables having no effect on the accuracy, or even of redundant variables having an insignificant or no effect on the accuracy. In [5], the area under the ROC curve [11] is used as a selection criterion and exhibits better predictive performance than the accuracy criterion, with fewer selected variables.

Although the Selective Naïve Bayes approach performs quite well on datasets with a reasonable number of variables, it does not scale to very large datasets with hundreds of thousands of instances and thousands of variables, such as in marketing applications. The problem comes both from the search algorithm, whose complexity is quadratic in the number of variables, and from the selection process, which is subject to overfitting. In this paper, we present a new regularization technique to compromise between the number of selected variables and the performance of the classifier. The new criterion is optimized owing to a search heuristic with super-linear algorithmic complexity in the number of instances and variables. We also present a new model averaging method, inspired from the Bayesian Model Averaging approach [15]. We show that averaging the models turns into averaging the contribution of the variables in the case of the Selective Naïve Bayes classifier. Finally, we proceed with extensive experiments to evaluate our method.

(M. Boullé is with France Telecom R&D, 2, avenue Pierre Marzin, Lannion Cedex, France; e-mail: marc.boulle@francetelecom.com.)
The remainder of the paper is organized as follows. Section II presents the regularization technique and Section III the model averaging technique. Section IV browses related work. Section V proceeds with extensive experimental evaluations. The Appendix summarizes the method and its results on the Performance Prediction Challenge [14].

II. REGULARIZATION

After introducing the aim of regularization, this section formally states the assumptions and notations, applies the Bayesian approach to derive a new evaluation criterion for variable selection, and finally presents the search algorithm used to optimize this criterion.

A. Introduction

The Naïve Bayes classifier is a very robust algorithm. It can hardly overfit the data, since no hypothesis space is explored during the learning process. The Selective Naïve Bayes classifier reduces the strong bias of the naïve independence assumption, owing to variable selection. The objective is to search among all the subsets of variables, in order to find the best possible classifier compliant with the Naïve Bayes assumption. The size of the searched hypothesis space grows exponentially with the number of variables, which might cause overfitting. Experiments show that during the variable selection process, the last added variables raise the "complexity" of the classifier while having an insignificant impact on the evaluation criterion (the area under the ROC curve for example). These slight improvements during the training step can be detrimental to the predictive performance in test.

We propose to tackle this overfitting problem by relying on a Bayesian approach, where the best model is found by maximizing the probability P(Model | Data) of the model given the data. Using the Bayes rule, and since the probability P(Data) is constant while varying the model, this is equivalent to maximizing P(Model) P(Data | Model). In the following, we describe how we compute the likelihood of the models and propose a prior distribution for variable selection.

B. Assumptions and Notation

Let X = (X_1, X_2, ..., X_K) be the vector of the K explanatory variables and Y the class variable. Let y_1, y_2, ..., y_J be the J class values of Y. Let N be the number of instances and D = {D_n} the labeled database containing the instances D_n = (x^(n), y^(n)). Let M = {M_m} be the set of all the potential Selective Naïve Bayes models. Each model M_m is described by K parameter values a_k, where a_k is 1 if variable X_k is selected in model M_m and 0 otherwise.

Let P(y_j) be the prior probabilities of the class values, and P(X_k | y_j) the conditional probability distributions of the explanatory variables given the class values. We assume that the prior probabilities P(y_j) and the conditional probability distributions P(X_k | y_j) are known: the purpose of the method is to select the best subset of variables for Naïve Bayes classification. In the experimental section, the P(y_j) are estimated by counting and the P(X_k | y_j) are computed from the contingency tables resulting from the preprocessing of the explanatory variables. This preprocessing is performed using the MODL discretization method [4] for the numeric variables and the MODL grouping method [3] for the categorical variables. The conditional probabilities are estimated using an m-estimate (support + m*p) / (coverage + m), with m = J/N and p = 1/J, in order to avoid zero probabilities. The MODL preprocessing methods are based on a Bayesian approach: a space of discretization (or grouping) models is defined and a prior distribution on this model space is proposed, which leads to a Bayes optimal evaluation criterion of discretization (or grouping) models.

C. Likelihood of Models

The Naïve Bayes classifier assigns to each instance the class value having the highest conditional probability

P(y_j | X) = P(y_j) P(X | y_j) / P(X).   (1)

Using the assumption that the explanatory variables are independent conditionally to the class variable, we get

P(y_j | X) = P(y_j) ∏_{k=1..K} P(X_k | y_j) / P(X).   (2)

For a given model M_m, the class conditional probability estimation P_m(y_j | X) turns into

P_m(y_j | X) = P(y_j) ∏_{k=1..K} P(X_k | y_j)^a_k / P(X),   (3)

P_m(y_j | X) = P(y_j) ∏_{k=1..K} P(X_k | y_j)^a_k / ( Σ_{j'=1..J} P(y_j') ∏_{k=1..K} P(X_k | y_j')^a_k ).   (4)

Equation (4) provides the class conditional probability distribution for each model M_m on the basis of the parameter values a_k of the model. For a given instance D_n, the probability of observing the class value y^(n) given the explanatory values x^(n) and given the model M_m is P_m(y^(n) | X = x^(n)). The likelihood of the model is obtained by computing the product of these quantities over the whole dataset. The negative log-likelihood of the model is given by

−log(P(D | M_m)) = −Σ_{n=1..N} log(P_m(y^(n) | X = x^(n))).   (5)

This quantity turns out to be the sum over the dataset of the Information Loss Function [21], a popular criterion for the evaluation of probabilistic predictions. It is minimized when the true probabilities are predicted.
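To make the computation of equations (2)-(5) concrete, here is a minimal Python/NumPy sketch, assuming the univariate conditional probabilities P(X_k | y_j) have already been estimated (e.g., by the MODL preprocessing) and stored as per-instance log probabilities; the function names and array layout are illustrative, not the paper's implementation.

```python
import numpy as np

def selective_nb_log_posteriors(log_prior, log_cond, selected):
    """Class log posteriors of a Selective Naive Bayes model (eq. 4).

    log_prior : (J,) array, log P(y_j)
    log_cond  : (N, K, J) array, log P(x_k^(n) | y_j) for each instance n,
                variable k and class j (precomputed from the preprocessing)
    selected  : (K,) boolean array, the a_k parameters of the model
    """
    # log P(y_j) + sum over the selected variables of log P(x_k | y_j), eq. (3)
    scores = log_prior + log_cond[:, selected, :].sum(axis=1)   # (N, J)
    # normalize over the classes, eq. (4)
    scores -= np.logaddexp.reduce(scores, axis=1, keepdims=True)
    return scores                                               # log P_m(y_j | x^(n))

def negative_log_likelihood(log_prior, log_cond, selected, y):
    """Negative log-likelihood of the model on the dataset (eq. 5).

    y : (N,) integer array of observed class indices y^(n)
    """
    log_post = selective_nb_log_posteriors(log_prior, log_cond, selected)
    return -log_post[np.arange(len(y)), y].sum()
```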
D. Prior for Variable Selection

The parameters of a variable selection model M_m are the Boolean values a_k. We propose a hierarchic prior, by first choosing the number of selected variables and second choosing the subset of selected variables.

For the number of selected variables k, we propose a uniform prior between 0 and K variables, representing (K + 1) equiprobable alternatives. For the choice of the k variables, we assign the same probability to every subset of k variables. The number of combinations C(K, k) seems the natural way to compute this prior, but it has the disadvantage of being symmetric: beyond K/2 variables, every new variable makes the selection more probable. Thus, adding irrelevant variables is favored, provided that this has an insignificant impact on the likelihood of the model. As we prefer simpler models, we propose to use the number of combinations with replacement C(K + k − 1, k). Taking the negative log of this prior, we get the following code length for the variable selection models:

−log(P(M_m)) = log(K + 1) + log(C(K + k − 1, k)).   (6)

Using this prior, the "informational cost" of the first selected variables is about log(K), and about log(2) for the last ones.
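A small sketch of the prior code length of equation (6); the function and argument names are illustrative.

```python
from math import comb, log

def prior_code_length(n_selected, n_variables):
    """Negative log prior -log P(M) of a variable selection model (eq. 6).

    Uniform prior on the number of selected variables (K + 1 alternatives),
    then a uniform prior over the C(K + k - 1, k) multisets of k variables.
    """
    K, k = n_variables, n_selected
    return log(K + 1) + log(comb(K + k - 1, k))
```

With this prior, selecting one more variable increases the code length by log((K + k) / (k + 1)), which decreases from about log(K) for the first selected variable to about log(2) for the last ones, matching the informational costs quoted above.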

E. Posterior Distribution of the Models

The posterior probability of a model is evaluated as the product of the prior and the likelihood. This is equivalent to a MDL approach [20], where the code length of the model plus that of the data given the model has to be minimized:

log(K + 1) + log(C(K + k − 1, k)) − Σ_{n=1..N} log(P_m(y^(n) | X = x^(n))).   (7)

The first two terms encode the complexity of the model and the last one the fit of the data. The compromise is found by minimizing this criterion.

We can notice a trend of increasing attention to the predicted probabilities in the evaluation criteria proposed for variable selection: whereas the accuracy criterion focuses only on the majority class, the area under the ROC curve evaluates the correct ordering of the predicted probabilities; our regularized criterion evaluates the correctness of all the predicted probabilities (not only their rank) and introduces a regularization term to balance the complexity of the models.

F. An Efficient Search Heuristic

Many heuristics have been used for variable selection. The greedy Forward Selection heuristic evaluates all the variables, starting from an empty set of variables. The best variable is added to the current selection, and the process is iterated until no new variable improves the evaluation criterion. This heuristic may fall into local optima and has a quadratic time complexity with respect to the number of variables. The Forward Backward Selection heuristic allows one to add or drop one variable at each step, in order to avoid local optima. The Fast Forward Selection heuristic evaluates each variable one at a time, and adds it to the selection as soon as this improves the criterion. This last heuristic is time effective, but its results exhibit a large variance caused by the dependence on the order of the variables.

We introduce a new search heuristic called Fast Forward Backward Selection (FFWBW), based on a mix of the preceding approaches. It consists in a sequence of Fast Forward Selection and Fast Backward Selection steps. The variables are randomly reordered between each step, and evaluated only once during each Forward or Backward search. This process is iterated as long as two successive (Forward and Backward) search steps bring at least one improvement of the criterion. Each search step requires O(K) evaluations. The whole process converges very quickly, so that it still requires O(K) evaluations in practice. Each evaluation of a Selective Naïve Bayes model requires O(KN) steps, mainly to evaluate all the class conditional probabilities. Using the additivity of these probabilities with respect to the addition or deletion of variables, the total time complexity of the FFWBW heuristic can be reduced down to O(KN).

In order to further reduce both the possibility of local optima and the variance of the results, this FFWBW heuristic is embedded into a multi-start (MS) algorithm, by repeating the search heuristic starting from several random orderings of the variables. The number of repetitions is set to log2(KN), which offers a reasonable compromise between time complexity and quality of the optimization. Overall, the time complexity of the MS(FFWBW) heuristic is O(KN log(KN)).

Algorithm MS(FFWBW)
  Multi-start: repeat log2(KN) times
    o Start with an empty subset of variables
    o Fast Forward Backward Selection:
        Initialize an empty subset of variables
        Repeat until no improvement:
          - Randomly reorder the variables
          - Fast Forward Selection
          - Randomly reorder the variables
          - Fast Backward Selection
    o Update the best subset of variables if improved
  Return the best subset of variables
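The following Python sketch illustrates the MS(FFWBW) heuristic above. It is a simplified illustration, not the paper's implementation: the `evaluate` callback stands for the regularized criterion of equation (7) (prior code length plus negative log-likelihood), and the incremental score updates that give the O(KN) behavior are omitted for clarity.

```python
import math
import random

def ms_ffwbw(n_variables, n_instances, evaluate, rng=random.Random(0)):
    """Multi-start Fast Forward Backward Selection (sketch).

    evaluate(selected) must return the criterion of eq. (7) for a
    frozenset of selected variable indices (lower is better).
    """
    n_starts = max(1, int(math.log2(n_variables * n_instances)))
    best_subset, best_cost = frozenset(), evaluate(frozenset())

    for _ in range(n_starts):
        subset, cost = frozenset(), evaluate(frozenset())
        improved = True
        while improved:
            improved = False
            # Fast Forward Selection: scan the variables once, in random order
            order = list(range(n_variables))
            rng.shuffle(order)
            for k in order:
                if k not in subset:
                    candidate = subset | {k}
                    c = evaluate(candidate)
                    if c < cost:
                        subset, cost, improved = candidate, c, True
            # Fast Backward Selection: scan the selected variables once
            order = list(subset)
            rng.shuffle(order)
            for k in order:
                candidate = subset - {k}
                c = evaluate(candidate)
                if c < cost:
                    subset, cost, improved = candidate, c, True
        if cost < best_cost:
            best_subset, best_cost = subset, cost
    return best_subset, best_cost
```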
III. MODEL AVERAGING

Model averaging consists in combining the predictions of an ensemble of classifiers in order to reduce the predictive error. This section introduces a new model averaging method applied to the Selective Naïve Bayes classifier.

A. From Bayesian Model Averaging to Expectation

Most inductive methods ignore the uncertainty in model selection and are over-confident about their predictive performance. The Bayesian Model Averaging (BMA) approach [15] provides a consistent method of accounting for model uncertainty, by weighting the models by their posterior probability. For a given variable of interest Φ, this weighting scheme is computed according to the following formula:

P(Φ | D) = Σ_m P(Φ | M_m, D) P(M_m | D).   (8)

This formula can be rewritten using only the prior probabilities and the likelihood of the models:

P(Φ | D) = Σ_m P(Φ | M_m, D) P(M_m) P(D | M_m) / ( Σ_m P(M_m) P(D | M_m) ).   (9)

Let f(M_m, D) = P(Φ | M_m, D) and f(D) = P(Φ | D). Using these notations, the BMA formula can be interpreted as the expectation of the function f for the posterior distribution of the models:

E(f) = Σ_m f(M_m, D) P(M_m | D).   (10)

We propose to extend the BMA approach to the case where f is not restricted to be a probability function.

B. Expectation of the class conditional information

The Naïve Bayes classifier provides an estimation of the class conditional probabilities. These estimated probabilities are the natural candidates for averaging. For a given model M_m defined by the variable selection {a_k}, we have

f(M_m, D) = P(Y) ∏_{k=1..K} P(X_k | Y)^a_k / P(X).   (11)

Let I(M_m, D) = −log(f(M_m, D)) be the class conditional information.
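A minimal sketch of the expectation of equation (10) when f is the vector of class conditional probabilities of equation (11): given a collection of sampled models, their (unnormalized) log posteriors and their per-class predictions, the averaged prediction is the posterior-weighted mean. Array shapes and names are illustrative.

```python
import numpy as np

def bma_average(log_posteriors, model_predictions):
    """Posterior-weighted average of per-model class probabilities (eq. 8-10).

    log_posteriors    : (M,) unnormalized log P(M_m) + log P(D | M_m)
    model_predictions : (M, J) class probabilities P(y_j | x) of each model
    """
    w = np.exp(log_posteriors - log_posteriors.max())   # avoid overflow
    w /= w.sum()                                         # P(M_m | D)
    return w @ model_predictions                         # (J,) averaged P(y_j | x)
```

As discussed below, such posterior weights tend to be sharply peaked around the MAP model, which motivates the compression-based weights introduced in the next subsection.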

Whereas the expectation of f relates to a (weighted) arithmetic mean of the class conditional probabilities, the expectation of I relates to a (weighted) geometric mean of these probabilities. This puts more emphasis on the magnitude of the estimated probabilities. Taking the negative log of (11), we obtain

I(M_m, D) = I(Y) − I(X) + Σ_{k=1..K} a_k I(X_k | Y).   (12)

We are looking for the expectation of this conditional information:

E(I) = Σ_m I(M_m, D) P(M_m | D) / Σ_m P(M_m | D),   (13)

E(I) = I(Y) − I(X) + Σ_m P(M_m | D) ( Σ_{k=1..K} a_k I(X_k | Y) ) / Σ_m P(M_m | D),   (14)

E(I) = I(Y) − I(X) + Σ_{k=1..K} ( Σ_m a_k P(M_m | D) / Σ_m P(M_m | D) ) I(X_k | Y).   (15)

Let b_k = Σ_m a_k P(M_m | D) / Σ_m P(M_m | D). We have b_k ∈ [0, 1]. The b_k coefficients are computed using (7), on the basis of the prior probabilities and of the likelihood of the models. Using these coefficients, the expectation of the conditional information is

E(I) = I(Y) − I(X) + Σ_{k=1..K} b_k I(X_k | Y).   (16)

The averaged model thus provides the following estimation of the class conditional probabilities:

P(Y | X) = P(Y) ∏_{k=1..K} P(X_k | Y)^b_k / P(X).   (17)

It is noteworthy that the expectation of the conditional information in (16) is similar to the conditional information estimated by each individual model in (12). The weighting scheme on the models reduces to a weighting scheme on the variables. When the MAP model is selected, the variables have a weight of 1 when selected and 0 otherwise: this is a "hard selection" of the variables. When the above averaging is applied, each variable has a weight in [0, 1], which can be interpreted as a "soft selection".

C. Expectation with Compression Coefficients

Using the posterior probabilities to weight the models in the averaging approach presents some practical disadvantages. When the posterior distribution is sharply peaked around the MAP, averaging is almost the same as selecting the MAP model. These peaked posterior distributions are more and more likely to happen when the number of instances rises, since a few tens of instances better classified by a model are sufficient to increase its likelihood by several orders of magnitude. Therefore, the algorithmic overhead is not valuable if averaging turns out to be the same as selecting the MAP.

We propose an alternative weighting scheme, whose objective is to better account for the set of all models. Let us first introduce the compression coefficient c(M_m, D) of a model. The Bayesian model selection approach we use to derive the criterion (7) is equivalent to a MDL model selection approach, where

l(M_m) + l(D | M_m) = −log(P(M_m)) − log(P(D | M_m)).   (18)

Let M_∅ be the "null" model, with no variable selected. The null model estimates the class conditional probabilities by their prior probabilities, ignoring all the explanatory variables. The code length of the null model can be interpreted as the quantity of information necessary to describe the classes, when no explanatory data is used to induce the model. Applying (4), we have

l(M_∅) + l(D | M_∅) = log(K + 1) + N ent(Y),   (19)

where ent(Y) = −Σ_{j=1..J} P(y_j) log(P(y_j)) is the class entropy. Each model M_m can potentially exploit the explanatory data to better "compress" the class conditional information. The ratio of the code length of a model to that of the null model stands for a relative gain in compression efficiency. We define the compression coefficient c(M_m, D) of a model as follows:

c(M_m, D) = 1 − ( l(M_m) + l(D | M_m) ) / ( l(M_∅) + l(D | M_∅) ).   (20)

The compression coefficient is 0 for the null model, is maximal when the true class conditional probabilities are correctly estimated, and tends to 1 in the case of separable classes. This coefficient can be negative for models which provide an estimation worse than that of the null model.
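A short sketch of the compression coefficient of equations (19)-(20), reusing the hypothetical criterion value of the earlier sketches; names are illustrative.

```python
import numpy as np

def null_model_cost(class_priors, n_instances, n_variables):
    """Code length of the null model, l(M_0) + l(D | M_0) (eq. 19)."""
    ent_y = -np.sum(class_priors * np.log(class_priors))
    return np.log(n_variables + 1) + n_instances * ent_y

def compression_coefficient(model_cost, null_cost):
    """Compression coefficient c(M, D) of a model (eq. 20).

    model_cost : l(M) + l(D | M), i.e. the criterion of eq. (7) for the model
    null_cost  : the same criterion for the null model (eq. 19)
    """
    return 1.0 - model_cost / null_cost
```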
In our heuristic attempt to better account for all the models, we replace the posterior probabilities by their related compression coefficients in the weighting scheme. Let us focus again on the variable weights b_k introduced in our first model averaging method. Dividing the posterior probabilities by that of the null model, we get

b_k = [ Σ_m a_k ( P(M_m | D) / P(M_∅ | D) ) ] / [ Σ_m ( P(M_m | D) / P(M_∅ | D) ) ].   (21)

We introduce new c_k coefficients by taking the log of the probability ratios and normalizing by the code length of the null model. We obtain

c_k = Σ_m a_k c(M_m, D) / Σ_m c(M_m, D).   (22)

In the implementation, we ignore the "bad" models and consider the positive compression coefficients only.
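The following sketch computes the variable weights of equations (15) and (22) from a pool of evaluated models and applies the resulting "soft selection" in the averaged prediction of equation (17); it assumes the same array layout as the earlier likelihood sketch and is only an illustration of the weighting scheme, not the paper's implementation.

```python
import numpy as np

def variable_weights(selections, log_posteriors=None, compressions=None):
    """Per-variable weights reducing model averaging to variable averaging.

    selections     : (M, K) boolean matrix, a_k for each evaluated model
    log_posteriors : (M,) unnormalized log posteriors -> b_k weights (eq. 15/21)
    compressions   : (M,) compression coefficients    -> c_k weights (eq. 22)
    """
    if compressions is not None:
        w = np.clip(compressions, 0.0, None)      # keep positive coefficients only
    else:
        w = np.exp(log_posteriors - np.max(log_posteriors))
    w = w / w.sum()
    return w @ selections                          # (K,) weights in [0, 1]

def averaged_log_posteriors(log_prior, log_cond, weights):
    """Soft-selection Naive Bayes prediction (eq. 17), log P(y_j | x^(n))."""
    scores = log_prior + (log_cond * weights[None, :, None]).sum(axis=1)
    return scores - np.logaddexp.reduce(scores, axis=1, keepdims=True)
```

With the MAP model alone, the weights are exactly 0 or 1 and the prediction reduces to the hard selection of the MAP variables.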

Mainly, the principle of this new heuristic weighting scheme consists in smoothing the peaked posterior probability distribution with the log function.

D. An Efficient Algorithm for Model Averaging

We have previously introduced two model averaging methods, which rely on the expectation of the class conditional information (with standard probabilistic weights or with compression-based weights). The calculation of this expectation requires the evaluation of all the variable selection models, which is not computationally feasible as soon as the number of variables goes beyond about 20. This expectation can heuristically be evaluated by sampling the posterior distribution of the models and accounting only for the sampled models in the weighting scheme. We propose to reuse the MS(FFWBW) search heuristic to perform this sampling. This heuristic is effective for finding high probability models and searching in their neighborhood. The repetition of the search from several random starting points (in the multi-start meta-heuristic) brings diversity and allows escaping local optima. We use the whole set of models evaluated during the search to estimate the expectation. This sampling strategy is biased in favor of the most probable models, but this has little impact, since the most probable models contribute the most to the weights.

The overhead in the time complexity of the learning algorithm is negligible, since the only need is to collect the posterior probability of the models and to compute the weights in the averaging formula. Concerning the deployment of the averaged model, the overhead is also negligible, since the initial Naïve Bayes estimation of the class conditional probabilities is just extended with variable weights.

IV. RELATED WORK

This section briefly reviews three popular alternative ensemble methods: Boosting, Bagging and Bayesian Model Averaging.

A. Introduction

The predictive performance of a classifier can be understood using the bias-variance decomposition [16]. The bias evaluates the fit capacity: classifiers with low bias such as neural networks can approximate any data distribution, whereas classifiers with strong bias such as the Naïve Bayes classifier cannot fit complex data. The variance results from the statistical uncertainty around the training set. Model averaging methods aim at reducing the bias or the variance part of the predictive error by combining the predictions of an ensemble of classifiers. They mainly differ by the way they sample the distribution of the models and by their weighting scheme.

B. Boosting

The Boosting averaging method [12] exploits a weight distribution on the instances. Initially, all the instances have the same weight, and a classifier is trained on the initial training set. The instances that are incorrectly classified are given a higher weight, and a new training set is sampled from the weight-updated dataset. This process is repeated for a given number of iterations, and the averaged classifier is built using weights based on the predictive performance of the individual classifiers. The Boosting method is theoretically founded to reduce the training bias, but without guarantee against overfitting.

C. Bagging

The Bagging (Bootstrap Aggregating) averaging method [7] exploits a set of data samples obtained owing to a bootstrap process (N instances are randomly selected from the training set, with replacement). The classifier is trained on each bootstrap dataset, and the averaged classifier gives the same weight to all the trained models. The bootstrap process is a way of sampling the model space around reasonably good and equiprobable models.
It is theoretically founded to reduce the variance part of the predictive error. The random selection of data samples has been extended to variables in the Random Forests classifier [8]: each node of a classification tree is trained on a new randomly sampled subset of variables. A similar approach is used in [23], where a sequence of Naïve Bayes classifiers is iteratively trained on randomly selected variables. The probability of selecting each variable is increased or decreased according to the performance of the previously trained classifier. By default, the number of training iterations is equal to the number of variables, and each classifier has the same weight.

D. Bayesian Model Averaging

The Bayesian Model Averaging (BMA) method [15] aims at accounting for model uncertainty. Whereas the MAP approach retrieves the most probable model given the data, the BMA approach exploits every model in the model space, weighted by its posterior probability. This approach relies on the definition of a prior distribution on the models, on an efficient computation technique to estimate the model posterior probabilities, and on an effective method to sample the posterior distribution. Apart from these technical difficulties, the BMA approach is an appealing technique, with strong theoretical results concerning the optimality of its long-run performance [19].

The BMA approach has been applied to the Naïve Bayes classifier in [9]. Apart from the differences in the weighting scheme, their method (DC) differs from ours mainly in the initial assumptions. The DC method does not manage numeric variables and assumes multinomial distributions with Dirichlet priors for the categorical variables. Structure modularity of the Bayesian network is also assumed: each selection of a variable is independent from the others. The DC approach estimates the full data distribution (explanatory and class variables), whereas we focus on the class conditional probabilities. Once the prior hyper-parameters are fixed, the DC method allows computing an exact model averaging, whereas we rely on a heuristic to estimate the averaged model.

Compared to the DC method, our method is not restricted to categorical attributes and does not need any hyper-parameter.

V. EXPERIMENTS

This section presents an experimental evaluation of the performance of the Selective Naïve Bayes induction methods described in the previous sections. We first comment on the rationale behind the compression-based model averaging approach before proceeding with extensive experiments.

A. The Waveform example

The Waveform dataset [6] contains 5000 instances and 21 numeric variables, with 3 classes of waves. We use 70% of this dataset to train a Selective Naïve Bayes classifier, using the Bayes regularization introduced in Section II. The MODL preprocessing determines that 2 variables (the 1st and the 21st) are irrelevant. The variable selection problem consists in finding the most probable subset of variables among about half a million (2^19) potential subsets. In order to study the posterior distribution of the models, all these subsets are evaluated. The MAP model selects 8 variables (5, 6, 9, 10, 11, 12, 13, 17). The posterior distribution is very sharp everywhere, not only around the MAP. Variable 18 is first selected in the 3rd model, which is about 40 times less probable than the MAP model. Variable 4 is first selected in the 10th model, about 4000 times less probable than the MAP model. Figure 1 displays the repartition function of the posterior probabilities, using a normalized log scale (compression coefficients). Using this logarithmic transformation, the posterior distribution is flattened and can be visualized. The MAP model is several orders of magnitude more probable than the minimum a posteriori model, which is the null one. This huge range of probabilities is projected onto a [0%, 62%] range of compression coefficients.

Fig. 1. Repartition function of the compression coefficients (normalized posterior probabilities) of half a million variable selection models evaluated for the Waveform dataset, sorted by increasing compression coefficient. For example, the 10% of models on the left represent the models having the lowest compression coefficients. (x axis: % of models; y axis: compression coefficient.)

A closer look at the posterior distribution shows that most of the good models (in the top 50%) contain around 10 variables. Figure 2 displays the selected variables in the top 200 models (0.05%). Five variables (5, 9, 10, 11, 12) among the 8 MAP variables are always selected, and the other models exploit a diversity of subsets of variables. The potential benefit of model averaging is to account for all these models, with higher weights for the most probable models. In the Waveform example, averaging using the posterior probabilities to weight the models is almost the same as selecting the MAP model (which itself is hard to find with a heuristic search). The compression-based model averaging exploits a flattened distribution of the weights (see Figure 1), and thus enables accounting for a large number of models. This averaging approach is evaluated in the next section.

Fig. 2. Index of the selected variables in the 200 most probable Selective Naïve Bayes models for the Waveform dataset. Each line represents a model, where the selected variables are shown in black.

B. Experimental Setup

The experiments aim at comparing the performance of the model averaging methods versus the MAP method, the standard Selective Naïve Bayes (SNB) and Naïve Bayes (NB) methods. All the classifiers except the last one exploit the same MODL preprocessing.
The evaluated methods are:
- SNB(CMA): model averaging using compression-based weights in the expectation formula,
- SNB(MA): model averaging using the expectation formula with posterior probability weights,
- SNB(MAP): the MAP SNB model,
- SNB(AUC): optimization of the area under the ROC curve,
- SNB(ACC): optimization of the accuracy,
- NB: Naïve Bayes with MODL preprocessing,
- NB(EF): Naïve Bayes with 10-bin Equal Frequency discretization and no value grouping.

All the SNB classifiers are optimized with the same MS(FFWBW) search heuristic, except SNB(ACC), which is based on the greedy Forward Selection heuristic. The DC method [9], similar to the SNB(MA) approach, was not evaluated since it is restricted to categorical attributes.

We evaluate three criteria of increasing complexity: accuracy (ACC), area under the ROC curve (AUC) and informational loss function (ILF). The ACC criterion focuses on the most probable class (which is a fair way to evaluate the class predictions, but not very discriminating in the case of highly unbalanced datasets), the AUC criterion focuses on the ranking of the class probabilities, and the ILF criterion on the class probabilities themselves. We normalize the ILF using the compression rate CR = 1 − ILF/Entropy (similar to (20), without the prior regularization penalty). The entropy has the same value as the ILF when the prior class probabilities are predicted on every instance. The normalized CR criterion mainly ranges between 0 (prediction not better than the basic prediction of the class priors) and 1 (prediction of the true class probabilities in the case of perfectly separable classes). It can be negative when the predicted probabilities are worse than the basic prior predictions.
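As a minimal illustration of these three criteria, here is a sketch assuming NumPy and scikit-learn; the binary case is shown for the AUC, and the names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluation_criteria(y_true, proba, class_priors):
    """ACC, AUC and normalized compression rate CR = 1 - ILF / Entropy.

    y_true       : (N,) integer class indices
    proba        : (N, J) predicted class probabilities
    class_priors : (J,) prior class probabilities
    """
    acc = accuracy_score(y_true, proba.argmax(axis=1))
    auc = roc_auc_score(y_true, proba[:, 1])                        # binary case
    ilf = -np.mean(np.log(proba[np.arange(len(y_true)), y_true]))   # information loss
    entropy = -np.sum(class_priors * np.log(class_priors))          # ILF of the prior predictor
    return acc, auc, 1.0 - ilf / entropy
```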

TABLE I
UCI DATASETS
Name          Instances   Numeric variables   Categorical variables   Classes   Majority accuracy
Abalone, Adult, Australian, Breast, Crx, German, Glass, Heart, Hepatitis, HorseColic, Hypothyroid, Ionosphere, Iris, LED, LED, Letter, Mushroom, PenDigits, Pima, Satimage, Segmentation, SickEuthyroid, Sonar, Spam, Thyroid, TicTacToe, Vehicle, Waveform, Wine, Yeast

We conduct the experiments on two collections of datasets: 30 datasets from the repository at the University of California at Irvine [1] and 10 datasets from the NIPS 2003 Feature Selection Challenge [13] and the WCCI 2006 Performance Prediction Challenge [14]. A summary of some properties of these datasets is given in Table I for the UCI datasets and in Table II for the Challenge datasets. We use stratified 10-fold cross validation to evaluate the criteria. A two-tailed Student test at the 5% confidence level is performed in order to evaluate the significant wins or losses of the SNB(CMA) method versus each other method.

TABLE II
CHALLENGE DATASETS
Name          Instances   Numeric variables   Categorical variables   Classes   Majority accuracy
Arcene, Dexter, Dorothea, Gisette, Madelon, Ada, Gina, Hiva, Nova, Sylva

C. Results

We collect and average the three criteria owing to the stratified 10-fold cross validation, for the seven evaluated methods on the forty datasets. We summarize these results using the mean of each criterion, the number of significant wins or losses of the SNB(CMA) method and the average rank of each method. The results are presented in Table III for the UCI datasets and in Table IV for the Challenge datasets.

TABLE III
EVALUATION OF THE METHODS ON THE UCI DATASETS
Method      ACC (M, W/L, R)   AUC (M, W/L, R)   CR (M, W/L, R)
SNB(CMA)
SNB(MA)     .817   9/   /   /6   2.6
SNB(MAP)           /    /   /6   3.6
SNB(AUC)    .820   8/   /   /4   4.4
SNB(ACC)    .817   5/   /   /2   4.5
NB                 /    /   /2   5.3
NB(EF)             /    /   /2   5.3
The evaluated criteria are the accuracy (ACC), the area under the ROC curve (AUC) and the compression rate (CR). The results are summarized across the datasets using the mean (M), the number of wins and losses for the SNB(CMA) method (W/L) and the average rank (R).

TABLE IV
EVALUATION OF THE METHODS ON THE CHALLENGE DATASETS
Method      ACC (M, W/L, R)   AUC (M, W/L, R)   CR (M, W/L, R)
SNB(CMA)
SNB(MA)     .872   3/   /   /0   2.3
SNB(MAP)    .865   4/   /   /0   3.7
SNB(AUC)    .872   6/   /   /0   4.1
SNB(ACC)    .875   3/   /   /0   4.3
NB          .841   7/   /   /0   5.9
NB(EF)      .823   9/   /   /0   6.7
The evaluated criteria are the accuracy (ACC), the area under the ROC curve (AUC) and the compression rate (CR). The results are summarized across the datasets using the mean (M), the number of wins and losses for the SNB(CMA) method (W/L) and the average rank (R).

The results of the two NB methods are reported mainly as a sanity check. The MODL preprocessing in the NB classifier exhibits better performance than the Equal Frequency discretization method in the NB(EF) classifier.
All the SNB methods exploit the same MODL preprocessing, allowing a fair comparison. The experiments confirm the benefit of selecting the variables, using the standard selection methods SNB(ACC) and SNB(AUC). These two methods achieve comparable results, with an emphasis on their respective optimized criterion. They significantly improve the results of the NB methods, especially for the estimation of the class conditional probabilities.

The three regularized methods SNB(MAP), SNB(MA) and SNB(CMA) focus on the estimation of the class conditional probabilities, which are evaluated using the compression rate criterion. They clearly outperform the other methods on this criterion. However, the SNB(MAP) method is not better than the two standard SNB methods for the accuracy and AUC criteria. The MAP method increases the bias of the models by penalizing the complex models, leading to a decayed fit of the data. The model averaging approach exploited in the SNB(MA) method offers only slight enhancements compared to the SNB(MAP) method. This confirms the analysis drawn from the Waveform case study.

The compression-based averaging method SNB(CMA) strongly dominates all the other methods on all the criteria. On average, the number of significant wins is about 10 times the number of significant losses, and amounts to more than half of the 40 datasets. On the 10 challenge datasets, which have very large numbers of variables, the SNB(CMA) method always gets the best results on the AUC and CR criteria, and almost always on the accuracy criterion. The domination of SNB(CMA) increases with the complexity of the criteria: it is noteworthy for accuracy (ACC), important for the ordering of the class conditional probabilities (AUC) and very large for the prediction of the class conditional probabilities (CR). This shows that the regularized and averaged Naïve Bayes becomes effective for conditional probability estimation, whereas the standard Naïve Bayes is usually considered to be poor at estimating these probabilities.

VI. CONCLUSION

The Naïve Bayes classifier is a popular method that is often highly effective on real datasets and is competitive with, or even sometimes outperforms, much more sophisticated classifiers. This paper confirms the potential benefit of variable selection to obtain still better performance. We have proposed a new regularization method, founded on a Bayesian approach, to select the best subset of variables. We have also introduced a new model averaging method, using compression-based weights. Extensive experiments on many datasets demonstrate the effectiveness of the new method, which considerably enhances the predictive performance of the raw Naïve Bayes classifier, especially for the estimation of the class conditional probabilities.

APPENDIX

This appendix summarizes the method and its results on the Performance Prediction Challenge [14].

Title: Regularized and Averaged Selective Naïve Bayes Classifier

Name, address, email: Marc Boullé, France Telecom R&D, 2, avenue Pierre Marzin, Lannion Cedex, France, marc.boulle@francetelecom.com

Acronym of my best entry: SNB(CMA) + 10k F(3) tv

References:
M. Boullé, "Regularization and Averaging of the Selective Naïve Bayes classifier", International Joint Conference on Neural Networks.
M. Boullé, "MODL: a Bayes Optimal Discretization Method for Continuous Attributes", Machine Learning, to be published.

Method: Our method is based on the Naive Bayes assumption. All the input features are preprocessed using the Bayes optimal MODL discretization method.
We use a Bayesian approach to compromise between the number of selected features and the performance of the Selective Naïve Bayes classifier: this provides a regularized feature selection criterion. The feature selection search is performed using alternate forward selection and backward elimination searches on randomly ordered feature sets: this provides a fast search heuristic, with super-linear time complexity with respect to the number of instances and features.

Finally, our method introduces a variant of feature selection: feature "soft" selection. Whereas feature "hard" selection gives a Boolean weight to the features according to whether they are selected or not, our method gives a continuous weight between 0 and 1 to each feature. This weighting scheme of the features comes from a new classifier averaging method, derived from Bayesian Model Averaging.

The method computes the posterior probabilities of the classes, which is convenient when the classical accuracy criterion or the area under the ROC curve is evaluated. For the challenge, the Balanced Error Rate (BER) is the main criterion. In order to improve the BER criterion, we adjusted the decision threshold in a post-optimization step: we still predict the class having the highest posterior probability, but we artificially adjust the class prior probabilities in order to optimize the BER criterion on the train dataset.
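A small sketch of this post-optimization step, assuming binary classes and NumPy; the grid of prior adjustments is an illustrative choice, not the challenge implementation.

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """BER: average of the per-class error rates."""
    classes = np.unique(y_true)
    errors = [np.mean(y_pred[y_true == c] != c) for c in classes]
    return float(np.mean(errors))

def tune_prior_adjustment(proba, y_true, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the class-1 prior weight minimizing the BER on the train set.

    proba : (N, 2) posterior probabilities of the averaged classifier
    """
    best_w, best_ber = 0.5, 1.0
    for w in grid:
        scores = proba * np.array([1.0 - w, w])   # artificially re-weighted priors
        y_pred = scores.argmax(axis=1)
        ber = balanced_error_rate(y_true, y_pred)
        if ber < best_ber:
            best_w, best_ber = w, ber
    return best_w, best_ber
```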

For the challenge, several trials of feature construction have been performed in order to evaluate the computational and statistical scalability of the method, and to naively attempt to escape the naïve Bayes assumption:
- 10k F(2): 10,000 features constructed for each dataset, each one the sum of two randomly selected initial features,
- 100k F(2): 100,000 constructed features (sums of two features),
- 10k F(3): 10,000 constructed features (sums of three features).

The performance prediction guess is computed using a stratified tenfold cross-validation.

Results: In the challenge, we rank 7th as a group and our best entry is 26th, according to the average rank computed by the organizers. On 2 of the 5 datasets (ADA and SYLVA), our best entry ranks 1st, as shown in Table V. Our method is highly scalable and resistant to noisy or redundant features: it is able to quickly process very large numbers of constructed features without decreasing the predictive performance. Its main limitation comes from the Naïve Bayes assumption. However, when the constructed features allow one to "partially" break the naïve Bayes assumption, the method succeeds in significantly improving its performance. This is the case for example for the GINA dataset, which does not fit the naïve Bayes assumption well: adding randomly constructed features improves the BER from 12.83% down to 7.30%. The AUC criterion, which evaluates the ranking of the class posterior probabilities, indicates high performance for our method.

Code: Our implementation was done in C++.

Keywords: Discretization, Bayesianism, Naïve Bayes, Wrapper, Regularization, Model Averaging

TABLE V
RESULTS OF OUR BEST ENTRY ON THE PERFORMANCE PREDICTION CHALLENGE DATASETS
Dataset      Our best entry (Test AUC, Test BER, BER guess, Guess error, Test score)      The challenge best entry (Test AUC, Test BER, BER guess, Guess error, Test score)
ADA (1)
GINA
HIVA
NOVA
SYLVA (1)
Overall (26.4)                                                                            (6.2)

REFERENCES
[1] C.L. Blake and C.J. Merz, UCI Repository of machine learning databases, UC Irvine, Department of Information and Computer Science.
[2] M. Boullé, "A Bayes Optimal Approach for Partitioning the Values of Categorical Attributes", Journal of Machine Learning Research, 6, 2005a.
[3] M. Boullé, "A Grouping Method for Categorical Attributes Having Very Large Number of Values", in P. Perner and A. Imiya (Eds.), Machine Learning and Data Mining in Pattern Recognition, Springer Verlag, LNAI 3587, 2005b.
[4] M. Boullé, "MODL: a Bayes Optimal Discretization Method for Continuous Attributes", Machine Learning, to be published.
[5] M. Boullé, "An Enhanced Selective Naïve Bayes Method with Optimal Discretization", Feature Extraction, Foundations and Applications, II(25), to be published.
[6] L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees, California: Wadsworth International.
[7] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 1996.
[8] L. Breiman, "Random forests", Machine Learning, 45(1), 2001.
[9] D. Dash and G.F. Cooper, "Exact model averaging with naïve Bayesian classifiers", Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
[10] J. Dougherty, R. Kohavi and M. Sahami, "Supervised and Unsupervised Discretization of Continuous Features", Proceedings of the 12th International Conference on Machine Learning, San Francisco, CA: Morgan Kaufmann, 1995.
[11] T. Fawcett, "ROC Graphs: Notes and Practical Considerations for Researchers", Technical Report HPL, HP Laboratories.
[12] Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm", Proceedings of the Thirteenth International Conference on Machine Learning, 1996.
[13] I. Guyon, "Design of experiments of the NIPS 2003 variable selection benchmark".
[14] I. Guyon, "Model Selection Workshop and Performance Prediction Challenge", IEEE World Congress on Computational Intelligence.
[15] J.A. Hoeting, D. Madigan, A.E. Raftery and C.T. Volinsky, "Bayesian model averaging: A tutorial", Statistical Science, 14(4), 1999.
[16] R. Kohavi and D.H. Wolpert, "Bias Plus Variance Decomposition for Zero-One Loss Functions", Proceedings of the Thirteenth International Conference on Machine Learning, 1996.
[17] P. Langley and S. Sage, "Induction of Selective Bayesian Classifiers", Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1994.
[18] H. Liu, F. Hussain, C.L. Tan and M. Dash, "Discretization: An Enabling Technique", Data Mining and Knowledge Discovery, 6(4), 2002.
[19] A.E. Raftery and Y. Zheng, "Long-Run Performance of Bayesian Model Averaging", Technical Report no. 433, Department of Statistics, University of Washington.
[20] J. Rissanen, "Modeling by shortest data description", Automatica, 14, 1978.
[21] I.H. Witten and E. Frank, Data Mining, Morgan Kaufmann, 2000.
[22] Y. Yang and G.I. Webb, "A comparative study of discretization methods for naïve-Bayes classifiers", Proceedings of the Pacific Rim Knowledge Acquisition Workshop.
[23] Z. Zheng, "Naïve Bayesian Classifier Committees", Proceedings of ECML'98, Berlin: Springer Verlag, 1998.


More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN

INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN V.A. Koarov 1, S.A. Piyavskiy 2 1 Saara National Research University, Saara, Russia 2 Saara State Architectural University, Saara, Russia Abstract. This article

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

MULTIAGENT Resource Allocation (MARA) is the

MULTIAGENT Resource Allocation (MARA) is the EDIC RESEARCH PROPOSAL 1 Designing Negotiation Protocols for Utility Maxiization in Multiagent Resource Allocation Tri Kurniawan Wijaya LSIR, I&C, EPFL Abstract Resource allocation is one of the ain concerns

More information

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits Correlated Bayesian odel Fusion: Efficient Perforance odeling of Large-Scale unable Analog/RF Integrated Circuits Fa Wang and Xin Li ECE Departent, Carnegie ellon University, Pittsburgh, PA 53 {fwang,

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Research in Area of Longevity of Sylphon Scraies

Research in Area of Longevity of Sylphon Scraies IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

OBJECTIVES INTRODUCTION

OBJECTIVES INTRODUCTION M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Soft-margin SVM can address linearly separable problems with outliers

Soft-margin SVM can address linearly separable problems with outliers Non-linear Support Vector Machines Non-linearly separable probles Hard-argin SVM can address linearly separable probles Soft-argin SVM can address linearly separable probles with outliers Non-linearly

More information

A New Approach to Solving Dynamic Traveling Salesman Problems

A New Approach to Solving Dynamic Traveling Salesman Problems A New Approach to Solving Dynaic Traveling Salesan Probles Changhe Li 1 Ming Yang 1 Lishan Kang 1 1 China University of Geosciences(Wuhan) School of Coputer 4374 Wuhan,P.R.China lchwfx@yahoo.co,yanging72@gail.co,ang_whu@yahoo.co

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials Inforation Processing Letters 107 008 11 15 www.elsevier.co/locate/ipl Low coplexity bit parallel ultiplier for GF generated by equally-spaced trinoials Haibin Shen a,, Yier Jin a,b a Institute of VLSI

More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

Probabilistic Machine Learning

Probabilistic Machine Learning Probabilistic Machine Learning by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Probabilistic Linear Regression I... Maxiu Likelihood Solution II... Maxiu-a-Posteriori

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

PAC-Bayesian Learning of Linear Classifiers

PAC-Bayesian Learning of Linear Classifiers Pascal Gerain Pascal.Gerain.@ulaval.ca Alexandre Lacasse Alexandre.Lacasse@ift.ulaval.ca François Laviolette Francois.Laviolette@ift.ulaval.ca Mario Marchand Mario.Marchand@ift.ulaval.ca Départeent d inforatique

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Qualitative Modelling of Time Series Using Self-Organizing Maps: Application to Animal Science

Qualitative Modelling of Time Series Using Self-Organizing Maps: Application to Animal Science Proceedings of the 6th WSEAS International Conference on Applied Coputer Science, Tenerife, Canary Islands, Spain, Deceber 16-18, 2006 183 Qualitative Modelling of Tie Series Using Self-Organizing Maps:

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

An Introduction to Meta-Analysis

An Introduction to Meta-Analysis An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl

More information

Estimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples

Estimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples Open Journal of Statistics, 4, 4, 64-649 Published Online Septeber 4 in SciRes http//wwwscirporg/ournal/os http//ddoiorg/436/os4486 Estiation of the Mean of the Eponential Distribution Using Maiu Ranked

More information

LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES

LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Journal of Marine Science and Technology, Vol 19, No 5, pp 509-513 (2011) 509 LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Ming-Tao Chou* Key words: fuzzy tie series, fuzzy forecasting,

More information

Machine Learning: Fisher s Linear Discriminant. Lecture 05

Machine Learning: Fisher s Linear Discriminant. Lecture 05 Machine Learning: Fisher s Linear Discriinant Lecture 05 Razvan C. Bunescu chool of Electrical Engineering and Coputer cience bunescu@ohio.edu Lecture 05 upervised Learning ask learn an (unkon) function

More information