Additive Preference Model with Piecewise Linear Components Resulting from Dominance-based Rough Set Approximations
Krzysztof Dembczyński¹, Wojciech Kotłowski¹, and Roman Słowiński¹,²

¹ Institute of Computing Science, Poznań University of Technology, 60-965 Poznań, Poland {kdembczynski, wkotlowski, ² Institute for Systems Research, Polish Academy of Sciences, 01-447 Warsaw, Poland

Abstract. Dominance-based Rough Set Approach (DRSA) has been proposed for multi-criteria classification problems in order to handle inconsistencies in the input information with respect to the dominance principle. The end result of DRSA is a decision rule model of Decision Maker's preferences. In this paper, we consider an additive function model resulting from dominance-based rough approximations. The presented approach is similar to the UTA and UTADIS methods. However, we define the goal function of the optimization problem in a similar way as it is done in Support Vector Machines (SVM). The problem may also be defined as one of searching for linear value functions in a transformed feature space obtained by exhaustive binarization of criteria.

1 Introduction

The rough set approach has often proved to be an interesting tool for solving a classification problem that consists in an assignment of objects from a set A, described by condition attributes, to pre-defined decision classes Cl_t, where t ∈ T and T is a finite set of numerically coded labels. In order to solve the problem (i.e., to classify all objects from A), a decision rule model is induced from a set of reference (training) objects U ⊆ A. The rough set analysis starts with computing lower and upper rough approximations of decision classes. The lower approximation of a decision class contains objects (from U) certainly belonging to the decision class without any inconsistency. The upper approximation of a decision class contains objects possibly belonging to the decision class that may cause inconsistencies.
In the simplest case, an inconsistency is a situation where two objects described by the same values of condition attributes (such objects are said to be indiscernible) are assigned to different classes. In the next step, decision rules are induced from the lower and upper rough approximations. These rules represent, respectively, certain and possible patterns explaining relationships between conditions and decisions. The model in the form of decision rules permits classification of all objects from A. Greco, Matarazzo and Słowiński [5, 6, 13] have introduced a rough set approach (called Dominance-based Rough Set Approach, DRSA) for solving
the problem of multi-criteria classification. In this problem, it is additionally assumed that the domains of attributes (scales) are preference-ordered. The decision classes are also preference-ordered according to an increasing order of class labels, i.e. for all r, s ∈ T such that r > s, the objects from Cl_r are strictly preferred to the objects from Cl_s. The condition attributes are then often referred to as condition criteria. DRSA extends the classical approach by substituting the indiscernibility relation with a dominance relation, which permits taking the preference order into account. Inconsistency is defined in view of the dominance principle, which requires that any object x, having not worse evaluations than any object y on the considered set of criteria, cannot be assigned to a worse class than y. Moreover, unlike in the classical rough set approach, there is no need in DRSA to discretize numerical attributes. The preference model is a necessary component of a decision support system for multi-criteria classification. Construction of a preference model requires some preference information from the Decision Maker (DM). Classically, these are substitution rates among criteria, importance weights, comparisons of lotteries, etc. Acquisition of this preference information from the DM is not easy. In this situation, a preference model induced from decision examples provided by the DM has clear advantages over the classical approaches. DRSA, but also UTA [8] and UTADIS [9, 15], follows the paradigm of inductive learning (in Multi-Criteria Decision Analysis referred to as the preference-disaggregation approach). It is very often underlined by Słowiński, Greco and Matarazzo (see, for example, [13]) that a decision rule model has another advantage over other models: it is intelligible and speaks the language of the DM.
However, in the case of many numerical criteria and decision classes, the set of decision rules may be huge and may lose its intelligibility. In such situations, an additive model composed of marginal value (utility) functions, like in UTA and UTADIS, may be helpful. The marginal value functions are usually presented graphically to the DM in order to support her/his intuition. In the following, we present an extension of DRSA where, after computing rough approximations, additive value functions are constructed instead of a set of decision rules. The additive value function is composed of piecewise linear marginal value functions. It is constructed by solving a mathematical programming problem similar to that formulated in UTA and UTADIS. The main difference is that we define the goal function of the optimization problem in a similar way as it is done in Support Vector Machines (SVM) [14]. However, the obtained additive value functions, for lower and upper rough approximations of decision classes, may not accurately cover all objects belonging to the corresponding rough approximations. This is caused by the limited capacity of an additive model based on piecewise linear functions to represent preferences, as proved in [7, 12]. The problem may also be defined as one of searching for linear value functions in a transformed feature space obtained by exhaustive binarization of criteria. The paper is organized as follows. In Section 2, DRSA involving piecewise linear marginal value functions is presented. Section 3 contains the first experimental results of the methodology. The last section concludes the paper.
Table 1. Decision table: q_1 and q_2 indicate criteria, d(x) the class label. The last two columns present the range of the generalized decision function; objects x_2 and x_3 are inconsistent. (Columns: U, q_1, q_2, d(x), l(x), u(x); rows x_1, ..., x_4.)

2 Piecewise Linear Marginal Value Functions and Dominance-based Rough Set Approach

Assume we have a set of objects A described by n criteria. We assign to each object x a vector x = (q_1(x), ..., q_n(x)), where the i-th coordinate q_i(x) is the value (evaluation) of object x on criterion q_i, i = 1, ..., n. For simplicity, it is assumed that the domains of criteria are numerically coded with an increasing order of preference. The objective of the multi-criteria classification problem is to build a preference model, according to which a class label d(x) from a finite set T is assigned to every object from A. Here, for simplicity, it is assumed that T = {−1, 1}, which corresponds to the objects from Cl_1 being strictly preferred to the objects from Cl_{−1}. We assume that the DM provides preference information concerning a set of reference objects U ⊆ A, assigning to each object x ∈ U a label d(x) ∈ T. Reference objects described by criteria and class labels are often presented in a decision table. An example of a decision table is presented in Table 1. The criteria aggregation model (preference model) is assumed to be an additive value function:

Φ(x) = Σ_{i=1}^{n} w_i φ_i(q_i(x))    (1)

where φ_i(q_i(x)), i = 1, ..., n, are non-decreasing marginal value functions, normalized between 0 and 1, and w_i is the weight of φ_i(q_i(x)). A similar aggregation model was used in [8] within the UTA method (for ranking problems) and in UTADIS [9, 15] (for multi-criteria classification problems), where the marginal value functions were assumed to be piecewise linear. The use of this aggregation model for classification requires the existence of a threshold φ_0 such that d(x) = 1 if Φ(x) ≥ φ_0 and d(x) = −1 otherwise (so d(x) = sgn(Φ(x) − φ_0)). The error, which is the sum of the differences |Φ(x) − φ_0| over misclassified objects, is minimized.
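As a minimal sketch (function names are hypothetical, not from the paper), the aggregation model (1) and the threshold rule can be written as:

```python
def additive_value(x, weights, marginals):
    """Additive value function Phi(x) = sum_i w_i * phi_i(q_i(x)), as in Equation (1).

    x:         vector of criterion values (q_1(x), ..., q_n(x))
    weights:   criterion weights w_1, ..., w_n
    marginals: non-decreasing marginal value functions phi_1, ..., phi_n,
               each normalized to the range [0, 1]
    """
    return sum(w * phi(q) for w, phi, q in zip(weights, marginals, x))


def classify(x, weights, marginals, threshold):
    """Threshold rule: d(x) = 1 if Phi(x) >= phi_0, and d(x) = -1 otherwise."""
    return 1 if additive_value(x, weights, marginals) >= threshold else -1
```

Any non-decreasing marginals normalized to [0, 1] can be plugged in; the piecewise linear form introduced below is one such choice.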
Assume, however, that objects can be inconsistent. By inconsistency we mean a violation of the dominance principle, which requires that any object x, having not worse evaluations than any object y on the considered set of criteria, cannot be assigned to a worse class than y. If such inconsistencies occur, the UTA method is not able to find any additive value function compatible with this information, whatever the complexity of the marginal functions (number of breakpoints), since no monotonic function can model this information. Within DRSA, such inconsistencies can be handled using the concepts of lower and upper approximations of classes. It was shown in [3] that this corresponds to the generalized decision function δ for an object x ∈ U:

δ(x) = ⟨l(x), u(x)⟩    (2)

where

l(x) = min{d(y) : y D x, y ∈ U},  u(x) = max{d(y) : x D y, y ∈ U}    (3)

and D is the dominance relation defined as x D y ⟺ q_i(x) ≥ q_i(y) for all i ∈ {1, ..., n}. In other words, given the preference information, for object x there is determined a range of decision classes to which x may belong. This range results from taking into account the inconsistencies caused by x. Remark that without inconsistencies, l(x) = u(x) for all x ∈ U. Moreover, if we assign to each x ∈ U the class index l(x) (instead of d(x)), the decision table becomes consistent (similarly if we assign the class index u(x) for all x ∈ U). Thus one can deal with an inconsistent set U by considering two consistent sets with two different labelings. The values of the generalized decision function are also presented in Table 1. In terms of further classification, the response of such a model is a range of classes to which an object may belong. For the two consistent decision tables it is possible to derive compatible value functions Φ^l(x) and Φ^u(x), respectively, and corresponding marginal value functions φ^l_i(q_i(x)) and φ^u_i(q_i(x)). We assume that both φ^l_i(q_i(x)) and φ^u_i(q_i(x)) have piecewise linear form:

φ_i(q_i(x)) = Σ_{r=1}^{k−1} c^r_i + (c^k_i / (h^k_i − h^{k−1}_i)) (q_i(x) − h^{k−1}_i),  for h^{k−1}_i ≤ q_i(x) ≤ h^k_i    (4)

where h^k_i is the location of the k-th breakpoint on the i-th criterion (k = 1, ..., κ_i, where κ_i is the number of breakpoints on the i-th criterion), and c^k_i is the increment of the marginal value function between breakpoints h^{k−1}_i and h^k_i, i.e. φ_i(h^k_i) − φ_i(h^{k−1}_i). Equation (4) states that the function φ_i evaluated at q_i(x) equals the sum of the increments on all intervals to the left of q_i(x) plus the linearly approximated increment in the interval where q_i(x) is located. An example is shown in Figure 1.
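The generalized decision of Equations (2)-(3) can be computed directly from the dominance relation. A sketch (the numeric data in the usage example is illustrative only, not the paper's Table 1):

```python
def dominates(x, y):
    """Dominance relation x D y: x has not worse evaluations than y on all criteria."""
    return all(xi >= yi for xi, yi in zip(x, y))


def generalized_decision(x, reference):
    """Range <l(x), u(x)> of Equations (2)-(3).

    reference: list of (criterion vector, class label) pairs for the objects in U.
    l(x) is the worst label among objects dominating x, and u(x) the best label
    among objects dominated by x; both sets contain x itself, so they are nonempty.
    """
    l = min(d for y, d in reference if dominates(y, x))
    u = max(d for y, d in reference if dominates(x, y))
    return l, u
```

On a toy set with one inconsistent pair, both inconsistent objects receive the full range (−1, 1), while consistent objects get a degenerate range l(x) = u(x).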
In the simplest form, the corresponding optimization problem can be formulated for the lower bound of the generalized decision (Φ^l(x)) as follows:

min: Σ_{j=1}^{m} σ^l_j    (5)

subject to the constraints:

Φ^l(x_j) ≥ φ^l_0 − σ^l_j,  ∀ x_j : l(x_j) = 1    (6)
Φ^l(x_j) ≤ φ^l_0 + σ^l_j,  ∀ x_j : l(x_j) = −1    (7)
σ^l_j ≥ 0,  ∀ x_j    (8)
φ^l_i(z̄_i) = 1,  i ∈ {1, ..., n}    (9)
φ^l_i(z_i) = 0,  i ∈ {1, ..., n}    (10)
c^k_i ≥ 0,  i ∈ {1, ..., n}, k ∈ {1, ..., κ_i}    (11)
Fig. 1. Piecewise linear marginal value function φ_i defined by Equation (4), with increments c^1_i = 0.1, c^2_i = 0.5, c^3_i = 0.3, c^4_i = 0.1.

where m is the number of reference objects, the σ^l_j are possible errors, and z̄_i and z_i are the highest and the lowest value on the i-th criterion. Constraints (6) and (7) ensure correct separation, (9) and (10) control scaling, and (11) preserves monotonicity. An analogous problem can be formulated for the upper bound of the generalized decision (function Φ^u(x)). It is worth noting that the method does not assure that all errors become zero as the complexity of φ^l_i(q_i(x)) and φ^u_i(q_i(x)) grows; however, it avoids errors caused by inconsistencies. If all σ^l_j and σ^u_j become 0, the obtained model is concordant with DRSA in the sense that all objects belonging to lower or upper approximations will be reassigned by the obtained functions to these approximations. It is worth introducing some measure of complexity of the marginal functions and minimizing it, to avoid building overly complex models. Notice that, as the function φ_i(q_i(x)) is multiplied in (1) by the weight w_i, we can introduce new coefficients w^k_i = w_i c^k_i in order to keep the problem linear. Control of the complexity is now done by minimizing a regularization term:

‖w‖² = Σ_{i=1}^{n} Σ_{k=1}^{κ_i} (w^k_i)²    (12)

instead of controlling the scale of the functions in (9) and (10). Minimizing this term may lead to rescaling the utility function, so that all values of φ_i(q_i(x)) decrease almost to zero. To avoid that, the constraints are modified by introducing a unit margin around the threshold, in which no object may appear without error. Thus we rewrite equations (6) and (7) as:

(Φ^l(x_j) − φ^l_0) l(x_j) ≥ 1 − σ^l_j    (13)

The objective of the optimization is now:

min: ‖w^l‖² + C Σ_{j=1}^{m} σ^l_j    (14)
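Equation (4) can be evaluated by a short sketch. The increments below are those of Fig. 1; the breakpoint locations (equally spaced on [0, 1]) and the function name are assumptions for illustration:

```python
def marginal_value(q, breakpoints, increments):
    """Evaluate the piecewise linear marginal value phi_i(q) of Equation (4).

    breakpoints: sorted locations h^0 <= h^1 <= ... <= h^kappa delimiting intervals
    increments:  c^1, ..., c^kappa, the value gained over each interval
    Outside the breakpoint range the function is clamped to its extreme values.
    """
    if q <= breakpoints[0]:
        return 0.0
    total = 0.0
    for k in range(1, len(breakpoints)):
        lo, hi = breakpoints[k - 1], breakpoints[k]
        if q >= hi:
            total += increments[k - 1]                         # full increment c^k
        else:
            total += increments[k - 1] * (q - lo) / (hi - lo)  # linear part of Eq. (4)
            break
    return total
```

With increments summing to 1, the function is automatically normalized to [0, 1], matching constraints (9) and (10).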
where C is the complexity constant. Analogous reasoning may be carried out for Φ^u(x). Such a problem resembles the maximal margin classifier and Support Vector Machines [14]. We will try to make the resemblance even closer. Let us first modify Φ(x) to be Φ(x) = Σ_{i=1}^{n} w_i φ_i(q_i(x)) − φ_0. Now, we decompose each function φ_i(q_i(x)) into κ_i functions φ_{i,k}(q_i(x)) in the following way:

φ_i(q_i(x)) = Σ_{k=1}^{κ_i} c^k_i φ_{i,k}(q_i(x)).    (15)

An example of such a decomposition is shown in Figure 2. One can treat the family of functions {φ_{1,1}(q_1(x)), ..., φ_{n,κ_n}(q_n(x))} as a transformation of the space of criteria. Namely, there is a map T : A → R^s, where s = Σ_{i=1}^{n} κ_i, such that T(x) = (φ_{1,1}(q_1(x)), ..., φ_{n,κ_n}(q_n(x))). By substituting w^k_i = w_i c^k_i and denoting w = (w^1_1, ..., w^{κ_n}_n), the function (1) becomes:

Φ(x) = ⟨w, T(x)⟩ − φ_0    (16)

where ⟨·,·⟩ is the canonical dot product in R^s.

Fig. 2. Functions obtained by decomposing φ_i(q_i(x)). Notice that φ_i(q_i(x)) = 0.1 φ_{i,1}(q_i(x)) + 0.5 φ_{i,2}(q_i(x)) + 0.3 φ_{i,3}(q_i(x)) + 0.1 φ_{i,4}(q_i(x)).
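The decomposition (15) and the map T of (16) can be sketched as follows. Here `basis_function` plays the role of φ_{i,k} (rising from 0 to 1 over its interval); the names and the breakpoint grid in the example are illustrative assumptions:

```python
def basis_function(q, lo, hi):
    """phi_{i,k}: rises linearly from 0 to 1 over [lo, hi], constant outside."""
    if q <= lo:
        return 0.0
    if q >= hi:
        return 1.0
    return (q - lo) / (hi - lo)


def transform(x, breakpoints_per_criterion):
    """Feature map T(x) = (phi_{1,1}(q_1(x)), ..., phi_{n,kappa_n}(q_n(x)))."""
    features = []
    for q, h in zip(x, breakpoints_per_criterion):
        for k in range(1, len(h)):
            features.append(basis_function(q, h[k - 1], h[k]))
    return features
```

With w^k_i = w_i c^k_i, the dot product ⟨w, T(x)⟩ then reproduces the weighted sum of piecewise linear marginals, as in (16).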
Finally, after reformulating the problem, we obtain:

min: ‖w^l‖² + C Σ_{j=1}^{m} σ^l_j    (17)

subject to the constraints:

(⟨w^l, T(x_j)⟩ − φ^l_0) l(x_j) ≥ 1 − σ^l_j,  ∀ x_j ∈ U    (18)
σ^l_j ≥ 0,  j ∈ {1, ..., m}    (19)
w^k_i ≥ 0,  i ∈ {1, ..., n}, k ∈ {1, ..., κ_i}    (20)

Notice that the regularization term is just the squared Euclidean norm of the weight vector in the new feature space. Motivated by the above result, we introduce a kernel function k : A × A → R, defined as:

k(x, y) = Σ_{i=1}^{n} k_i(x, y)    (21)

where

k_i(x, y) = Σ_{j=1}^{κ_i} φ_{i,j}(q_i(x)) φ_{i,j}(q_i(y)).    (22)

Notice that k(x, y) = ⟨T(x), T(y)⟩. Assume that we have a breakpoint at each evaluation point of all objects x ∈ U, i.e. the set of breakpoints for the i-th criterion is {q_i(x_1), ..., q_i(x_m)}. Then computing the marginal kernel function k_i(x, y) boils down to:

k_i(x, y) = min{rank_i(x), rank_i(y)}    (23)

where rank_i(x) is the position (in the ranking) of the value q_i(x) on the i-th criterion. Thus, the problem may be formulated in dual form. The most essential advantage of such an approach is the reduction in the number of variables, irrespective of the number of breakpoints of the marginal functions. As the complexity of the marginal functions increases, the optimization problem remains the same and only the computation of the kernel function becomes harder. However, a problem remains: how to ensure monotonicity of the resulting utility function? In the dual formulation the information about each criterion is lost, thus not all the weights may be non-negative. Let us remark that the transformed criteria space obtained as the image of the mapping T (i.e. (φ_{1,1}(q_1(x)), ..., φ_{n,κ_n}(q_n(x))), x ∈ A) may also be seen as a result of binarization of criteria. This type of binarization should be called exhaustive, by analogy to other approaches well known in rough set theory and in the logical analysis of data (see, for example, [1]). The exhaustive binarization proceeds by choosing cut points at each evaluation point of all objects x ∈ U.
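The closed form (23) can be sanity-checked against definition (22): with a breakpoint at every observed value, each φ_{i,j} takes the value 0 or 1 at the data points, so the sum of products counts the cut points reached by both objects. A sketch, assuming ranks counted from 0 for the lowest value (the indexing convention is my assumption, and the helper name is hypothetical):

```python
def rank_kernel(x, y, values_per_criterion):
    """Kernel k(x, y) of Equation (21), using the closed form (23):
    k_i(x, y) = min(rank_i(x), rank_i(y)).

    values_per_criterion: for each criterion, the sorted list of distinct
    values observed in U (the breakpoint locations).
    """
    total = 0
    for qx, qy, vals in zip(x, y, values_per_criterion):
        rank_x = sum(1 for v in vals if qx >= v) - 1  # lowest value gets rank 0
        rank_y = sum(1 for v in vals if qy >= v) - 1
        total += min(rank_x, rank_y)
    return total
```

On one criterion with observed values {.25, .5, .75, 1}, an object at .75 reaches the cut points .5 and .75 and an object at 1 reaches all three, so their marginal kernel value is 2, matching the dot product of the 0/1 indicator features.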
More precisely, the binarization of the i-th criterion is accomplished in a straightforward way by associating with each value v on this criterion, for which there exists an object x such that q_i(x) = v, a boolean attribute q_{iv} such that:

q_{iv}(x) = 1 if q_i(x) ≥ v, and q_{iv}(x) = 0 if q_i(x) < v.    (24)

Table 2 shows the exhaustive binarization of the criteria from Table 1. (Table 2. Decision table from Table 1 with binarized criteria; columns: U, q_{1,.25}, q_{1,.5}, q_{1,.75}, q_{1,1}, q_{2,.3}, q_{2,.6}, q_{2,.65}, q_{2,.7}, d; rows x_1, ..., x_4.) Moreover, let us remark that the binarized decision table contains almost the same information as the dominance matrix introduced in [2]. The dominance matrix DM is defined as follows: DM = {dm(x, y) : x, y ∈ U}, where

dm(x, y) = {q_i ∈ Q : q_i(x) ≥ q_i(y)}    (25)

and Q = {q_i : i = 1, ..., n}. The dominance matrix DM is usually implemented as a 3-dimensional binary cube C defined by c_{jki} = 1 if q_i ∈ dm(x_j, x_k), and c_{jki} = 0 otherwise, where j, k = 1, ..., m and i = 1, ..., n. Such a structure is very useful in a procedure generating the exhaustive set of decision rules [2], because all computations may be performed quickly as bitwise operations. It is a counterpart of the discernibility matrix [11] well known in the classical rough set approach. It is easy to see that the following holds: c_{jki} = 1 ⟺ q_{i,q_i(x_k)}(x_j) = 1, for x_j, x_k ∈ U.

3 Experimental results

We performed a computational experiment on the Wisconsin breast cancer (BCW) data obtained from the UCI Machine Learning Repository [10]. This problem was chosen since it is known to have a monotonic relationship between the values on the condition attributes and the decision labels. Thus, all attributes can be interpreted as criteria, enabling DRSA. BCW consists of 699 instances described by 9 integer-valued attributes, of which 16 instances have missing values. Each instance is assigned to one of two classes (malignant and benign). Several approaches have been compared with the methodology presented in the previous section, which will be referred to as Piecewise Linear DRSA (PL-DRSA). These are k-Nearest Neighbours, linear Support Vector Machines, Logistic Regression, J48 Decision Trees and Naive Bayes.
WEKA [4] software was used for the experiment. For all algorithms, criteria selection was performed using backward elimination. The number of criteria left and the leave-one-out (loo) accuracy estimates are shown in Table 3. (Table 3. Experimental results for the Wisconsin breast cancer data; columns: Algorithm, Number of criteria, loo estimate; rows: k-NN, linear SVM, J48, Logistic Regression, Naive Bayes, PL-DRSA.) Apparently, all the accuracy estimates are similar. PL-DRSA was conducted with breakpoints set at each evaluation point of all objects on each criterion. Two models were created, one for the lower bound of the decision range and one for the upper bound. The classification rule was the following: if Φ^l(x) + Φ^u(x) ≥ 0, then assign x to class Cl_1, otherwise assign x to Cl_{−1}. The achieved accuracy was one of the best, competitive with the other methods. However, the marginal value functions constructed in PL-DRSA may be presented graphically and easily interpreted. Moreover, PL-DRSA reveals the inconsistent data both in the learning and the classification stage.

4 Conclusions

Within the DRSA framework, the decision rule model was always considered for multicriteria decision analysis. We presented an alternative method, related to an additive aggregation model similar to the one used in the UTA method. The described approach has several advantages. First, it is flexible and allows various shapes of the separating function to be obtained. The marginal value functions may also be presented graphically and interpreted by the DM. Finally, PL-DRSA can control the complexity of the additive function by fixing the number of breakpoints and minimizing the slopes at each breakpoint. DRSA plays an important role in handling inconsistencies, which affect the data; ignoring them may cause errors and, therefore, generate a wrong decision model. The method can also be interpreted in terms of criteria transformation (in a specific case, also in terms of binarization) and Support Vector Machines.

Acknowledgements. The authors wish to acknowledge financial support from the Ministry of Education and Science (grant no. 3TF 227).
References

1. Boros, E., Hammer, P.L., Ibaraki, T., Kogan, A., Mayoraz, E., Muchnik, I.: An Implementation of Logical Analysis of Data. IEEE Transactions on Knowledge and Data Engineering 12 (2000)
2. Dembczyński, K., Pindur, R., Susmaga, R.: Generation of Exhaustive Set of Rules within Dominance-based Rough Set Approach. Electronic Notes in Theoretical Computer Science 82 (4) (2003)
3. Dembczyński, K., Greco, S., Słowiński, R.: Second-order Rough Approximations in Multi-criteria Classification with Imprecise Evaluations and Assignments. LNAI 3641 (2005)
4. Frank, E., Witten, I.H.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco (2005)
5. Greco, S., Matarazzo, B., Słowiński, R.: Rough approximation of a preference relation by dominance relations. European Journal of Operational Research 117 (1999)
6. Greco, S., Matarazzo, B., Słowiński, R.: Rough sets theory for multicriteria decision analysis. European Journal of Operational Research 129 (2001)
7. Greco, S., Matarazzo, B., Słowiński, R.: Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough-set decision rules. European Journal of Operational Research 158 (2004)
8. Jacquet-Lagrèze, E., Siskos, Y.: Assessing a set of additive utility functions for multicriteria decision making: The UTA method. European Journal of Operational Research 10 (1982)
9. Jacquet-Lagrèze, E.: An application of the UTA discriminant model for the evaluation of R&D projects. In Pardalos, P.M., Siskos, Y., Zopounidis, C. (eds.): Advances in Multicriteria Analysis. Kluwer Academic Publishers, Dordrecht (1995)
10. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. University of California, Irvine, Department of Information and Computer Science (1998)
11. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In Słowiński, R. (ed.): Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht (1992)
12. Słowiński, R., Greco, S., Matarazzo, B.: Axiomatization of utility, outranking and decision-rule preference models for multiple-criteria classification problems under partial inconsistency with the dominance principle. Control & Cybernetics 31 (2002)
13. Słowiński, R., Greco, S., Matarazzo, B.: Rough Set Based Decision Support. Chapter 16 in Burke, E., Kendall, G. (eds.): Introductory Tutorials on Optimization, Search and Decision Support Methodologies. Springer-Verlag, Boston (2005)
14. Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
15. Zopounidis, C., Doumpos, M.: PREFDIS: a multicriteria decision support system for sorting decision problems. Computers & Operations Research 27 (2000)
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationResearch on Complete Algorithms for Minimal Attribute Reduction
Research on Complete Algorithms for Minimal Attribute Reduction Jie Zhou, Duoqian Miao, Qinrong Feng, and Lijun Sun Department of Computer Science and Technology, Tongji University Shanghai, P.R. China,
More informationSupport Vector Machines on General Confidence Functions
Support Vector Machines on General Confidence Functions Yuhong Guo University of Alberta yuhong@cs.ualberta.ca Dale Schuurmans University of Alberta dale@cs.ualberta.ca Abstract We present a generalized
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationA Posteriori Corrections to Classification Methods.
A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk
More informationOn Improving the k-means Algorithm to Classify Unclassified Patterns
On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,
More informationSupport Vector Machines
Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationAn Empirical Study of Building Compact Ensembles
An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationBinary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach
Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach Krzysztof Dembczyński and Wojciech Kot lowski Intelligent Decision Support Systems Laboratory (IDSS) Poznań
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationDominance-based Rough Set Approach Data Analysis Framework. User s guide
Dominance-based Rough Set Approach Data Analysis Framework User s guide jmaf - Dominance-based Rough Set Data Analysis Framework http://www.cs.put.poznan.pl/jblaszczynski/site/jrs.html Jerzy Błaszczyński,
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationMinOver Revisited for Incremental Support-Vector-Classification
MinOver Revisited for Incremental Support-Vector-Classification Thomas Martinetz Institute for Neuro- and Bioinformatics University of Lübeck D-23538 Lübeck, Germany martinetz@informatik.uni-luebeck.de
More informationCRITERIA REDUCTION OF SET-VALUED ORDERED DECISION SYSTEM BASED ON APPROXIMATION QUALITY
International Journal of Innovative omputing, Information and ontrol II International c 2013 ISSN 1349-4198 Volume 9, Number 6, June 2013 pp. 2393 2404 RITERI REDUTION OF SET-VLUED ORDERED DEISION SYSTEM
More informationLecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationRobust Kernel-Based Regression
Robust Kernel-Based Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial
More informationPattern Recognition 2018 Support Vector Machines
Pattern Recognition 2018 Support Vector Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recognition 1 / 48 Support Vector Machines Ad Feelders ( Universiteit Utrecht
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationSupport Vector Machines with Example Dependent Costs
Support Vector Machines with Example Dependent Costs Ulf Brefeld, Peter Geibel, and Fritz Wysotzki TU Berlin, Fak. IV, ISTI, AI Group, Sekr. FR5-8 Franklinstr. 8/9, D-0587 Berlin, Germany Email {geibel
More informationSupport Vector Machines
Support Vector Machines Tobias Pohlen Selected Topics in Human Language Technology and Pattern Recognition February 10, 2014 Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationSupport Vector Machines II. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Support Vector Machines II CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Outline Linear SVM hard margin Linear SVM soft margin Non-linear SVM Application Linear Support Vector Machine An optimization
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationA Logical Formulation of the Granular Data Model
2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University
More informationReview: Support vector machines. Machine learning techniques and image analysis
Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationEasy Categorization of Attributes in Decision Tables Based on Basic Binary Discernibility Matrix
Easy Categorization of Attributes in Decision Tables Based on Basic Binary Discernibility Matrix Manuel S. Lazo-Cortés 1, José Francisco Martínez-Trinidad 1, Jesús Ariel Carrasco-Ochoa 1, and Guillermo
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationON COMBINING PRINCIPAL COMPONENTS WITH FISHER S LINEAR DISCRIMINANTS FOR SUPERVISED LEARNING
ON COMBINING PRINCIPAL COMPONENTS WITH FISHER S LINEAR DISCRIMINANTS FOR SUPERVISED LEARNING Mykola PECHENIZKIY*, Alexey TSYMBAL**, Seppo PUURONEN* Abstract. The curse of dimensionality is pertinent to
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationA Reformulation of Support Vector Machines for General Confidence Functions
A Reformulation of Support Vector Machines for General Confidence Functions Yuhong Guo 1 and Dale Schuurmans 2 1 Department of Computer & Information Sciences Temple University www.cis.temple.edu/ yuhong
More informationGroup Decision-Making with Incomplete Fuzzy Linguistic Preference Relations
Group Decision-Making with Incomplete Fuzzy Linguistic Preference Relations S. Alonso Department of Software Engineering University of Granada, 18071, Granada, Spain; salonso@decsai.ugr.es, F.J. Cabrerizo
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationKernels and the Kernel Trick. Machine Learning Fall 2017
Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationSupport vector machines
Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationBeyond Sequential Covering Boosted Decision Rules
Beyond Sequential Covering Boosted Decision Rules Krzysztof Dembczyński 1, Wojciech Kotłowski 1,andRomanSłowiński 2 1 Poznań University of Technology, 60-965 Poznań, Poland kdembczynski@cs.put.poznan.pl,
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More information