CONSTRUCTING A NOVEL MONOTONICITY CONSTRAINED SUPPORT VECTOR REGRESSION MODEL
Chih-Chuan Chen
Department of Industrial and Information Management, National Cheng Kung University, Taiwan, R.O.C.
Department of Leisure Information Management, Taiwan Shoufu University, Taiwan, R.O.C.
ccchen@tsu.edu.tw

Shu-Ching Kuo
Department of Leisure Information Management, Taiwan Shoufu University, Taiwan, R.O.C.
su0102@gmail.com

Sheng-Tun Li (Corresponding Author)
Department of Industrial and Information Management, National Cheng Kung University, Taiwan, R.O.C.
Institute of Information Management, National Cheng Kung University, Taiwan, R.O.C.
stli@mail.ncku.edu.tw

ABSTRACT

This paper constructs a monotonicity-constrained nonlinear regression model based on Support Vector Machines (SVMs). In many application areas of machine learning, there exists prior knowledge concerning monotone relations between the response variable and some of the predictor variables. Monotonicity may be an important model requirement with a view toward explaining and justifying decisions. We therefore propose a monotonicity-constrained Support Vector Regression (SVR) model that incorporates the monotone nature of such problems. A quadratic programming problem in the dual space is developed, similar to its SVR predecessor. When applied to synthetic data sets, the proposed method shows advantages and promising results.

Keywords: Classification problems, SVM, Monotonicity constraints
INTRODUCTION

Data mining techniques enable us to discover hidden patterns and extract valuable knowledge from databases. With the advent of the computer, various data mining methods have been proposed and widely discussed. Among them, the support vector machine (SVM), characterized by a convex optimization problem, is an important method in the fields of neural networks and nonlinear modeling, and has been successfully applied to problems of classification and nonlinear function estimation. The SVM, pioneered by Vapnik in 1995, is a state-of-the-art artificial neural network (ANN) technology based on statistical learning (Vapnik, 1995; Vapnik, 1998). In recent years, it has drawn overwhelming attention from diverse research communities due to its outstanding performance in solving classification problems and its novel approach to improving the generalization property of ANNs (Burges, 1998; Cristianini & Shawe-Taylor, 2000). Unlike ANNs, which minimize empirical risk, the SVM is designed to minimize structural risk by minimizing an upper bound of the generalization error rather than the training error; as a result, the overfitting problem in machine learning can be handled successfully. The other outstanding property of the SVM, compared to ANNs, is that training can be mapped to a uniquely solvable linearly constrained quadratic programming problem, which produces a solution that is always unique and globally optimal. SVMs have been widely applied in many fields in the past few years, such as corporate distress prediction, consumer loan evaluation, text categorization, handwritten digit recognition, speaker verification, bioinformatics, and many others. In many classification applications, we have a priori knowledge to the extent that, all else being equal, an increase in an input variable should not lead to a decrease (or increase) in the class label.
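This prior-knowledge requirement can be stated mechanically: a pair of instances violates monotonicity when one instance dominates the other in every attribute yet receives a strictly lower label. The following minimal sketch (a pure-Python illustration; the toy data and function names are our own, not part of any cited method) enumerates such violating pairs:

```python
def dominates(a, b):
    """True if instance a is >= instance b in every attribute (component-wise order)."""
    return all(x >= y for x, y in zip(a, b))

def monotonicity_violations(X, y):
    """Return index pairs (i, j) where X[i] dominates X[j] but y[i] < y[j]."""
    n = len(X)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and dominates(X[i], X[j]) and y[i] < y[j]]

# Toy data: two attributes -> binary class label.
X = [(50, 5), (60, 5), (40, 2)]
y = [1, 0, 0]

# Instance 1 dominates instance 0 but got a lower label: one violation.
print(monotonicity_violations(X, y))  # [(1, 0)]
```

A data set with no violating pairs is monotone consistent; relabeling approaches such as the one of Duivesteijn and Feelders aim to remove exactly these violating pairs.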
For example, if loan applicants A and B have the same attribute values, except that A has a higher income than B, then it would be surprising if B got the loan while A did not. Other application domains in which we can have this type of knowledge are legal support systems, medicine (e.g., smoking increases the probability of vascular disease), operations research, and economics (e.g., house prices increase with house area). In the aforementioned problems, one can see that there are monotonic relationships between the class and some of the attributes. To take this prior knowledge about the data into account, one needs to add monotonicity constraints to a classification model such as SVM. It has been shown that classification techniques that incorporate monotonicity constraints extract knowledge with more justifiability and comprehensibility. In the data-mining literature on monotonicity constraints, there are two different approaches for dealing with problems that have prior knowledge of monotonic properties, although only a few papers have focused on this topic. One is to apply a relabeling technique to data that violate monotonicity (Duivesteijn & Feelders, 2008). The other is to add the monotonicity constraints directly to the optimization model (Falck et al., 2009; Evgeniou & Boussios, 2005; Doumpos & Zopounidis, 2009). In the latter approach,
Evgeniou, Boussios and Zacharia (Evgeniou & Boussios, 2005) and Doumpos and Zopounidis (Doumpos & Zopounidis, 2009) simulated a mass of monotonic data to formulate monotonicity constraints that enforce monotonicity. One can see that the simulated data can increase the computational complexity of the problem. Pelckmans et al. developed an LS-SVM regression model with monotonicity constraints. In their problem setting, instead of using simulated data, all of the input data are used to formulate the monotonicity constraints, under the assumptions that the input data follow a linear order and that the bias term can be omitted. However, such assumptions may not hold in practice. Moreover, sparseness is lost in the LS-SVM regression model. Therefore, to deal with the shortcomings of the aforementioned studies, in this research we propose a new SVR model with monotonicity constraints that are inequalities based on the partial order of the input data. The rest of this paper is organized as follows. Section 2 reviews the related literature. Section 3 discusses the formulation of the monotonicity-constrained SVR model. Section 4 presents the experimental results. Finally, Section 5 gives the discussion and conclusion.

LITERATURE REVIEW

In this section, we review the related literature to lay the foundation of this research. The topics include support vector machines and classification with monotonicity constraints.

Support Vector Machines

The SVM is a state-of-the-art neural network technology based on statistical learning (Vapnik, 1995; Vapnik, 1998). It was originally designed for binary classification, constructing an optimal hyperplane so that the margin of separation between the negative and positive data sets is maximized.
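To make the margin notion concrete: for a separating hyperplane in canonical form, where the decision function f(x) = w·x + b satisfies |f(x)| = 1 on the support vectors, the width of the separation margin is 2/‖w‖. A small illustration (the weight vector and points below are made up for demonstration, not produced by an SVM solver):

```python
import math

def decision(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Geometric margin between the supporting hyperplanes f(x) = +1 and f(x) = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -1.0           # canonical hyperplane: |f| = 1 at support vectors
print(margin_width(w))             # 2 / ||w|| = 2 / 5 = 0.4
print(decision(w, b, [1.0, 0.5]))  # 3*1 + 4*0.5 - 1 = 4.0 (positive side)
```

In the linearly separable case, the SVM chooses w and b to maximize exactly this margin width.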
If the data are linearly separable, the optimal hyperplane separates them without error, and the data points closest to the optimal separating hyperplane are called support vectors. In practice, however, the data set of interest is usually not linearly separable. To make linear separation feasible, one can apply a non-linear transformation that maps the data set into a higher-dimensional space, the so-called feature space. Unfortunately, the curse of dimensionality in machine learning makes an explicit non-linear mapping too difficult to compute. SVMs overcome this hurdle by using the mechanism of the inner-product kernel. A comprehensive tutorial on the SVM classifier has been published by Burges (1998). Excellent performance has also been obtained in function estimation and time-series prediction applications (Müller et al., 1997; Mukherjee et al., 1997). Huang, Nakamori and Wang (Huang et al., 2005) investigated the predictability of financial movement direction with SVM by forecasting the weekly movement direction of the NIKKEI 225 index. They demonstrated that SVM outperforms linear discriminant analysis, quadratic discriminant analysis, and Elman backpropagation neural networks. Recently, SVM has received much more attention than the traditional backpropagation neural network, attributable to its salient advantages (Kim & Sohn, 2010). Given these advantages, many studies on SVM theory and applications have appeared across multiple disciplines.

Classification with Monotonicity Constraints

In classification problems with ordinal attributes, the class attribute very often should increase with each or some of the explanatory attributes. Such problems are called classification problems with monotonicity constraints (Potharst & Feelders, 2002). Classification with monotonicity constraints is commonly encountered in real-life applications such as bankruptcy risk prediction (Greco et al., 1998), finance (Gamarnik, 1998), breast cancer diagnosis (Ryu et al., 2007), house pricing (Potharst & Feelders, 2002), credit rating (Doumpos & Pasiouras, 2005), and many others. The importance of classification with monotonicity constraints was demonstrated by Pazzani et al. (2001), who presented an evaluation of the potential for monotonicity constraints to bias machine learning systems toward rules that are both accurate and meaningful. Doumpos and Zopounidis (Doumpos & Zopounidis, 2009) proposed a monotonic support vector machine for credit risk rating. It uses monotonicity hints to produce virtual examples that impose the monotonicity conditions representing the prior domain knowledge of the problem. Experimental results on a large sample of Greek industrial firms demonstrated that introducing the monotonicity condition reduces the danger of overfitting, thus leading to models with higher predictive ability. Wang applied a neural network with the monotonicity property as a non-parametric efficiency analysis method to the study of efficiency analysis for private and public organizations (Wang, 2003).
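The virtual-example idea of Doumpos and Zopounidis described above can be sketched as follows: from each training instance, generate a copy in which a monotone attribute is increased, and require the model's output not to decrease on the perturbed copy. This is only an illustration of the construction; the attribute indices, step size, and candidate model below are arbitrary choices of ours:

```python
def make_virtual_pairs(X, mono_attrs, step=1.0):
    """For each instance and each monotone attribute, create (original, perturbed)
    pairs; a monotone model f must satisfy f(perturbed) >= f(original)."""
    pairs = []
    for x in X:
        for a in mono_attrs:
            x_up = list(x)
            x_up[a] += step
            pairs.append((tuple(x), tuple(x_up)))
    return pairs

X = [(1.0, 2.0), (3.0, 1.0)]
pairs = make_virtual_pairs(X, mono_attrs=[0])
print(pairs)  # [((1.0, 2.0), (2.0, 2.0)), ((3.0, 1.0), (4.0, 1.0))]

# A candidate model is checked against the virtual pairs:
f = lambda x: 0.5 * x[0] + 0.1 * x[1]            # increasing in attribute 0
assert all(f(hi) >= f(lo) for lo, hi in pairs)   # monotonicity hints satisfied
```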
Simulation experiments demonstrated that Wang's approach avoids the parametric distributional assumptions required by traditional efficiency analysis methods such as data envelopment analysis (DEA) and stochastic frontier functions (SFF). Similar work can be seen in (Pendharkar & Rodger, 2003). Daniels and Kamp (Daniels & Kamp, 1999) proposed a monotonic neural network constructed by considering multilayer neural networks with non-negative weights. In the literature we surveyed, there is a lack of methods for constructing an SVM with monotonicity constraints. With the increasing popularity of SVM over traditional classification methods, our study is expected to fulfill the need for a monotonic SVM.

RESEARCH METHODOLOGY

In this section, we describe the formulation of the proposed monotonicity-constrained SVM model. Monotonicity is a relationship in which increasing the value of a variable always increases (or always decreases) the likelihood of category membership. Monotonicity is defined as follows. Let N be the number of instances and n the number of attributes. Given a dataset \(\{(\mathbf{x}_i, y_i)\}_{i=1}^{N} \subseteq X \times Y\), where \(X \subseteq \mathbb{R}^n\) denotes the feature space with a partial ordering \(\preceq\) defined over it, and a linear ordering \(\le\) defined over the space \(Y\) of class values, the classifier \(f : X \to Y\) is monotone if the following statement holds:

\[ \mathbf{x}_i \preceq \mathbf{x}_j \;\Rightarrow\; f(\mathbf{x}_i) \le f(\mathbf{x}_j) \qquad \text{for all } i, j. \]

A partial ordering on a set \(A\) is a relation that satisfies reflexivity, anti-symmetry, and transitivity; a linear ordering is a partial order that additionally satisfies comparability. These properties are as follows.

Reflexivity: \(a \preceq a\) for all \(a \in A\).
Anti-symmetry: if \(a \preceq b\) and \(b \preceq a\) for any \(a, b \in A\), then \(a = b\).
Transitivity: if \(a \preceq b\) and \(b \preceq c\) for any \(a, b, c \in A\), then \(a \preceq c\).
Comparability: for any \(a, b \in A\), either \(a \preceq b\) or \(b \preceq a\).

For an observed dataset \(\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}\), the primal SVR model can be presented as

\[ \min_{\mathbf{w},\, b,\, \xi,\, \xi^*} \;\; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \tag{1} \]

subject to

\[ \begin{cases} y_i - (\mathbf{w}^\top \varphi(\mathbf{x}_i) + b) \le \varepsilon + \xi_i, \\ (\mathbf{w}^\top \varphi(\mathbf{x}_i) + b) - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\, \xi_i^* \ge 0, \qquad i = 1, \dots, N, \end{cases} \tag{2} \]

where the inequalities

\[ \mathbf{w}^\top \varphi(\mathbf{x}_i) \le \mathbf{w}^\top \varphi(\mathbf{x}_j) \qquad \text{for all } (i, j) \in \mathcal{M} = \{(i, j) : \mathbf{x}_i \preceq \mathbf{x}_j\} \tag{3} \]

are the monotonicity constraints. The Lagrangian for this problem is
\[ \begin{aligned} L(\mathbf{w}, b, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*, \beta) = {} & \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \\ & - \sum_{i=1}^{N} \alpha_i \left[ \varepsilon + \xi_i - y_i + \mathbf{w}^\top \varphi(\mathbf{x}_i) + b \right] \\ & - \sum_{i=1}^{N} \alpha_i^* \left[ \varepsilon + \xi_i^* + y_i - \mathbf{w}^\top \varphi(\mathbf{x}_i) - b \right] \\ & - \sum_{(i,j) \in \mathcal{M}} \beta_{ij}\, \mathbf{w}^\top \big( \varphi(\mathbf{x}_j) - \varphi(\mathbf{x}_i) \big) - \sum_{i=1}^{N} (\eta_i \xi_i + \eta_i^* \xi_i^*) \end{aligned} \tag{4} \]

with Lagrangian multipliers \(\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0\) for \(i = 1, \dots, N\) and \(\beta_{ij} \ge 0\) for all \((i, j) \in \mathcal{M}\). The optimal solution can be found at the saddle point of the Lagrangian by first minimizing over the primal variables \(\mathbf{w}\), \(b\), \(\xi\), \(\xi^*\) and then maximizing over the dual multipliers. Setting the derivatives with respect to \(\mathbf{w}\), \(b\), \(\xi_i\), and \(\xi_i^*\) to zero gives

\[ \mathbf{w} = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, \varphi(\mathbf{x}_i) + \sum_{(i,j) \in \mathcal{M}} \beta_{ij} \big( \varphi(\mathbf{x}_j) - \varphi(\mathbf{x}_i) \big), \qquad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i + \eta_i = C, \qquad \alpha_i^* + \eta_i^* = C. \tag{5} \]

Substituting (5) back into (4), one obtains a quadratic programming problem (the dual problem) of the following form:

\[ \begin{aligned} \max_{\alpha, \alpha^*, \beta} \;\; & -\frac{1}{2} \sum_{i,k} (\alpha_i - \alpha_i^*)(\alpha_k - \alpha_k^*)\, \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_k) \rangle \\ & - \sum_{i} \sum_{(k,l) \in \mathcal{M}} (\alpha_i - \alpha_i^*)\, \beta_{kl}\, \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_l) - \varphi(\mathbf{x}_k) \rangle \\ & - \frac{1}{2} \sum_{(i,j) \in \mathcal{M}} \sum_{(k,l) \in \mathcal{M}} \beta_{ij}\, \beta_{kl}\, \langle \varphi(\mathbf{x}_j) - \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_l) - \varphi(\mathbf{x}_k) \rangle \\ & - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*) \end{aligned} \tag{6} \]

subject to

\[ \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \qquad \beta_{ij} \ge 0 \;\; \text{for all } (i, j) \in \mathcal{M}. \tag{7} \]

A kernel trick can be applied to this quadratic form. For any symmetric, continuous function \(K\) satisfying Mercer's condition, there exists a mapping \(\varphi\) such that

\[ K(\mathbf{x}, \mathbf{x}') = \langle \varphi(\mathbf{x}), \varphi(\mathbf{x}') \rangle. \]

Replacing every inner product in (6) by the corresponding kernel evaluation yields the kernelized dual problem (8). With an appropriate choice of kernel \(K\), the nonlinear monotonicity-constrained SVM regression takes the form

\[ f(\mathbf{x}) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, K(\mathbf{x}_i, \mathbf{x}) + \sum_{(i,j) \in \mathcal{M}} \beta_{ij} \big( K(\mathbf{x}_j, \mathbf{x}) - K(\mathbf{x}_i, \mathbf{x}) \big) + b, \tag{9} \]

where \(\alpha\), \(\alpha^*\), and \(\beta\) are the solution to the quadratic programming problem in (8). Now we develop an algorithm to solve the proposed monotonicity-constrained support vector machine. First, we write the objective function in matrix form. For notational convenience, we re-index the multipliers \(\beta_{ij}\). Suppose there are \(M\) elements in \(\mathcal{M}\). We can define a one-to-one mapping \(\sigma\) from the set \(\{1, \dots, M\}\) to the set \(\mathcal{M} = \{(i, j) : \mathbf{x}_i \preceq \mathbf{x}_j\}\) and write

\[ \sigma(m) = (i_m, j_m), \qquad \beta_m := \beta_{i_m j_m}, \qquad m = 1, \dots, M. \tag{10} \]

Collecting the dual variables into \(\mathbf{z} = (\alpha;\, \alpha^*;\, \beta) \in \mathbb{R}^{2N+M}\) and defining the symmetric matrix

\[ G = \begin{bmatrix} K & -K & P \\ -K & K & -P \\ P^\top & -P^\top & Q \end{bmatrix}, \tag{11} \]

with \(K_{ik} = K(\mathbf{x}_i, \mathbf{x}_k)\), \(P_{im} = K(\mathbf{x}_i, \mathbf{x}_{j_m}) - K(\mathbf{x}_i, \mathbf{x}_{i_m})\), and \(Q_{mn} = K(\mathbf{x}_{j_m}, \mathbf{x}_{j_n}) - K(\mathbf{x}_{j_m}, \mathbf{x}_{i_n}) - K(\mathbf{x}_{i_m}, \mathbf{x}_{j_n}) + K(\mathbf{x}_{i_m}, \mathbf{x}_{i_n})\), the optimization problem (8) can be rewritten as

\[ \min_{\mathbf{z}} \;\; \frac{1}{2} \mathbf{z}^\top G\, \mathbf{z} + \mathbf{c}^\top \mathbf{z}, \qquad \mathbf{c} = (\varepsilon \mathbf{1} - \mathbf{y};\; \varepsilon \mathbf{1} + \mathbf{y};\; \mathbf{0}), \tag{12} \]

subject to the linear constraints in (7). Apparently, the above problem is a quadratic program. If the matrix G is positive
semidefinite, the solution is global. If G is positive definite, the solution is global and unique. When G is indefinite, there may exist local solutions. Quadratic programming problems can be solved by any available quadratic programming solver. We applied the proposed method with a Gaussian kernel. The hyperparameter C and the kernel parameters can be tuned by a grid search using k-fold cross-validation on the training/validation data. The algorithm of the monotonicity-constrained SVR is as follows.

Algorithm MC-SVR
Input: observed dataset \(\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}\)
Output: \(\alpha\), \(\alpha^*\), \(\beta\) and the corresponding estimator
Steps:
1. Determine the M pairs of monotonicity constraints \((i, j)\) such that \(\mathbf{x}_i \preceq \mathbf{x}_j\).
2. Compute the matrix G in (11).
3. Solve the quadratic programming problem in (12) for \(\alpha\), \(\alpha^*\), and \(\beta\) with a quadratic programming solver such as quadprog in MATLAB.
4. Apply k-fold cross-validation on the training/validation data and repeat Step 3 to find the optimal parameters.
5. Output the optimal \(\alpha\), \(\alpha^*\), and \(\beta\).
6. Determine the MC-SVR estimator as in (9).

Fig. 1: The MC-SVR algorithm

EXPERIMENTAL RESULTS

We applied the proposed algorithm to approximate the sigmoid function on the interval [-1, 1]. In total, 100 evenly spaced x values, along with their corresponding y values, form the clean data set. Perturbations ranging from 5% to 25% were applied to generate artificial data. Five-fold cross-validation was adopted in the experiment: the data set was randomly divided into five approximately equal subsets, and each subset was used in turn to test the model while the remaining subsets were used for training, until every subset had been used for testing once. Table 1 compares the average sum of squared errors (SSE) of the original SVR and the proposed MC-SVR on the artificial data sets; the proposed monotonicity-constrained SVR performs better.

Table 1: Average SSE of MC-SVR and SVR on Sigmoid Data
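Steps 1 and 2 of the algorithm can be sketched in Python with NumPy (a hypothetical stand-in for the MATLAB quadprog workflow; the function names are ours, and the QP solve of Step 3 is left to any off-the-shelf solver):

```python
import numpy as np

def monotone_pairs(X):
    """Step 1: index pairs (i, j) with x_i <= x_j component-wise (partial order)."""
    n = len(X)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and np.all(X[i] <= X[j])]

def gaussian_kernel(X, sigma=1.0):
    """Step 2 ingredient: Gram matrix K[i, k] = exp(-||x_i - x_k||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
pairs = monotone_pairs(X)
K = gaussian_kernel(X)

# K is symmetric positive semidefinite (Mercer), so the dual QP built from it
# has a global solution.
assert np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) > -1e-10)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```

The pair list and kernel matrix are exactly the raw materials from which the blocks of G in (11) are assembled before handing the QP to a solver.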
CONCLUSIONS

In many application areas of machine learning, there exists prior knowledge concerning monotone relations between the response variable and the predictor variables. In some cases, monotonicity is an important model requirement with a view toward explaining and justifying decisions. We therefore proposed a monotonicity-constrained SVR that takes into account the monotone nature of such problems. A quadratic programming problem in the dual space was developed, similar to its SVR predecessor. We tested the proposed method on perturbed data generated from the sigmoid function. The results showed that the proposed method has advantages over the original SVR.

REFERENCES

1. Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2.
2. Cristianini, N. & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press.
3. Daniels, H. & Kamp, B. (1999). Neural Computing & Applications, 8.
4. Doumpos, M. & Pasiouras, F. (2005). Computational Economics, 25.
5. Doumpos, M. & Zopounidis, C. (2009). New Mathematics and Natural Computation, 5.
6. Duivesteijn, W. & Feelders, A. (2008). Nearest neighbour classification with monotonicity constraints. In: European Conference on Machine Learning and Knowledge Discovery in Databases, Antwerp, Belgium. Springer-Verlag.
7. Evgeniou, T., Boussios, C. & Zacharia, G. (2005). Marketing Science, 24.
8. Falck, T., Suykens, J. & De Moor, B. (2009). In: The 48th IEEE Conference on Decision and Control (CDC 2009), Shanghai, China.
9. Gamarnik, D. (1998). In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM Press, New York.
10. Greco, S., Matarazzo, B. & Słowiński, R. (1998). In: Zopounidis, C. (ed.) Operational Tools in the Management of Financial Risks. Kluwer Academic Publishers, Dordrecht.
11. Huang, W., Nakamori, Y. & Wang, S. Y. (2005). Computers & Operations Research, 32.
12. Kim, H. S. & Sohn, S. Y. (2010). European Journal of Operational Research, 201.
13. Müller, K.-R., Smola, A., Rätsch, G., Schölkopf, B., Kohlmorgen, J. & Vapnik, V. (1997). In: International Conference on Artificial Neural Networks. Springer Lecture Notes in Computer Science.
14. Mukherjee, S., Osuna, E. & Girosi, F. (1997). In: IEEE Workshop on Neural Networks for Signal Processing 7, Amelia Island, FL.
15. Pazzani, M. J., Mani, S. & Shankle, W. R. (2001). Methods of Information in Medicine, 40.
16. Pendharkar, P. C. & Rodger, J. A. (2003). Decision Support Systems, 36.
17. Potharst, R. & Feelders, A. J. (2002). ACM SIGKDD Explorations Newsletter, 4.
18. Ryu, Y. U., Chandrasekaran, R. & Jacob, V. (2007). European Journal of Operational Research, 181.
19. Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.
20. Vapnik, V. N. (1998). Statistical Learning Theory. New York: Wiley.
21. Wang, S. (2003). Computers and Operations Research, 30.
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationNonlinear Support Vector Machines through Iterative Majorization and I-Splines
Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More information1162 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 5, SEPTEMBER The Evidence Framework Applied to Support Vector Machines
1162 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 5, SEPTEMBER 2000 Brief Papers The Evidence Framework Applied to Support Vector Machines James Tin-Yau Kwok Abstract In this paper, we show that
More informationCLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data
More informationPattern Recognition 2018 Support Vector Machines
Pattern Recognition 2018 Support Vector Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recognition 1 / 48 Support Vector Machines Ad Feelders ( Universiteit Utrecht
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationIndirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina
Indirect Rule Learning: Support Vector Machines Indirect learning: loss optimization It doesn t estimate the prediction rule f (x) directly, since most loss functions do not have explicit optimizers. Indirection
More informationMACHINE LEARNING. Support Vector Machines. Alessandro Moschitti
MACHINE LEARNING Support Vector Machines Alessandro Moschitti Department of information and communication technology University of Trento Email: moschitti@dit.unitn.it Summary Support Vector Machines
More informationIncorporating detractors into SVM classification
Incorporating detractors into SVM classification AGH University of Science and Technology 1 2 3 4 5 (SVM) SVM - are a set of supervised learning methods used for classification and regression SVM maximal
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationPolyhedral Computation. Linear Classifiers & the SVM
Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationStatistical Learning Reading Assignments
Statistical Learning Reading Assignments S. Gong et al. Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (Chapt. 3, hard copy). T. Evgeniou, M. Pontil, and T. Poggio, "Statistical
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationUVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 9: Classifica8on with Support Vector Machine (cont.
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 9: Classifica8on with Support Vector Machine (cont.) Yanjun Qi / Jane University of Virginia Department of Computer Science
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationSupport Vector Regression with Automatic Accuracy Control B. Scholkopf y, P. Bartlett, A. Smola y,r.williamson FEIT/RSISE, Australian National University, Canberra, Australia y GMD FIRST, Rudower Chaussee
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationSVMs, Duality and the Kernel Trick
SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today
More informationMachine Learning and Data Mining. Support Vector Machines. Kalev Kask
Machine Learning and Data Mining Support Vector Machines Kalev Kask Linear classifiers Which decision boundary is better? Both have zero training error (perfect training accuracy) But, one of them seems
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationMultivariate statistical methods and data mining in particle physics Lecture 4 (19 June, 2008)
Multivariate statistical methods and data mining in particle physics Lecture 4 (19 June, 2008) RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement
More informationDiscussion of Some Problems About Nonlinear Time Series Prediction Using ν-support Vector Machine
Commun. Theor. Phys. (Beijing, China) 48 (2007) pp. 117 124 c International Academic Publishers Vol. 48, No. 1, July 15, 2007 Discussion of Some Problems About Nonlinear Time Series Prediction Using ν-support
More informationAn Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM
An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department
More informationLECTURE 7 Support vector machines
LECTURE 7 Support vector machines SVMs have been used in a multitude of applications and are one of the most popular machine learning algorithms. We will derive the SVM algorithm from two perspectives:
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationForecast daily indices of solar activity, F10.7, using support vector regression method
Research in Astron. Astrophys. 9 Vol. 9 No. 6, 694 702 http://www.raa-journal.org http://www.iop.org/journals/raa Research in Astronomy and Astrophysics Forecast daily indices of solar activity, F10.7,
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationUniversal Learning Technology: Support Vector Machines
Special Issue on Information Utilizing Technologies for Value Creation Universal Learning Technology: Support Vector Machines By Vladimir VAPNIK* This paper describes the Support Vector Machine (SVM) technology,
More information