A Linear Regression Model for Nonlinear Fuzzy Data

Juan C. Figueroa-García and Jesus Rodriguez-Lopez
Universidad Distrital Francisco José de Caldas, Bogotá - Colombia
jcfigueroag@udistrital.edu.co, e.jesus.rodriguez.lopez@gmail.com

Abstract. Fuzzy linear regression is a useful tool for handling uncertain data samples as an alternative to a probabilistic approach. This paper sets forth a linear regression model for fuzzy variables; the model is optimized through convex methods. A fuzzy linear programming model is designed to solve the problem with nonlinear fuzzy data by combining fuzzy arithmetic with convex optimization methods. Two examples are solved through different approaches, followed by a goodness-of-fit statistical analysis based on the residuals of the model.

1 Introduction and Motivation

The linear regression analysis called the Classical Linear Regression Model (CLRM) is an important statistical tool for establishing the relation between a set of independent variables and a dependent one. A mathematical representation of the CLRM is:

$$y_j = \sum_{i=1}^{n} \beta_i x_{ij} + \xi_j \quad \forall\, j \in \mathbb{N}_m \qquad (1)$$

where $y_j$ is a dependent variable, $x_{ij}$ are the observed variables, $\beta_i$ is the weight of the $i$-th independent variable and $\xi_j$ is the error term of the $j$-th observation, with $i \in \mathbb{N}_n$ and $j \in \mathbb{N}_m$.

As Bargiela et al. expressed in [1], classical linear regression analysis is not able to find the assignment rule between a collection of variables when these are not numerical (crisp) entities, i.e. when they are fuzzy numbers (see Zadeh [9]). To address this problem, Tanaka et al. [8] introduced the fuzzy linear regression (FLR) model:

$$\tilde{y}_j = \sum_{i=1}^{n} \beta_i \tilde{x}_{ij} + \xi_j \quad \forall\, j \in \mathbb{N}_m \qquad (2)$$

where $\tilde{y}_j$ is a fuzzy dependent variable, $\tilde{x}_{ij}$ are fuzzy observations, $\beta_i$ is the weight of the $i$-th independent variable and $\xi_j$ is the error term of the $j$-th observation, with $i \in \mathbb{N}_n$ and $j \in \mathbb{N}_m$.
Fuzzy linear regression can deal with linguistic variables through different methods such as the least squares method or the gradient-descent algorithm (see Bargiela et al. [1] and Gladysz and Kuchta [2]). However, those methods are designed for analyzing the most commonly used sets, the symmetrical triangular fuzzy sets. Moreover, their solution routines involve algorithms that require considerable amounts of resources, e.g. software, computing power and time.

Corresponding authors. D.-S. Huang et al. (Eds.): ICIC 2011, LNBI 6840, pp. 353-360, 2012. © Springer-Verlag Berlin Heidelberg 2012

The present work presents a Linear Programming (LP) model capable of handling, in a simple way, all type-1 fuzzy data. To decompose the information contained in each fuzzy datum, we analyze several parameters that represent it. These parameters are called interesting values and are characterized by the following definition.

Definition 1. Let A, B and C be fuzzy sets with membership functions A(x), B(x) and C(x) such that

$$A(x) = \sum_{i} \beta_i B_i(x) \qquad (3)$$

and let $\tau(\cdot)$ be a function whose output is an interesting value of a given set $(\cdot)$. Then for A(x) we have

$$\tau(A(x)) = \sum_{i} \beta_i\, \tau(B_i(x)) \qquad (4)$$

where $\beta_i$ is the weight of the fuzzy set $B_i(x)$, so $\tau(A(x))$ is a linear combination of the values $\tau(B_i(x))$ with weights $\beta_i$. This means that each parameter of A(x) can be expressed as a linear combination of $\beta_i$ and $\tau(B_i(x))$.

Interesting values have an important attribute: the interesting values of a fuzzy set that is a linear combination of a group of fuzzy sets equal the same linear combination of the interesting values of the sets in that group.

An LP model is designed based on the interesting values. Each constraint tries to establish an equivalence between the interesting values of the sets Y and X, allowing for slack or surplus, and the objective function is the minimization of their sum. The LP method is compared to three other proposals in order to evaluate the efficiency and efficacy of the four models. In addition to the comparisons, we also discuss a case study, including a statistical analysis of its residuals.
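To make Definition 1 concrete, the following worked example checks property (4) on two triangular fuzzy numbers. The numbers, the weights and the choice of $\tau$ as the central value (the point with membership 1) are illustrative assumptions, not data from the paper; the arithmetic is the standard one for triangular fuzzy numbers combined with non-negative weights.

```latex
% Hypothetical triangular fuzzy numbers written as (lower, central, upper)
B_1 = (1,\, 2,\, 4), \qquad B_2 = (0,\, 1,\, 2), \qquad \beta_1 = 2, \quad \beta_2 = 3

% Linear combination with non-negative weights, computed component-wise
A = \beta_1 B_1 + \beta_2 B_2
  = (2 \cdot 1 + 3 \cdot 0,\; 2 \cdot 2 + 3 \cdot 1,\; 2 \cdot 4 + 3 \cdot 2)
  = (2,\, 7,\, 14)

% Property (4) with \tau = central value
\tau(A) = 7 = 2 \cdot 2 + 3 \cdot 1 = \beta_1\, \tau(B_1) + \beta_2\, \tau(B_2)
```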
2 A Linear Programming Fuzzy Regression Model

2.1 The Independent Variables

In the classical linear regression model (CLRM), each observation corresponds to a single crisp value which measures a variable; these values, however, cannot encapsulate all the information about the variable itself. The variables can carry noise or imprecision in their measurement; moreover, the measurement process itself might not be accurate. These imprecise measures can be represented by fuzzy sets, so a regression model should deal with the imprecision involving fuzzy sets. An L-R fuzzy set is characterized by its position, spread and shape, which are defined respectively by a central value, by the lower and upper distances, and by the lower and upper areas.

Let A denote an L-R fuzzy set and A(x) its membership function. According to the properties of fuzzy numbers (see Klir and Yuan [5] and Klir and Folger [4]), the central value $v_c$ of A has membership value 1, that is, $A(v_c) = 1$. In addition, the lower value $v_l$ and the upper value $v_u$ of A are respectively the left and right boundaries of the support of A. Let $d_l$ denote the distance from the lower value to the central value of a fuzzy set, i.e. the lower distance of A; hence $d_l = v_c - v_l$, and the upper distance is defined as $d_u = v_u - v_c$. Let $a_l$ denote the area between the lower and the central value of A, i.e. the lower area of A; thus $a_l$ can be expressed as

$$a_l = \int_{v_l}^{v_c} A(x)\, dx$$

and analogously the upper area $a_u$ can be calculated as

$$a_u = \int_{v_c}^{v_u} A(x)\, dx$$

The central value of a fuzzy set is the element of its support that belongs to the α-cut with α = 1. Figure 1 shows the graphical representation of these values.

2.2 Fuzzy Arithmetic of Interesting Values

The interesting values of a fuzzy number that results from operating on several fuzzy numbers can be calculated from the interesting values of each of the operands. According to Definition 1, Klir and Yuan [5] and Klir and Folger [4], we derive the following operations on fuzzy sets.

Let B and C denote two further L-R fuzzy sets, with membership functions B(x) and C(x) respectively. Let also $v_{cA}$, $v_{cB}$ and $v_{cC}$ indicate the central values of the fuzzy sets A, B and C respectively, and let $n$ indicate any real number. If $C = A + B$ then $v_{cC} = v_{cA} + v_{cB}$. If $C = A - B$ then $v_{cC} = v_{cA} - v_{cB}$. If $C = nA$ then $v_{cC} = n\, v_{cA}$.

Fig. 1. Solution of the example as a function of α
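The five interesting values defined in Section 2.1 can be approximated numerically for any membership function, including nonlinear ones. The sketch below is only an illustration under stated assumptions: the function names `interesting_values` and `example_membership`, the example set (quadratic left side, linear right side) and the use of a composite trapezoidal rule are all hypothetical choices, not part of the paper.

```python
from typing import Callable, Dict


def interesting_values(mu: Callable[[float], float],
                       v_l: float, v_c: float, v_u: float,
                       steps: int = 2000) -> Dict[str, float]:
    """Return v_c, d_l, d_u, a_l and a_u of a fuzzy set with membership mu.

    v_l and v_u are the support boundaries and v_c is the point where the
    membership equals 1; all three are supplied by the caller.
    """

    def area(lo: float, hi: float) -> float:
        # Composite trapezoidal rule for the integral of mu over [lo, hi]
        h = (hi - lo) / steps
        total = 0.5 * (mu(lo) + mu(hi))
        for k in range(1, steps):
            total += mu(lo + k * h)
        return total * h

    return {
        "v_c": v_c,
        "d_l": v_c - v_l,        # lower distance
        "d_u": v_u - v_c,        # upper distance
        "a_l": area(v_l, v_c),   # lower area
        "a_u": area(v_c, v_u),   # upper area
    }


def example_membership(x: float) -> float:
    """Hypothetical nonlinear set with v_l = 0, v_c = 2 and v_u = 5."""
    if 0.0 < x <= 2.0:
        return (x / 2.0) ** 2          # quadratic left side
    if 2.0 < x <= 5.0:
        return (5.0 - x) / 3.0         # linear right side
    return 0.0


if __name__ == "__main__":
    print(interesting_values(example_membership, 0.0, 2.0, 5.0))
```

For this hypothetical set the exact areas are 2/3 (left) and 3/2 (right), so the numerical values can be checked against a closed form before the routine is applied to other shapes.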

Then the central value of a fuzzy set is a linear combination of the central values of the fuzzy sets that compose it. Let $Y$ and $X_i$ denote fuzzy numbers, $v_{cY}$ and $v_{cX_i}$ their respective central values, and let $\beta_i$ indicate the coefficient that multiplies each fuzzy number $X_i$. If $Y = \sum_{i=1}^{n} \beta_i X_i$ then

$$v_{cY} = \sum_{i=1}^{n} \beta_i\, v_{cX_i}$$

On the other hand, let $d_{lA}$, $d_{lB}$ and $d_{lC}$ indicate the lower distances of the fuzzy sets A, B and C respectively, and let $d_{uA}$, $d_{uB}$ and $d_{uC}$ denote their upper distances. If $C = A + B$ then $d_{lC} = d_{lA} + d_{lB}$ and $d_{uC} = d_{uA} + d_{uB}$. If $C = A - B$ then $d_{lC} = d_{lA} + d_{uB}$ and $d_{uC} = d_{uA} + d_{lB}$. If $n \geq 0$ and $C = nA$ then $d_{lC} = n\, d_{lA}$ and $d_{uC} = n\, d_{uA}$. If $n < 0$ and $C = nA$ then $d_{lC} = -n\, d_{uA}$ and $d_{uC} = -n\, d_{lA}$.

Let $\beta_i^+$ and $\beta_i^-$ denote the possible values of $\beta_i$ such that:

$$\beta_i = \begin{cases} \beta_i^+ & \text{if } \beta_i \geq 0 \\ \beta_i^- & \text{if } \beta_i < 0 \end{cases} \qquad (5)$$

Let $d_{lY}$ and $d_{uY}$ denote the lower and upper distances of the fuzzy number $Y$, and let $d_{lX_i}$ and $d_{uX_i}$ denote the lower and upper distances of the fuzzy number $X_i$. Hence

$$d_{lY} = \sum_{i} \beta_i^+ d_{lX_i} - \sum_{i} \beta_i^- d_{uX_i}$$

$$d_{uY} = \sum_{i} \beta_i^+ d_{uX_i} - \sum_{i} \beta_i^- d_{lX_i}$$

Finally, let $a_{lA}$, $a_{lB}$ and $a_{lC}$ be the lower areas of the fuzzy sets A, B and C respectively, and let $a_{uA}$, $a_{uB}$ and $a_{uC}$ denote their upper areas. If $C = A + B$ then $a_{lC} = a_{lA} + a_{lB}$ and $a_{uC} = a_{uA} + a_{uB}$. If $C = A - B$ then $a_{lC} = a_{lA} + a_{uB}$ and $a_{uC} = a_{uA} + a_{lB}$. If $n \geq 0$ and $C = nA$ then $a_{lC} = n\, a_{lA}$ and $a_{uC} = n\, a_{uA}$. If $n < 0$ and $C = nA$ then $a_{lC} = -n\, a_{uA}$ and $a_{uC} = -n\, a_{lA}$.

Let $a_{lY}$ and $a_{uY}$ denote the lower and upper areas of the fuzzy number $Y$, and let $a_{lX_i}$ and $a_{uX_i}$ denote the lower and upper areas of the fuzzy number $X_i$. Thus,

$$a_{lY} = \sum_{i} \beta_i^+ a_{lX_i} - \sum_{i} \beta_i^- a_{uX_i}$$

$$a_{uY} = \sum_{i} \beta_i^+ a_{uX_i} - \sum_{i} \beta_i^- a_{lX_i}$$

2.3 Linear Programming Fuzzy Regression Model

Based on the above results, we need two sets of variables: the first one for slack, $s(\cdot)$, and the other one for surplus, $f(\cdot)$. These variables are added to each constraint for each observation $j$. This allows the $\beta_i$ coefficients to make each equation hold, where $Y_j$ is the dependent variable and $X_{ij}$ are the explanatory variables. Finally, the objective function is the minimization of the sum of the slack and surplus variables. Formally,

$$\min z = \sum_{j=1}^{m} \left( s_{vcj} + f_{vcj} + s_{dlj} + f_{dlj} + s_{duj} + f_{duj} + s_{alj} + f_{alj} + s_{auj} + f_{auj} \right)$$

s.t.

$$v_{cY_j} = \sum_{i} \beta_i\, v_{cX_{ij}} + s_{vcj} - f_{vcj} \quad \forall\, j \in \mathbb{N}_m$$

$$d_{lY_j} = \sum_{i} \beta_i^+ d_{lX_{ij}} - \sum_{i} \beta_i^- d_{uX_{ij}} + s_{dlj} - f_{dlj} \quad \forall\, j \in \mathbb{N}_m$$

$$d_{uY_j} = \sum_{i} \beta_i^+ d_{uX_{ij}} - \sum_{i} \beta_i^- d_{lX_{ij}} + s_{duj} - f_{duj} \quad \forall\, j \in \mathbb{N}_m \qquad (6)$$

$$a_{lY_j} = \sum_{i} \beta_i^+ a_{lX_{ij}} - \sum_{i} \beta_i^- a_{uX_{ij}} + s_{alj} - f_{alj} \quad \forall\, j \in \mathbb{N}_m$$

$$a_{uY_j} = \sum_{i} \beta_i^+ a_{uX_{ij}} - \sum_{i} \beta_i^- a_{lX_{ij}} + s_{auj} - f_{auj} \quad \forall\, j \in \mathbb{N}_m$$

The first constraint refers to the central value. The second and third constraints focus on the estimation of the lower and upper distances, and the fourth and fifth constraints bound the lower and upper area values.

The model presented in (6) is defined for Type-1 L-R fuzzy numbers. Its main goal is to obtain a regression model that fits a set of fuzzy dependent variables $Y_j$ through a set of independent fuzzy variables $X_{ij}$. The model focuses on approximating the complete membership function $Y_j(x)$ of $Y_j$, represented by its parameters and by its area decomposed into a lower and an upper area, through each constraint of the model presented in (6).

3 Validation of the Model - A Comparison Case

To measure its effectiveness, the model is compared to the models proposed by Kao, Tanaka and Bargiela (see [3]). The problem consists of a single-variable regression analysis. The input values $(v_l, v_c, v_u)$ characterize symmetrical triangular fuzzy sets, so fewer constraints are needed since the areas and distances are linear functions of their shapes.

Table 1. Results for comparison case

Proposal              Bargiela      Tanaka        Kao           Present proposal
β0                    3.4467        3.201         3.565         2.6154
β1                    0.536         0.579         0.522         0.6923
Central value error   0.64627579    0.692704031   0.643133875   1.131409359
Distance error        0.094192      0.077542938   0.09996175    0.041422189
Total error (e)       0.860504381   0.877637151   0.862029944   1.082973475
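For the symmetric triangular, single-regressor setting of this comparison, model (6) reduces to one constraint on the central value and one on the distance per observation. The following is a minimal sketch of that reduced LP using SciPy's `linprog`; it is not the authors' implementation. It assumes a crisp intercept and a non-negative slope (so the split into $\beta^+$ and $\beta^-$ is not needed), the function name `fit_symmetric_flr` is hypothetical, and the toy data are invented for illustration rather than taken from the eight observations used in Table 1.

```python
import numpy as np
from scipy.optimize import linprog


def fit_symmetric_flr(vx, dx, vy, dy):
    """Fit vy ~ b0 + b1*vx and dy ~ b1*dx by minimizing slack + surplus.

    Assumptions (not from the paper): one regressor, symmetric triangular
    data described by a central value v and a distance d, a crisp
    intercept b0 and a slope b1 >= 0.
    """
    vx, dx, vy, dy = (np.asarray(a, dtype=float) for a in (vx, dx, vy, dy))
    m = vx.size

    # Variable order: [b0, b1, s_v (m), f_v (m), s_d (m), f_d (m)]
    # Objective: sum of all slack (s) and surplus (f) variables
    c = np.concatenate([np.zeros(2), np.ones(4 * m)])

    eye = np.eye(m)
    zero = np.zeros((m, m))
    # Central value rows: b0 + b1*vx_j + s_vj - f_vj = vy_j
    rows_v = np.hstack([np.ones((m, 1)), vx[:, None], eye, -eye, zero, zero])
    # Distance rows: b1*dx_j + s_dj - f_dj = dy_j
    rows_d = np.hstack([np.zeros((m, 1)), dx[:, None], zero, zero, eye, -eye])
    a_eq = np.vstack([rows_v, rows_d])
    b_eq = np.concatenate([vy, dy])

    # b0 is free, b1 and every slack/surplus variable are non-negative
    bounds = [(None, None), (0, None)] + [(0, None)] * (4 * m)
    res = linprog(c, A_eq=a_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1], res.fun


if __name__ == "__main__":
    # Invented toy data that satisfy vy = 1 + 2*vx and dy = 2*dx exactly
    b0, b1, total = fit_symmetric_flr(
        vx=[1, 2, 3, 4], dx=[0.5, 0.5, 1.0, 1.0],
        vy=[3, 5, 7, 9], dy=[1, 1, 2, 2],
    )
    print(b0, b1, total)
```

Extending the sketch to the full model (6) would mean adding the area rows and replacing the single slope by non-negative variables for $\beta_i^+$ and for the magnitude of $\beta_i^-$, which is a design choice left open here.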

After computing the interesting values of the variables and applying the LP model (6) to eight observations, the obtained β's, the error values and the RMSE-based error

$$e = \sqrt{\frac{1}{8} \sum_{j=1}^{8} \left( v_{cj} - \hat{v}_{cj} \right)^2 + \frac{1}{8} \sum_{j=1}^{8} \left( d_j - \hat{d}_j \right)^2}$$

are shown in Table 1, where $\hat{v}_{cj}$ and $\hat{d}_j$ are estimations of $v_{cj}$ and $d_j$ respectively. With this definition, the total error in Table 1 equals the square root of the sum of the central value error and the distance error. Although the error of $\hat{v}_{cj}$ obtained by the LP model is the highest, the error of $\hat{d}_j$ is the lowest, which leads to smaller area errors. Moreover, its efficiency is improved since the structure of an LP model is even simpler than Tanaka's proposal (see [8]).

3.1 Shipping Company Case Study

In this case, a shipping company wants to identify the role of several factors in its profit. The factors considered are: price of service, shipping time, package weight and the return time of the service vehicle.

The linguistic label for each $X_j$ is its expected value. Each $X_j$ is defined by the average of the observations, and each constraint $j$ uses the $(i, j)$ observation as $v_{cj}$, so the membership function for each $X_j$ is defined as follows.

$$\text{Price: } X_1(x) = \begin{cases} 1 - \left( \dfrac{1.05 - x}{0.28} \right)^2 & \text{for } 0.77 < x \leq 1.05 \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

$$\text{Shipping time: } X_2(x) = \begin{cases} \left( \dfrac{2.331 - x}{0.669} \right)^2 & \text{for } 2.331 < x \leq 3 \\ 1 - \left( \dfrac{3 - x}{0.669} \right)^4 & \text{for } 3 < x \leq 3.669 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

$$\text{Weight: } X_3(x) = \begin{cases} \left( \dfrac{x - 15.07}{9.42} \right)^2 & \text{for } 5.65 < x \leq 15.07 \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

$$\text{Return time: } X_4(x) = \begin{cases} \left( \dfrac{x - 6.606}{4.386} \right)^4 & \text{for } 2.22 < x \leq 6.606 \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

$$\text{Profit: } Y(x) = \begin{cases} 1 - \left( \dfrac{7.38 - x}{2.265} \right)^2 & \text{for } 5.115 < x \leq 7.38 \\ \left( \dfrac{x - 11.793}{4.413} \right)^2 & \text{for } 7.38 < x \leq 11.793 \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

The model was applied to 49 observations and 7 $X_j$, with the following results:

$$Y = 0.6216\, X_1 + 7.9743\, X_2 + 0.4685\, X_3 - 0.0073\, X_4 - 5.1762 \qquad (12)$$

Figure 2 shows the comparison between the estimated dependent variables (gray area) and the expected dependent variables (black line), using the averages of $Y_j$ and $\hat{Y}_j$ as central values.
At first glance, Figure 2 shows that the LP model reaches a good approximation of the original $Y_j(x)$. For decision making, however, the selected defuzzification method is the centroid, since it can be obtained by a linear combination of the position, spread and area of the fuzzy sets $X_i$, viewed as a fuzzy relational equation.

Fig. 2. Graphical comparison of the results of the shipping company case study

Analysis of the Results. The β's obtained by the regression are used to compute the estimated centroids, which yield the following error measures: RMSE = 2.29 and MSE = 5.19, computed through $\xi_j$ (see Equations (1) and (2)). The determination coefficient obtained is $R^2 = 72.46$, so 72.476% of the behavior of the dependent variable is explained by the independent variables obtained by the application of the model. In addition, some desirable properties of the residuals are tested, as shown below.

Absence of Autocorrelation in the Residuals. Based on a 5% significance level (95% confidence), the results of the autocorrelation analysis are shown in Table 2. Based on these results, there is no autocorrelation effect, and therefore the residuals are randomly distributed.

Table 2. Ljung-Box autocorrelation test on the residuals

Lag                   1        2        3        4
Autocorrelation       -0.138   -0.295   -0.12    0.094
Ljung-Box statistic   0.99     5.624    6.409    6.903
Significance          0.32     0.06     0.093    0.141

Normal Distribution in the Residuals. For a 95% confidence level, the Kolmogorov-Smirnov test reaches a p-value of 0.150 and the Shapiro-Wilk test reaches a p-value of 0.121, so the hypothesis that the residuals are normally distributed is not rejected.

Zero Mean Residuals. A one-sample test was performed to test whether $\bar{\xi} = \mu = 0$. A difference test based on the asymptotically normal behavior of the mean of the residuals, $\bar{\xi} = 0.320$, yields a significance of 0.333. We conclude that there is no statistical evidence that the mean of ξ differs from zero.

Homoscedasticity of the Residuals. By dividing the residuals into three balanced groups and applying the F-test for variances between each pair, with a 95% confidence level, it is concluded that the residuals are homoscedastic (see Table 3).
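The Ljung-Box statistics of Table 2 can be cross-checked from the reported autocorrelations alone, using the standard formula $Q_h = n(n+2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$. The sketch below assumes that the number of residuals is $n = 49$ (the number of observations stated for the case study) and uses the rounded autocorrelations from Table 2, so it only reproduces the reported statistics approximately. The function name `ljung_box_q` is a hypothetical helper, not a library call.

```python
from typing import List, Sequence


def ljung_box_q(autocorrelations: Sequence[float], n_obs: int) -> List[float]:
    """Return the cumulative Ljung-Box Q statistic for lags 1..h.

    autocorrelations[k-1] is the sample autocorrelation at lag k and
    n_obs is the number of residuals.
    """
    q_values = []
    running = 0.0
    for lag, rho in enumerate(autocorrelations, start=1):
        running += rho ** 2 / (n_obs - lag)
        q_values.append(n_obs * (n_obs + 2) * running)
    return q_values


if __name__ == "__main__":
    # Rounded autocorrelations from Table 2; n = 49 is an assumption
    table2_rho = [-0.138, -0.295, -0.12, 0.094]
    for lag, q in enumerate(ljung_box_q(table2_rho, 49), start=1):
        print(lag, round(q, 3))
```

Each $Q_h$ is compared against a chi-square distribution with $h$ degrees of freedom to obtain the significance row of the table.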

Table 3. Homoscedasticity test

Group          1-2      1-3      2-3
F sample       1.034    1.845    1.783
F statistic    2.403    2.352    2.352
Significance   0.473    0.117    0.131

Main Conclusions. The LP model achieved good results since its residuals are normal, independent, zero-mean and homoscedastic. Thus, the β's and the regression analysis are valid. The β's show that $X_1$ and $X_3$ (see (7) and (9)) have the highest contribution to the profit, so it is recommended to review the pricing policy of the company.

4 Concluding Remarks

The LP model presented in this paper focuses on the minimization of the errors between Y and the linear combination of the $X_j$ that estimates it. The method can deal with nonlinear fuzzy data, so our proposal is an alternative formulation for fuzzy regression.

Some real problems involve uncertainty that can be treated as fuzzy sets; the LP model presented in this paper is more efficient than other proposals because it can be handled through mixed fuzzy-convex optimization methods.

Finally, we recommend the use of Type-2 fuzzy sets as uncertainty measures to deal with the perception about a linguistic variable of a fuzzy set held by multiple experts. For further information see Melgarejo [6] and Mendel [7].

Acknowledgments. The authors would like to thank Jesica Rodriguez-Lopez for her invaluable support.

References

1. Bargiela, A., et al.: Multiple regression with fuzzy data. Fuzzy Sets and Systems 158(4), 2169-2188 (2007)
2. Gladysz, B., Kuchta, D.: Least squares method for L-R fuzzy variable. In: 8th International Workshop on Fuzzy Logic and Applications, vol. 8, pp. 36-43. IEEE, Los Alamitos (2009)
3. Kao, C., Chyu, C.: Least-Squares estimates in fuzzy regression analysis. European Journal of Operational Research 148(2), 426-435 (2003)
4. Klir, G.J., Folger, T.A.: Fuzzy Sets, Uncertainty and Information. Prentice Hall, Englewood Cliffs (1992)
5. Klir, G.J., Yuan, B.: Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Englewood Cliffs (1995)
6. Melgarejo, M.A.: Implementing Interval Type-2 Fuzzy processors. IEEE Computational Intelligence Magazine 2(1), 63-71 (2007)
7. Mendel, J.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice Hall, Englewood Cliffs (1994)
8. Tanaka, H., et al.: Linear Regression analysis with Fuzzy Model. IEEE Transactions on Systems, Man and Cybernetics 12(4), 903-907 (1982)
9. Zadeh, L.A.: Toward a generalized theory of uncertainty (GTU): an outline. Information Sciences 172(1), 1-40 (2005)