Learning from Examples
Adriano Cruz, adriano@nce.ufrj.br
PPGI-UFRJ
September 20

Summary
1. Introduction
2. Learning from Examples Algorithm
3. Modified Learning from Examples Algorithm
A precise model is a contradiction.
Bibliography
Kevin M. Passino, Stephen Yurkovich. Fuzzy Control, Chapter 5. Addison Wesley Longman, Inc., USA, 1998.
Timothy J. Ross. Fuzzy Logic with Engineering Applications. John Wiley and Sons, Inc., USA, 200.
L. Wang, J. Mendel. Generating Fuzzy Rules by Learning from Examples. IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 6, November 1992.

Constructing Fuzzy Systems from Examples
How can a fuzzy system be constructed from numeric data?
Using data obtained experimentally from a system, it is possible to identify a model of that system.
The aim is to find a model that fits the data by using the interpolation capabilities of fuzzy systems.
Current Situation
Solutions are heuristic in nature.
They combine standard control processing methods and expert systems.
Weak points: problem dependent; no common framework.

Problems
Linguistic rules may be incomplete, because information is lost when humans express their knowledge.
Input-output data pairs may be insufficient, because past experience may not cover all situations.
Generating Rules
Consider a problem with two inputs and one output as an example.
The data are available as input-output pairs:
\[ (x_1^{(1)}, x_2^{(1)}; y^{(1)}),\ (x_1^{(2)}, x_2^{(2)}; y^{(2)}),\ \ldots,\ (x_1^{(i)}, x_2^{(i)}; y^{(i)}),\ \ldots \]
The task is to determine a mapping $F: (x_1, x_2) \to y$.
What does LFE generate?
The Learning from Examples (LFE) algorithm generates rules.
Membership functions are predefined.
The original article uses a modified Mamdani system, but this is not necessary.

Step 1 - Divide i/o space
Divide the input and output spaces into $2N + 1$ fuzzy regions.
$N$ can be different for different variables.
Assign a membership function to each region.
A simple solution is to use triangular functions: one vertex lies at the center of the region and the other two at the centers of the neighboring regions.
Other divisions and functions are possible.
This is similar to creating fuzzy sets over a universe of discourse.
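As a rough illustration of Step 1, the sketch below divides an interval into $2N + 1$ evenly spaced triangular regions. The function names, the even spacing and the labeling scheme (S for small, CE for center, B for big) are assumptions made for this example; the slides only require that each region has a membership function.

```python
def region_labels(n):
    """Labels for 2n+1 regions: Sn..S1, CE, B1..Bn (assumed naming)."""
    small = [f"S{k}" for k in range(n, 0, -1)]
    big = [f"B{k}" for k in range(1, n + 1)]
    return small + ["CE"] + big


def triangular_partition(lo, hi, n):
    """Return (left, center, right) vertices of 2n+1 evenly spaced triangles.

    Each triangle peaks at its region center and reaches zero at the
    centers of the two neighboring regions.
    """
    count = 2 * n + 1
    step = (hi - lo) / (count - 1)
    centers = [lo + k * step for k in range(count)]
    return [(c - step, c, c + step) for c in centers]


def triangle(x, left, center, right):
    """Membership degree of x in one triangular function."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


# Example: N = 2 gives five regions over the hypothetical interval [0, 4].
regions = triangular_partition(0.0, 4.0, 2)
degrees = {
    label: triangle(2.5, *vertices)
    for label, vertices in zip(region_labels(2), regions)
}
```

With this spacing, any point between two centers has non-zero degree in exactly two neighboring regions.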
[Figure, Step 1: membership functions over $X_1$ (regions S2, S1, CE, B1, B2) and over $X_2$ (regions S3, S2, S1, CE, B1, B2, B3), with the samples $x_1^{(1)}$ and $x_1^{(2)}$ (degrees 0.8 and 0.6) and $x_2^{(1)}$ and $x_2^{(2)}$ (degrees 0.7 and 1.0) marked.]

[Figure, Step 1: membership functions over $Y$ (regions S2, S1, CE, B1, B2), with the samples $y^{(1)}$ and $y^{(2)}$ (degrees 0.9 and 0.7) marked.]
Step 2 - Generate fuzzy rules
Generate fuzzy rules from the given data pairs.
First, determine the degrees of each given $x_1^{(i)}$, $x_2^{(i)}$ and $y^{(i)}$ in the different sets.
Second, assign each $x_1^{(i)}$, $x_2^{(i)}$ and $y^{(i)}$ to the set in which it has maximum degree.
Finally, obtain one rule from each input-output data pair.

Step 2
First data pair: $(x_1^{(1)}, x_2^{(1)}; y^{(1)})$
Maximum degrees: $x_1^{(1)}$ has degree 0.8 in B1, $x_2^{(1)}$ has degree 0.7 in S1, $y^{(1)}$ has degree 0.9 in CE.
Rule 1: If $x_1$ is B1 and $x_2$ is S1 then $y$ is CE.

Second data pair: $(x_1^{(2)}, x_2^{(2)}; y^{(2)})$
Maximum degrees: $x_1^{(2)}$ has degree 0.6 in B1, $x_2^{(2)}$ has degree 1.0 in CE, $y^{(2)}$ has degree 0.7 in B1.
Rule 2: If $x_1$ is B1 and $x_2$ is CE then $y$ is B1.
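Step 2 can be sketched as picking, for every variable, the region with the largest membership degree. In the example below the maximum degrees (0.8, 0.7, 0.9) come from the slides; the secondary degrees (0.2, 0.3, 0.1) and the function names are hypothetical and only serve to make the selection visible.

```python
def best_region(degrees):
    """Return (label, degree) of the region with maximum membership."""
    label = max(degrees, key=degrees.get)
    return label, degrees[label]


def rule_from_pair(x1_degrees, x2_degrees, y_degrees):
    """Build one rule from the membership degrees of a single data pair."""
    a1, d1 = best_region(x1_degrees)
    a2, d2 = best_region(x2_degrees)
    out, dy = best_region(y_degrees)
    return {"if": (a1, a2), "then": out, "degrees": (d1, d2, dy)}


# First data pair; the smaller degrees are assumed values, not from the slides.
rule_1 = rule_from_pair(
    {"B1": 0.8, "B2": 0.2},
    {"S1": 0.7, "S2": 0.3},
    {"CE": 0.9, "B1": 0.1},
)
```

Here `rule_1` encodes "If x1 is B1 and x2 is S1 then y is CE", matching Rule 1.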
Step 3 - Assign degrees
Assign a degree to each rule based on the membership degrees.
Each degree is calculated as $D(\text{rule}^{(i)}) = \mu(x_1^{(i)})\,\mu(x_2^{(i)})\,\mu(y^{(i)})$.
Rule 1, with $x_1^{(1)}$ at 0.8 in B1, $x_2^{(1)}$ at 0.7 in S1 and $y^{(1)}$ at 0.9 in CE:
$D(\text{rule}^{(1)}) = 0.8 \times 0.7 \times 0.9 = 0.504$.
Rule 2, with $x_1^{(2)}$ at 0.6 in B1, $x_2^{(2)}$ at 1.0 in CE and $y^{(2)}$ at 0.7 in B1:
$D(\text{rule}^{(2)}) = 0.6 \times 1.0 \times 0.7 = 0.42$.

Step 3 - Human information
In practice we may have prior information about the data.
Some pairs may have been judged by an expert to be important, and others to be just measurement errors.
It is possible to assign to each data pair a degree that represents its usefulness.
The data pair $(x_1^{(1)}, x_2^{(1)}; y^{(1)})$ then has degree
\[ D(\text{rule}^{(1)}) = \mu(x_1^{(1)})\,\mu(x_2^{(1)})\,\mu(y^{(1)})\,m^{(1)}, \]
where $m^{(1)}$ is the degree given to the data pair by a human expert.
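The degree computation is a plain product, which the following sketch reproduces for the two rules above. The function name and the default expert weight of 1 are assumptions; the expert value 0.5 in the last line is purely illustrative.

```python
def rule_degree(mu_x1, mu_x2, mu_y, m=1.0):
    """Degree of a rule: product of the membership degrees and the
    optional expert-assigned degree m of the data pair."""
    return mu_x1 * mu_x2 * mu_y * m


d_rule_1 = rule_degree(0.8, 0.7, 0.9)   # about 0.504
d_rule_2 = rule_degree(0.6, 1.0, 0.7)   # about 0.42

# A pair the expert trusts only halfway (hypothetical m = 0.5).
d_rule_1_weighted = rule_degree(0.8, 0.7, 0.9, m=0.5)
```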
Step 4 - Combine fuzzy rules
It is highly probable that there will be conflicting rules, that is, rules that have the same antecedents but different consequents.
If there is more than one rule in one box of the fuzzy rule base, use the rule that has the highest degree.
In the end, there is at most one rule in each box.

Step 5 - Determine the Mapping
Determine the mapping based on the combined fuzzy rule base.
The paper prescribes the use of the product for set intersection, so the degree of truth of rule $i$ is
\[ \mu^i_{O^i} = \mu_{I^i_1}(x_1)\,\mu_{I^i_2}(x_2), \]
where $O^i$ denotes the output region of rule $i$ and $I^i_j$ denotes the input region of rule $i$ for the $j$th component.
Rule 1 gives $\mu^1_{CE} = \mu_{B1}(x_1)\,\mu_{S1}(x_2)$.
The paper also suggests the use of a different defuzzification method:
\[ y = \frac{\sum_{i=1}^{K} \mu^i_{O^i}\,\bar{y}^i}{\sum_{i=1}^{K} \mu^i_{O^i}}, \]
where $\bar{y}^i$ represents the center value of the region $O^i$, defined as the value with the smallest absolute value among all values with membership equal to 1, and $K$ is the number of rules in the combined rule base.
It is not necessary to adopt these last prescriptions. A conventional Mamdani fuzzy system is also an option.
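Steps 4 and 5 can be sketched as follows: keep the highest-degree rule per antecedent, then compute the output as a weighted average of the region centers. The data layout, the function names, the third (conflicting) rule and the numeric centers in the usage lines are assumptions for illustration only.

```python
def combine_rules(rules):
    """Keep, for each antecedent, only the rule with the highest degree.

    rules: iterable of (antecedent, consequent, degree) tuples.
    Returns a dict mapping antecedent -> (consequent, degree).
    """
    table = {}
    for antecedent, consequent, degree in rules:
        if antecedent not in table or degree > table[antecedent][1]:
            table[antecedent] = (consequent, degree)
    return table


def lfe_output(firing_degrees, region_centers):
    """Weighted average of the output region centers (Step 5)."""
    total = sum(firing_degrees)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    weighted = sum(m * c for m, c in zip(firing_degrees, region_centers))
    return weighted / total


rules = [
    (("B1", "S1"), "CE", 0.504),   # Rule 1 from the slides
    (("B1", "CE"), "B1", 0.42),    # Rule 2 from the slides
    (("B1", "S1"), "B1", 0.30),    # hypothetical conflicting rule
]
rule_base = combine_rules(rules)

# Two rules firing equally, with assumed output centers 0.0 and 2.0.
y = lfe_output([0.5, 0.5], [0.0, 2.0])
```

The hypothetical third rule shares its antecedent with Rule 1 but has a lower degree, so it is discarded.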
Differences
The Modified Learning from Examples (MLFE) algorithm calculates both membership functions and rules.
We will use delta functions for the outputs and Gaussian functions for the inputs.
$b_i$ is the output for the $i$th rule.
$c^i_j$ is the point in the $j$th input where the membership function for the $i$th rule achieves its maximum.
$\sigma^i_j$ is the spread of the membership function for the $j$th input and $i$th rule.
$\theta$ is a vector composed of these parameters.
Gaussian Membership functions
Examples will use Gaussian membership functions:
\[ \mu(x_i) = \exp\left[ -\frac{1}{2} \left( \frac{x_i - c_i}{\sigma_i} \right)^2 \right] \]
[Figure: Gaussian membership function $\mu(x_i)$ plotted against $x_i$, with the center $c_i$ and the spread $\sigma_i$ marked.]

Other definitions
Typical rule: if premise 1 and premise 2 then consequence.
Center-average defuzzification.
Product t-norm for premises and implication.
\[ f(x \mid \theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[ -\frac{1}{2} \left( \frac{x_j - c^i_j}{\sigma^i_j} \right)^2 \right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[ -\frac{1}{2} \left( \frac{x_j - c^i_j}{\sigma^i_j} \right)^2 \right]} \]
$i$ refers to the rule number and $j$ to the membership function (one per input).
$R$ is the number of rules and $n$ is the number of inputs.
$b_i$, $i = 1, 2, \ldots, R$, are the centres of the output membership functions.
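The two formulas above translate almost directly into code. The sketch below is one possible reading, with parameter names (`b`, `c`, `sigma`) chosen to mirror the symbols on the slides; the list-of-lists layout and the one-input example system are assumptions.

```python
import math


def gaussian(x, c, sigma):
    """Gaussian membership function with center c and spread sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


def fuzzy_output(x, b, c, sigma):
    """Center-average output f(x | theta) with product premises.

    x: list of n inputs; b: list of R output centres;
    c[i][j], sigma[i][j]: center and spread for rule i, input j.
    """
    numerator = 0.0
    denominator = 0.0
    for b_i, c_i, sigma_i in zip(b, c, sigma):
        strength = math.prod(
            gaussian(x_j, c_ij, s_ij)
            for x_j, c_ij, s_ij in zip(x, c_i, sigma_i)
        )
        numerator += b_i * strength
        denominator += strength
    return numerator / denominator


# Hypothetical one-input system with two rules centered at 0 and 1.
midpoint_output = fuzzy_output([0.5], [1.0, 3.0], [[0.0], [1.0]], [[1.0], [1.0]])
```

At the midpoint between the two centers both rules fire equally, so the output is the average of the two output centres.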
Example
Let us use the following data set as an example:

Training Data Set
x_1   x_2   y
0     2     1
2     4     5
3     6     6

Setting up
Let us specify an initial system.
We will use a number of rules $R = 1$.
For $c^1_1$, $c^1_2$ and $b_1$ we use the first training data tuple, $x_1 = 0$, $x_2 = 2$ and $y = 1$, respectively.
The spreads $\sigma$ are assumed to be equal to 1.
Therefore: $c^1_1 = 0$, $c^1_2 = 2$, $b_1 = 1$, $\sigma^1_1 = \sigma^1_2 = 1$.
The first rule is established with antecedent functions centred at $c^1_1 = 0$ and $c^1_2 = 2$, and consequent equal to $b_1 = 1$.
[Figure, First Rule: premise membership functions $\mu(x_1)$ and $\mu(x_2)$ and the output membership function $\mu(b)$ after the first rule.]

Setting up 2
The approximation tolerance will be $\epsilon_f = 0.25$.
$\epsilon_f$ is the accuracy with which the fuzzy system $f$ will approximate the function $g$.
We will also define a weighting factor $W$, which is used to calculate the spreads of the membership functions, that is, the amount of overlap between them.
We will start with $\sigma = 1$ and $W = 2$.
Using the second data-tuple
$(x^2, y^2)$: $x^2_1 = 2$, $x^2_2 = 4$, $y^2 = 5$.
Compare $y^2$ with the value given by the existing system, which is composed of one rule:
\[ f(x^2 \mid \theta) = \frac{1 \cdot \exp\left[ -\frac{1}{2} \left( \frac{2 - 0}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{4 - 2}{1} \right)^2 \right]}{\exp\left[ -\frac{1}{2} \left( \frac{2 - 0}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{4 - 2}{1} \right)^2 \right]} = 1. \]
Compare this value with the real value:
$|f(x^2 \mid \theta) - y^2| = |1 - 5| = 4$.
This error is greater than the tolerance $\epsilon_f = 0.25$; therefore, we add a new rule based on $(x^2, y^2)$.

Observation
Modify the widths $\sigma^i_j$ for rule $i = R$ ($i = 2$ for this example) to adjust the spacing between membership functions so that:
1. The new rule does not distort what has already been learned.
2. There is smooth interpolation between training points.
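The check against the tolerance can be reproduced numerically. The sketch below evaluates the one-rule system at the second data-tuple; the helper name `rule_strength` and the variable names are assumptions, while the numbers are those of the example.

```python
import math


def rule_strength(x, centers, spreads):
    """Product of the Gaussian memberships of x in one rule's premises."""
    return math.prod(
        math.exp(-0.5 * ((x_j - c_j) / s_j) ** 2)
        for x_j, c_j, s_j in zip(x, centers, spreads)
    )


# One-rule system: centers (0, 2), spreads (1, 1), output centre 1.
b = [1.0]
c = [(0.0, 2.0)]
sigma = [(1.0, 1.0)]

x_2, y_2 = (2.0, 4.0), 5.0
eps_f = 0.25

strengths = [rule_strength(x_2, c_i, s_i) for c_i, s_i in zip(c, sigma)]
f_x2 = sum(b_i * w for b_i, w in zip(b, strengths)) / sum(strengths)
error = abs(f_x2 - y_2)
needs_new_rule = error > eps_f
```

With a single rule the firing strength cancels in the ratio, so the system returns $b_1 = 1$ for any input, which is why the error here is 4.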
The Spreads
The next step is to calculate the spreads of the new membership functions.
The MLFE does that through a series of steps.
Calculate the distance between the new center and the existing ones for the same input.
Find the closest center and use this to establish the spread of the new function.

The Spreads I
Find the closest center and use this to establish the spread of the new function.
The distances to the closest centers are:
$|c^2_1 - c^1_1| = |2 - 0| = 2$ and $|c^2_2 - c^1_2| = |4 - 2| = 2$.
Then
\[ \sigma^i_j = \frac{1}{W} \left| c^i_j - c^{\min}_j \right|, \]
where $c^{\min}_j$ is the closest existing center for input $j$.
Therefore
\[ \sigma^2_1 = \frac{1}{W} \left| c^2_1 - c^1_1 \right| = \frac{1}{2} |2 - 0| = 1, \qquad \sigma^2_2 = \frac{1}{W} \left| c^2_2 - c^1_2 \right| = \frac{1}{2} |4 - 2| = 1. \]
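The spread rule can be sketched as a nearest-center search per input. The function name and argument layout are assumptions; note that a new center coinciding with an existing one would yield a zero spread, a case this sketch does not handle.

```python
def new_spreads(new_center, existing_centers, W):
    """Spread per input: distance to the closest existing center, divided by W.

    new_center: list with one center per input for the new rule.
    existing_centers: list of such lists, one per existing rule.
    """
    spreads = []
    for j, c_new in enumerate(new_center):
        nearest = min(abs(c_new - c_old[j]) for c_old in existing_centers)
        spreads.append(nearest / W)
    return spreads


# Second rule of the example: new centers (2, 4), existing centers (0, 2).
sigma_rule_2 = new_spreads([2.0, 4.0], [[0.0, 2.0]], W=2.0)
```

A larger `W` gives narrower membership functions and therefore less overlap between neighboring rules.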
So far?
We now have a rule base consisting of two rules.
\[ \theta = [b_1, b_2, c^1_1, c^1_2, c^2_1, c^2_2, \sigma^1_1, \sigma^1_2, \sigma^2_1, \sigma^2_2]^T \]
Where:

Rule 1                Rule 2
$b_1 = 1$             $b_2 = 5$
$c^1_1 = 0$           $c^2_1 = 2$
$c^1_2 = 2$           $c^2_2 = 4$
$\sigma^1_1 = 1$      $\sigma^2_1 = 1$
$\sigma^1_2 = 1$      $\sigma^2_2 = 1$

[Figure, Second Rule: premise membership functions $\mu(x_1)$ and $\mu(x_2)$ and the output membership functions $\mu(b)$ after the second rule.]
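Using the third data-tuple
\[ f(x^3 \mid \theta) = \frac{1 \cdot \exp\left[ -\frac{1}{2} \left( \frac{3 - 0}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{6 - 2}{1} \right)^2 \right] + 5 \cdot \exp\left[ -\frac{1}{2} \left( \frac{3 - 2}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{6 - 4}{1} \right)^2 \right]}{\exp\left[ -\frac{1}{2} \left( \frac{3 - 0}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{6 - 2}{1} \right)^2 \right] + \exp\left[ -\frac{1}{2} \left( \frac{3 - 2}{1} \right)^2 \right] \exp\left[ -\frac{1}{2} \left( \frac{6 - 4}{1} \right)^2 \right]} \]
\[ f(x^3 \mid \theta) \approx 5.0 \]

How accurate?
Compare this value with the real value:
$|f(x^3 \mid \theta) - y^3| = |5 - 6| = 1$.
This error is greater than the tolerance $\epsilon_f = 0.25$; therefore, we add a new rule based on $(x^3, y^3)$:
$b_3 = 6$, $c^3_1 = 3$, $c^3_2 = 6$.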
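Looking at the individual firing strengths helps explain why the two-rule system outputs almost exactly 5 at the third data-tuple. The sketch below uses the parameters of the example; the helper and variable names are assumptions.

```python
import math


def rule_strength(x, centers, spreads):
    """Product of the Gaussian memberships of x in one rule's premises."""
    return math.prod(
        math.exp(-0.5 * ((x_j - c_j) / s_j) ** 2)
        for x_j, c_j, s_j in zip(x, centers, spreads)
    )


# Two-rule system built so far.
b = [1.0, 5.0]
c = [(0.0, 2.0), (2.0, 4.0)]
sigma = [(1.0, 1.0), (1.0, 1.0)]

x_3, y_3 = (3.0, 6.0), 6.0
eps_f = 0.25

# Rule 1 fires with exp(-12.5), rule 2 with exp(-2.5).
strengths = [rule_strength(x_3, c_i, s_i) for c_i, s_i in zip(c, sigma)]
f_x3 = sum(b_i * w for b_i, w in zip(b, strengths)) / sum(strengths)
error = abs(f_x3 - y_3)
needs_new_rule = error > eps_f
```

Rule 1 is far from the new point, so its strength is several orders of magnitude smaller than that of rule 2, and the output is dominated by $b_2 = 5$.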
Spreads for the third rule
Form the vector of distances.
For rule 3 and input 1 ($i = 3$, $j = 1$) the vector is $\{ |c^3_1 - c^1_1|, |c^3_1 - c^2_1| \} = \{3, 1\}$.
For rule 3 and input 2 ($i = 3$, $j = 2$) the vector is $\{ |c^3_2 - c^1_2|, |c^3_2 - c^2_2| \} = \{4, 2\}$.
Using the smallest distance for each input:
\[ \sigma^3_1 = \frac{1}{W} \left| c^3_1 - c^2_1 \right| = \frac{1}{2} |3 - 2| = \frac{1}{2}, \qquad \sigma^3_2 = \frac{1}{W} \left| c^3_2 - c^2_2 \right| = \frac{1}{2} |6 - 4| = 1. \]

Resulting rule base
We now have a rule base consisting of three rules.
\[ \theta = [b_1, b_2, b_3, c^1_1, c^1_2, c^2_1, c^2_2, c^3_1, c^3_2, \sigma^1_1, \sigma^1_2, \sigma^2_1, \sigma^2_2, \sigma^3_1, \sigma^3_2]^T \]
Where:

Rule 1                Rule 2                Rule 3
$b_1 = 1$             $b_2 = 5$             $b_3 = 6$
$c^1_1 = 0$           $c^2_1 = 2$           $c^3_1 = 3$
$c^1_2 = 2$           $c^2_2 = 4$           $c^3_2 = 6$
$\sigma^1_1 = 1$      $\sigma^2_1 = 1$      $\sigma^3_1 = 1/2$
$\sigma^1_2 = 1$      $\sigma^2_2 = 1$      $\sigma^3_2 = 1$
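Putting the pieces together, the whole worked example can be reproduced with a short training loop. This is a minimal sketch of the procedure as described on these slides, not a complete MLFE implementation: the function names and data layout are assumptions, and special cases such as coincident centers or a vanishing denominator are not handled.

```python
import math


def fuzzy_output(x, b, c, sigma):
    """Center-average output with Gaussian premises and product t-norm."""
    numerator = 0.0
    denominator = 0.0
    for b_i, c_i, sigma_i in zip(b, c, sigma):
        strength = math.prod(
            math.exp(-0.5 * ((x_j - c_ij) / s_ij) ** 2)
            for x_j, c_ij, s_ij in zip(x, c_i, sigma_i)
        )
        numerator += b_i * strength
        denominator += strength
    return numerator / denominator


def mlfe_train(data, eps_f, W, sigma0):
    """Build a rule base from (x, y) tuples, adding a rule whenever the
    current system misses a tuple by more than eps_f."""
    (x_first, y_first), remaining = data[0], data[1:]
    b = [y_first]
    c = [list(x_first)]
    sigma = [[sigma0] * len(x_first)]
    for x, y in remaining:
        if abs(fuzzy_output(x, b, c, sigma) - y) <= eps_f:
            continue  # already approximated well enough
        # Spread per input: distance to the nearest existing center over W.
        spreads = [
            min(abs(x_j - c_i[j]) for c_i in c) / W
            for j, x_j in enumerate(x)
        ]
        b.append(y)
        c.append(list(x))
        sigma.append(spreads)
    return b, c, sigma


training_data = [((0.0, 2.0), 1.0), ((2.0, 4.0), 5.0), ((3.0, 6.0), 6.0)]
b, c, sigma = mlfe_train(training_data, eps_f=0.25, W=2.0, sigma0=1.0)
```

Run on the three training tuples, the loop ends with three rules whose parameters can be compared against the resulting rule base above.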
[Figure, Third Rule: premise membership functions $\mu(x_1)$ and $\mu(x_2)$ and the output membership functions $\mu(b)$ after the third rule.]