ADAPTIVE NEURO-FUZZY INFERENCE SYSTEMS

RBFN and TS systems
An RBFN and a Takagi-Sugeno (TS) fuzzy system are functionally equivalent if the following hold:
1. Both the RBFN and the TS system use the same aggregation method for the output (weighted sum or weighted average).
2. The number of basis functions in the RBFN equals the number of rules in the TS system.
3. The TS system uses Gaussian membership functions with the same parameters as the RBFN basis functions, and rule firing strength is determined by multiplication (product t-norm).
4. The RBFN response functions (ci) and the TS rule consequents are equal.
ANFIS
Adaptive Neuro-Fuzzy Inference Systems (ANFIS): a Takagi-Sugeno fuzzy system mapped onto a neural network structure. Different representations are possible, but the one with 5 layers is the most common. Network nodes in different layers have different structures.

ANFIS
Consider a first-order Sugeno fuzzy model with two inputs, x and y, and one output, z.
Rule set:
Rule 1: If x is A1 and y is B1, then f1 = p1 x + q1 y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2 x + q2 y + r2
[Figure: membership functions A1, A2 on X and B1, B2 on Y; inputs x and y produce firing strengths w1, w2 and rule outputs f1, f2]
Weighted fuzzy mean:
f = (w1 f1 + w2 f2) / (w1 + w2)
ANFIS architecture
Corresponding equivalent ANFIS architecture:
[Figure: 5-layer ANFIS network for the two-rule model]

ANFIS layers
Layer 1: every node is an adaptive node with node function
O1,i = μAi(x)
Parameters in this layer are called premise parameters.
Layer 2: every node is a fixed node whose output (representing the firing strength of a rule) is the product of its inputs:
O2,i = wi = μAi(x) μBi(y)
Layer 3: every node is a fixed node that normalizes the firing strengths:
O3,i = w̄i = wi / (w1 + w2)
ANFIS layers
Layer 4: every node is an adaptive node (consequent parameters):
O4,i = O3,i fi = w̄i (pi0 + pi1 x1 + … + pin xn)
Layer 5: a single fixed node that sums up its inputs:
O5 = Σi w̄i fi = (Σi wi fi) / (Σi wi)
The adaptive network is functionally equivalent to a Sugeno fuzzy model!

ANFIS with multiple rules
[Figure: ANFIS architecture with more than two rules]
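The five layers described above can be sketched numerically. This is a minimal illustration of the forward pass for the two-rule, two-input model; all membership-function centers, widths and consequent parameters below are arbitrary values chosen for the example, not parameters from the text.

```python
import numpy as np

def gauss(x, c, sigma):
    """Layer 1: Gaussian membership function (illustrative choice)."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, y):
    # Layer 1: premise membership degrees (premise parameters c, sigma)
    mu_A = [gauss(x, -1.0, 1.0), gauss(x, 1.0, 1.0)]
    mu_B = [gauss(y, -1.0, 1.0), gauss(y, 1.0, 1.0)]
    # Layer 2: firing strengths by product
    w = np.array([mu_A[0] * mu_B[0], mu_A[1] * mu_B[1]])
    # Layer 3: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 4: weighted first-order consequents f_i = p_i x + q_i y + r_i
    p, q, r = np.array([0.5, -0.5]), np.array([1.0, 2.0]), np.array([0.0, 1.0])
    f = p * x + q * y + r
    # Layer 5: overall output = sum of normalized weighted consequents
    return float(np.sum(w_bar * f))
```

At (x, y) = (0, 0) both rules fire equally, so the output is the plain average of the two consequent constants.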
Hybrid learning for ANFIS
Consider the two-rule ANFIS with two inputs x and y and one output z, and let the premise parameters be fixed. The ANFIS output is then a linear combination of the consequent parameters p, q and r:
z = (w1/(w1 + w2)) f1 + (w2/(w1 + w2)) f2 = w̄1 f1 + w̄2 f2
  = w̄1 (p1 x + q1 y + r1) + w̄2 (p2 x + q2 y + r2)
  = (w̄1 x) p1 + (w̄1 y) q1 + (w̄1) r1 + (w̄2 x) p2 + (w̄2 y) q2 + (w̄2) r2

Hybrid learning for ANFIS
Partition the total parameter set S as:
S1: set of premise (nonlinear) parameters
S2: set of consequent (linear) parameters
q: unknown vector whose elements are the parameters in S2
z = Aq is then a standard linear least-squares problem. The best solution for q, minimizing ‖Aq − z‖², is the least-squares estimator q*:
q* = (AᵀA)⁻¹ Aᵀ z
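A minimal numerical sketch of this least-squares step, with synthetic data: the rows of A are built exactly as in the expansion above, [w̄1 x, w̄1 y, w̄1, w̄2 x, w̄2 y, w̄2], and a known consequent vector is recovered. All data values are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))          # inputs (x, y)
w1 = rng.uniform(0.1, 0.9, size=50)           # normalized firing of rule 1
w_bar = np.column_stack([w1, 1 - w1])         # two rules, strengths sum to 1

# Rows of A: [w1*x, w1*y, w1, w2*x, w2*y, w2]
A = np.column_stack([w_bar[:, 0] * X[:, 0], w_bar[:, 0] * X[:, 1], w_bar[:, 0],
                     w_bar[:, 1] * X[:, 0], w_bar[:, 1] * X[:, 1], w_bar[:, 1]])
q_true = np.array([1.0, 2.0, 0.5, -1.0, 0.5, 1.5])   # [p1, q1, r1, p2, q2, r2]
z = A @ q_true

# Least-squares estimator; lstsq is numerically safer than forming (A^T A)^-1
q_star, *_ = np.linalg.lstsq(A, z, rcond=None)
```

With noise-free targets the estimator recovers the consequent parameters exactly (up to floating-point precision).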
Hybrid learning for ANFIS
What if the premise parameters are not optimal? Combine steepest descent and the least-squares estimator to update the parameters of the adaptive network. Each epoch is composed of:
1. Forward pass: node outputs go forward until Layer 4, and the consequent parameters are identified by the least-squares estimator;
2. Backward pass: error signals propagate backward and the premise parameters are updated by gradient descent.

Hybrid learning for ANFIS
Error signals: derivative of the error measure with respect to each node output.

                        Forward pass              Backward pass
Premise parameters      Fixed                     Gradient descent
Consequent parameters   Least-squares estimator   Fixed
Signals                 Node outputs              Error signals

The hybrid approach converges much faster than the pure backpropagation method by reducing the search space.
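The two-pass epoch can be sketched for a small one-input, two-rule ANFIS. This is only an illustration of the hybrid idea, not the exact algorithm: the target function, sizes and learning rate are arbitrary, and the backward pass uses a numerical gradient in place of the analytic backpropagated error signals.

```python
import numpy as np

x = np.linspace(-2, 2, 80)
target = np.tanh(x)                      # illustrative function to approximate

def forward(prem, x):
    """Normalized firing strengths of 2 Gaussian rules (premise params)."""
    c1, s1, c2, s2 = prem
    w = np.stack([np.exp(-((x - c1) / s1) ** 2),
                  np.exp(-((x - c2) / s2) ** 2)])
    return w / w.sum(axis=0)

def fit_consequents(w_bar, x, z):
    """Forward pass: solve the linear consequents by least squares."""
    A = np.column_stack([w_bar[0] * x, w_bar[0], w_bar[1] * x, w_bar[1]])
    q, *_ = np.linalg.lstsq(A, z, rcond=None)
    return A @ q

def epoch(prem, lr=0.05, eps=1e-5):
    z_hat = fit_consequents(forward(prem, x), x, target)
    err = np.mean((target - z_hat) ** 2)
    # Backward pass: numerical gradient w.r.t. premise parameters
    grad = np.zeros_like(prem)
    for i in range(len(prem)):
        p2 = prem.copy()
        p2[i] += eps
        z2 = fit_consequents(forward(p2, x), x, target)
        grad[i] = (np.mean((target - z2) ** 2) - err) / eps
    return prem - lr * grad, err

prem = np.array([-1.0, 1.0, 1.0, 1.0])   # [c1, s1, c2, s2], arbitrary start
errors = []
for _ in range(20):
    prem, e = epoch(prem)
    errors.append(e)
```

Because the consequents are re-solved exactly at every epoch, only the few nonlinear premise parameters are searched by gradient descent, which is the source of the speed-up mentioned above.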
Stone-Weierstrass theorem
Let D be a compact space of N dimensions and let F be a set of continuous real-valued functions on D satisfying:
1. Identity function: the constant f(x) = 1 is in F.
2. Separability: for any two points x1 ≠ x2 in D, there is an f in F such that f(x1) ≠ f(x2).
3. Algebraic closure: if f and g are two functions in F, then fg and af + bg are also in F, for any reals a and b.
Then F is dense in C(D), the set of continuous real-valued functions on D, i.e.:
∀ε > 0, ∀g ∈ C(D), ∃f ∈ F : |g(x) − f(x)| < ε, ∀x ∈ D.

Universal approximator ANFIS
According to the Stone-Weierstrass theorem, an ANFIS has unlimited approximation power: it can match any continuous nonlinear function arbitrarily well.
Identity: obtained by having a constant consequent.
Separability: obtained by selecting different parameters in the network.
Algebraic closure
Consider two two-rule systems with final outputs
z = (w1 f1 + w2 f2) / (w1 + w2) and z′ = (v1 g1 + v2 g2) / (v1 + v2)

Additive:
a z + b z′ = [w1 v1 (a f1 + b g1) + w1 v2 (a f1 + b g2) + w2 v1 (a f2 + b g1) + w2 v2 (a f2 + b g2)] / [(w1 + w2)(v1 + v2)]
Construct a 4-rule inference system, with firing strengths wi vj and consequents a fi + b gj, that computes a z + b z′.

Algebraic closure
Multiplicative:
z z′ = [w1 v1 f1 g1 + w1 v2 f1 g2 + w2 v1 f2 g1 + w2 v2 f2 g2] / [(w1 + w2)(v1 + v2)]
Construct a 4-rule inference system, with firing strengths wi vj and consequents fi gj, that computes z z′.
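The two closure constructions can be checked numerically: a 4-rule system whose firing strengths are the products wi vj reproduces a z + b z′ (with consequents a fi + b gj) and z z′ (with consequents fi gj). All firing strengths and consequent values below are arbitrary test numbers.

```python
import numpy as np

w = np.array([0.7, 0.4]); f = np.array([1.5, -2.0])   # system 1
v = np.array([0.2, 0.9]); g = np.array([0.5, 3.0])    # system 2
a, b = 2.0, -1.0

z1 = (w @ f) / w.sum()
z2 = (v @ g) / v.sum()

# 4-rule system: firing strengths w_i * v_j
W = np.outer(w, v)
# Additive closure: consequents a*f_i + b*g_j
z_add = (W * (a * f[:, None] + b * g[None, :])).sum() / W.sum()
# Multiplicative closure: consequents f_i * g_j
z_mul = (W * (f[:, None] * g[None, :])).sum() / W.sum()
```

Expanding the double sums shows why this works: the cross terms factor into (Σ wi fi)(Σ vj) and (Σ wi)(Σ vj gj), so the 4-rule output separates into the two original outputs.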
Model building guidelines
Select the number of fuzzy sets per variable:
- empirically, by examining the data or by trial and error;
- using clustering techniques;
- using regression trees (CART).
Initially, distribute bell-shaped membership functions evenly over each input range. Using an adaptive step size can speed up training.

How to design an ANFIS?
Initialization:
- Define the number and type of inputs
- Define the number and type of outputs
- Define the number of rules and the type of consequents
- Define the objective function and stop conditions
- Collect data
- Normalize inputs
- Determine initial rules
- Initialize the network
Then: TRAIN.
Ex. 1: Two-input sinc function
z = sinc(x, y) = sin(x) sin(y) / (x y)
Input range: [−10, 10] × [−10, 10], 121 training data pairs.
Multi-Layer Perceptron vs. ANFIS:
- MLP: 18 neurons in the hidden layer, 73 parameters, quick propagation (the best learning algorithm for backpropagation MLPs).
- ANFIS: 16 rules, 4 membership functions per variable, 72 fitting parameters (48 linear, 24 nonlinear), hybrid learning rule.

MLP vs. ANFIS results
Average of 10 runs:
- MLP: 10 different sets of initial random weights;
- ANFIS: 10 step sizes between 0.01 and 0.10.
The MLP's lower approximation power is due to learning processes getting trapped in local minima, or to some neurons being pushed into saturation during training.
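A sketch of the training data for this example, assuming the training pairs come from a uniform grid over [−10, 10]² (the grid layout is an assumption; the example only states the range and the number of pairs):

```python
import numpy as np

def sinc2(x, y):
    """Two-input sinc: sin(x)sin(y)/(xy), well-defined at 0 via np.sinc."""
    # np.sinc(t) computes sin(pi t)/(pi t), so rescale the argument
    return np.sinc(x / np.pi) * np.sinc(y / np.pi)

grid = np.linspace(-10, 10, 11)          # 11 x 11 grid -> 121 pairs
X, Y = np.meshgrid(grid, grid)
Z = sinc2(X, Y)
data = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
```

Using `np.sinc` avoids the 0/0 singularity at the origin, where the function value is 1.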
ANFIS output
[Figure: training data and ANFIS output surfaces over X and Y; root-mean-squared-error curve and step-size curve over the training epochs]

ANFIS model
[Figure: initial and final membership functions on X and on Y (degree of membership vs. input, over the range −10 to 10)]
Ex. 2: 3-input nonlinear function
output = (1 + x^0.5 + y^−1 + z^−1.5)²
Two membership functions per variable, 8 rules.
Input ranges: [1, 6] × [1, 6] × [1, 6]
216 training data, 125 validation data.
[Figure: error curves (training and checking) and step-size curve over the epochs]

ANFIS model
[Figure: initial membership functions on X, Y and Z, and final membership functions on X, Y and Z (degree of membership vs. input, over [1, 6])]
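A sketch of this three-input target sampled on its range. The 6-points-per-axis grid is an assumption chosen so that the full grid yields 216 samples; the example itself does not state how the training data were sampled.

```python
import numpy as np

def target(x, y, z):
    """output = (1 + x^0.5 + y^-1 + z^-1.5)^2, defined for x, y, z >= 1."""
    return (1 + x ** 0.5 + y ** -1 + z ** -1.5) ** 2

g = np.linspace(1, 6, 6)                       # 6 points per axis
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")  # 6^3 = 216 grid points
data = np.column_stack([X.ravel(), Y.ravel(), Z.ravel(),
                        target(X, Y, Z).ravel()])
```

At the corner (1, 1, 1) every term equals 1, so the output is 4² = 16, a quick sanity check on the formula.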
Results comparison
Average Percentage Error:
APE = (1/P) Σᵢ₌₁ᴾ |T(i) − O(i)| / |T(i)| × 100%

Model              Training error  Checking error  # Param.  Training data  Checking data
ANFIS              0.043%          1.066%          50        216            125
GMDH model [1]     4.7%            5.7%            -         20             20
Fuzzy model 1 [2]  1.5%            2.1%            22        20             20
Fuzzy model 2 [2]  0.59%           3.4%            32        20             20

[1] T. Kondo. Revised GMDH algorithm estimating degree of the complete polynomial. Trans. of the Society of Instrument and Control Engineers, 22(9):928-934, 1986.
[2] M. Sugeno and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and Systems, 28:15-33, 1988.

Ex. 3: Modeling a dynamic system
Plant equation:
y(k+1) = 0.3 y(k) + 0.6 y(k−1) + f(u(k))
where f(·) has the following form:
f(u) = 0.6 sin(πu) + 0.3 sin(3πu) + 0.1 sin(5πu)
Estimate the nonlinear function f with an ANFIS F:
ŷ(k+1) = 0.3 ŷ(k) + 0.6 ŷ(k−1) + F(u(k))
Plant input: u(k) = sin(2πk/250)
ANFIS parameters are updated at each step (on-line).
Learning rate: 0.1; forgetting factor: 0.99.
ANFIS can adapt even after the input changes.
Question: was the input signal rich enough?
João M. C. Sousa
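The plant of Ex. 3 is easy to simulate directly; a sketch (the zero initial conditions and the number of steps are assumptions, as the example does not state them):

```python
import numpy as np

def f(u):
    """Plant nonlinearity: f(u) = 0.6 sin(pi u) + 0.3 sin(3 pi u) + 0.1 sin(5 pi u)."""
    return (0.6 * np.sin(np.pi * u) + 0.3 * np.sin(3 * np.pi * u)
            + 0.1 * np.sin(5 * np.pi * u))

def simulate(n_steps=500):
    """y(k+1) = 0.3 y(k) + 0.6 y(k-1) + f(u(k)), with u(k) = sin(2 pi k / 250)."""
    y = np.zeros(n_steps + 2)                # y[0] = y[1] = 0 (assumed)
    for k in range(1, n_steps + 1):
        u = np.sin(2 * np.pi * k / 250)
        y[k + 1] = 0.3 * y[k] + 0.6 * y[k - 1] + f(u)
    return y

y = simulate()
```

The linear part has poles inside the unit circle (roots of z² − 0.3z − 0.6), so the trajectory stays bounded; an on-line ANFIS would be trained on pairs (u(k), y(k+1) − 0.3 y(k) − 0.6 y(k−1)).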
Plant and model outputs
[Figure: plant output and on-line ANFIS model output over time]

Effect of number of MFs
5 membership functions:
[Figure: initial and final membership functions; f(u) and ANFIS outputs; each rule's outputs]
Effect of number of MFs
4 membership functions:
[Figure: initial and final membership functions; f(u) and ANFIS outputs; each rule's outputs]

Effect of number of MFs
3 membership functions:
[Figure: initial and final membership functions; f(u) and ANFIS outputs; each rule's outputs]
Ex. 4: Chaotic time series
Consider a chaotic time series generated by the Mackey-Glass equation:
ẋ(t) = 0.2 x(t−τ) / (1 + x¹⁰(t−τ)) − 0.1 x(t)
Task: predict the system output at some future instant t + P by using past outputs.
500 training data, 500 validation data.
ANFIS input: [x(t−18), x(t−12), x(t−6), x(t)]
ANFIS output: x(t+6)
Two MFs per variable, 16 rules, 104 parameters (24 premise, 80 consequent).
Data generated from t = 118 to t = 1117.

ANFIS model
[Figure: final membership functions on the four inputs x(t−18), x(t−12), x(t−6) and x(t)]
Model output
[Figure: error curves (training and checking error, ×10⁻³) and step sizes over the epochs; desired and ANFIS outputs; prediction errors (×10⁻³)]

3rd-order AR model
[Figure: prediction results of a 3rd-order AR model]
Order selection
AR model of order n:
y(t+1) = a1 y(t) + a2 y(t−1) + … + an y(t−n+1) + u(t)
Select the optimal order of the AR model in order to prevent overfitting: select the order that minimizes the error on a test set.

44th-order AR model
[Figure: prediction results of a 44th-order AR model]
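The selection rule above can be sketched as follows: fit AR(n) by least squares on a training split and pick the order with the smallest test-set error. The signal is a synthetic, deliberately sparse AR(3) process (an assumption for illustration), so orders below 3 clearly underfit.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.zeros(600)
for t in range(3, 600):
    # Sparse AR(3): y(t) = 0.9 y(t-3) + noise
    y[t] = 0.9 * y[t - 3] + 0.1 * rng.standard_normal()

def ar_test_error(y, n, split=400):
    """Fit AR(n) on the first `split` samples, return MSE on the rest."""
    # Regression rows: [y(t-1), ..., y(t-n)] -> target y(t)
    A = np.column_stack([y[n - i - 1:len(y) - i - 1] for i in range(n)])
    target = y[n:]
    coef, *_ = np.linalg.lstsq(A[:split], target[:split], rcond=None)
    resid = target[split:] - A[split:] @ coef
    return float(np.mean(resid ** 2))

errors = {n: ar_test_error(y, n) for n in range(1, 10)}
best_order = min(errors, key=errors.get)
```

Because the true dynamics depend only on lag 3, the test error drops sharply once n reaches 3 and then flattens, which is exactly the shape the order-selection rule exploits.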
ANFIS output for P = 84
[Figure: desired and ANFIS outputs for prediction horizon P = 84]

ANFIS extensions
- Different types of membership functions in Layer 1.
- Parameterized t-norms in Layer 2.
- Interpretability: constrained gradient-descent optimization; bounds on fuzziness, e.g. augmenting the error measure with an entropy-like penalty,
  E′ = E + Σᵢ w̄ᵢ ln(w̄ᵢ)
  parameterized to reflect the constraints.
- Structure identification.