Fuzzy Systems (2/2) Francesco Masulli

1 Learning in FBF/ANFIS Networks
Fuzzy Systems (2/2)
Francesco Masulli
DIBRIS - University of Genova, ITALY & S.H.R.O. - Sbarro Institute for Cancer Research and Molecular Medicine, Temple University, Philadelphia, PA, USA
email: francesco.masulli@unige.it
MLCI 2018

2 Outline
1. Learning in FBF/ANFIS Networks

3 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani, Mendel & Wang (Wang, 1992): Mamdani rules, height defuzzifier.
RULES: IF $x_1$ is $A_1^j$ AND ... AND $x_n$ is $A_n^j$ THEN $y$ is $B^j$
Fuzzy AND and implication: product
$$\mu_{A_1^j \times \cdots \times A_n^j \to B^j}(x_1,\dots,x_n,y) = \prod_{i=1}^{n} \mu_{A_i^j}(x_i)\,\mu_{B^j}(y)$$
Singleton fuzzifier:
$$\mu_{A'_i}(x_i) = \begin{cases} 1 & \text{if } x_i = x'_i \\ 0 & \text{otherwise} \end{cases}$$
($x'_i$ is the measured crisp value.)

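4 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani
Conclusion: fact $\circ$ rule
$$\mu_{B'^j}(y) = \max_{(x_1,\dots,x_n)} \left\{ \mu_{A'}(x_1,\dots,x_n)\, \mu_{A_1^j \times \cdots \times A_n^j \to B^j}(x_1,\dots,x_n,y) \right\} = \max_{(x_1,\dots,x_n)} \left\{ \prod_{i=1}^{n} \mu_{A'_i}(x_i) \left[ \prod_{i=1}^{n} \mu_{A_i^j}(x_i)\,\mu_{B^j}(y) \right] \right\}$$
The maximum is attained for $x = x'$:
$$\mu_{B'^j}(y) = \prod_{i=1}^{n} \mu_{A_i^j}(x'_i)\,\mu_{B^j}(y)$$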

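5 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani
If $\mu_{B^j}(y)$ is normal and convex (a fuzzy number) with $\mu_{B^j}(\bar{y}^j) = 1$, then
$$\mu_{B'^j}(\bar{y}^j) = \prod_{i=1}^{n} \mu_{A_i^j}(x'_i)$$
and the height defuzzifier gives
$$y_h = \frac{\sum_{j=1}^{J} \bar{y}^j\, \mu_{B'^j}(\bar{y}^j)}{\sum_{j=1}^{J} \mu_{B'^j}(\bar{y}^j)} = \frac{\sum_{j=1}^{J} \bar{y}^j \prod_{i=1}^{n} \mu_{A_i^j}(x'_i)}{\sum_{j=1}^{J} \prod_{i=1}^{n} \mu_{A_i^j}(x'_i)}$$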

6 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani
New notation: $y_h \to y$, $x' \to x$; then
$$y = f(x) = \sum_{j=1}^{J} \bar{y}^j \phi_j(x),$$
where
$$\phi_j(x) = \frac{\prod_{i=1}^{I} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{J} \prod_{i=1}^{I} \mu_{A_i^j}(x_i)}$$
are the Fuzzy Basis Functions (FBF) (Mendel, 1995).

7 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani
The premise of a rule, IF ($x_1$ is $F_1^k$ and ... and $x_n$ is $F_n^k$), performs a fuzzy segmentation of the rule input space (universe of discourse).
Rule activation $r_j$:
$$r_j = \prod_{i=1}^{I} \mu_{A_i^j}(x_i)$$
Normalized activation of rule $R_j$:
$$\phi_j(x) = \frac{\prod_{i=1}^{I} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{J} \prod_{i=1}^{I} \mu_{A_i^j}(x_i)}$$

8 Fuzzy Basis Function (FBF) Network - Fuzzy System
Mamdani
$R_j$: IF $x_1$ is $A_{1j}$ AND ... AND $x_n$ is $A_{nj}$ THEN $y_1$ is $B_{1j}$ AND ... AND $y_I$ is $B_{Ij}$
FBF network (Mendel & Wang, 1992) - Adaptive Fuzzy System - ASF (Jou, 1992; Wang, 1994)
Multi Input - Multi Output (MIMO) fuzzy system

9 Fuzzy Basis Function (FBF) Network - Fuzzy System
Sugeno fuzzy inference system - Order 0 polynomial (constant)
$R_j$: IF $x_1$ is $A_1^j$ and ... and $x_n$ is $A_n^j$ THEN $f_j(x_1,\dots,x_n) = b_j$ (constant)
With a singleton fuzzifier and the product t-norm (fuzzy AND):
$$y = \frac{\sum_{j=1}^{J} w_j f_j(x)}{\sum_{j=1}^{J} w_j} = \sum_{j=1}^{J} f_j(x)\,\phi_j(x) = \sum_{j=1}^{J} b_j\,\phi_j(x)$$
where the $\phi_j(x)$ are the Fuzzy Basis Functions (FBF):
$$\phi_j(x) = \frac{\prod_{i=1}^{I} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{J} \prod_{i=1}^{I} \mu_{A_i^j}(x_i)}$$
Adaptive Neuro-Fuzzy Inference System or Adaptive Network-based Fuzzy Inference System - ANFIS (Jang, 1993)

10 Fuzzy Basis Function (FBF) Network - Fuzzy System
Notation:
$s_j = \bar{y}^j$ (Mamdani ASF)
$s_j = b_j$ (Sugeno ANFIS)
Notation: $\mu_{ji}(x_i) = \mu_{A_i^j}(x_i)$ is the membership value of component $x_i$ in the $j$-th rule.
The shape of the membership function $\mu_{ji}(x_i)$ is arbitrary. If we take $\mu_{ji}(x_i)$ Gaussian,
$$\mu_{ji}(x_i) = \exp\left( -\frac{(x_i - m_{ji})^2}{2\sigma_{ji}^2} \right),$$
with means $m_{ji}$ and variances $\sigma_{ji}^2$, the FBF/ANFIS network has simple learning rules and holds the Universal Function Approximation property.

11 Fuzzy Basis Function (FBF) Network - Fuzzy System
FBF networks can be considered a connectionist system with one hidden layer, whose units correspond to fuzzy rules:
$$y_l = \frac{\sum_j s_{lj} \prod_k \mu_{jk}(x_k)}{\sum_j \prod_k \mu_{jk}(x_k)}$$
Parameters to be learned:
- $s_{lj}$
- $m_{jk}$, means of the Gaussians
- $\sigma_{jk}$, widths of the Gaussians ($\sigma_{jk}^2$ are the variances)
Approaches to parameter learning:
- Evolutionary Algorithms
- Gradient Descent
- ...
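The network output above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides: the function name `fbf_forward` and the array layouts are assumptions, and the memberships are taken to be Gaussian with exponent $-(x_k - m_{jk})^2 / (2\sigma_{jk}^2)$.

```python
import numpy as np

def fbf_forward(x, m, sigma, s):
    """Forward pass of an FBF/ANFIS network with Gaussian memberships.

    x:     (K,)   input vector
    m:     (J, K) Gaussian means, one row per rule
    sigma: (J, K) Gaussian widths (> 0)
    s:     (L, J) consequent parameters, one row per output
    Returns (y, phi): outputs (L,) and normalized rule activations (J,).
    """
    mu = np.exp(-0.5 * ((x - m) / sigma) ** 2)  # (J, K) membership values
    r = mu.prod(axis=1)                         # (J,) rule activations (product t-norm)
    phi = r / r.sum()                           # (J,) fuzzy basis functions
    y = s @ phi                                 # (L,) weighted sum of consequents
    return y, phi

# Two rules on a 1-D input, one output (illustrative values).
m = np.array([[0.0], [2.0]])
sigma = np.ones((2, 1))
s = np.array([[10.0, 20.0]])
y, phi = fbf_forward(np.array([1.0]), m, sigma, s)
```

At $x = 1$, midway between the two centers, both rules fire equally, so `phi` is `[0.5, 0.5]` and the output is the plain average of the two consequents.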

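12 FBF Networks: Gradient Descent - Delta rule
We should find the values of the parameters $s_{ij}$, $m_{jk}$, $\sigma_{jk}$ that minimise the cost function
$$E = \frac{1}{2} \sum_l [t_l - y_l]^2$$
Delta rule for $s_{ij}$:
$$\begin{aligned}
\Delta s_{ij} &= -\eta_s \frac{\partial E}{\partial s_{ij}} = -\eta_s \sum_l (t_l - y_l)(-1) \frac{\partial y_l}{\partial s_{ij}} \\
&= \eta_s \sum_l (t_l - y_l) \frac{\partial}{\partial s_{ij}} \left[ \frac{\sum_{j'} s_{lj'} \prod_k \mu_{j'k}(x_k)}{\sum_{j'} \prod_k \mu_{j'k}(x_k)} \right] \\
&= \eta_s \sum_l (t_l - y_l)\, \delta_{li}\, \frac{\prod_k \mu_{jk}(x_k)}{\sum_{j'} \prod_k \mu_{j'k}(x_k)}
\end{aligned}$$
where
$$\delta_{li} = \begin{cases} 1 & \text{if } l = i \\ 0 & \text{otherwise} \end{cases}$$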

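13 FBF Networks: Delta rule
Then
$$\Delta s_{ij} = \eta_s [t_i - y_i] \frac{\prod_k \mu_{jk}(x_k)}{\sum_{j'} \prod_k \mu_{j'k}(x_k)} = \eta_s [t_i - y_i]\, \phi_j$$
where
$$\phi_j \overset{\text{def}}{=} \frac{\prod_k \mu_{jk}(x_k)}{\sum_{j'} \prod_k \mu_{j'k}(x_k)} \quad \text{and} \quad y_i = \sum_j s_{ij} \phi_j$$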

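14 FBF Networks: Delta rule
Delta rule for $m_{jk}$ (here $\psi_t$ denotes the normalized activation of rule $t$):
$$\begin{aligned}
\Delta m_{jk} &= -\eta_m \frac{\partial E}{\partial m_{jk}} = -\eta_m \sum_l (t_l - y_l)(-1) \frac{\partial y_l}{\partial m_{jk}} \\
&= \eta_m \sum_l (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial m_{jk}} \\
&= \eta_m \sum_l (t_l - y_l) \sum_t \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial \psi_t}{\partial m_{jk}} \\
&= \eta_m \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial m_{jk}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \\
&= \eta_m \sum_{l,t,p,q} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{pq}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \frac{\partial \mu_{pq}}{\partial m_{jk}}
\end{aligned}$$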

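15 FBF Networks: Delta rule
As $\mu_{pq}(x_q) = \exp\left( -\frac{(x_q - m_{pq})^2}{2\sigma_{pq}^2} \right)$,
$$\frac{\partial \mu_{pq}}{\partial m_{jk}} = \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\,(-1)\, \frac{2(x_k - m_{jk})}{2\sigma_{jk}^2}\,(-1) = \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\, \frac{x_k - m_{jk}}{\sigma_{jk}^2}$$
Then:
$$\begin{aligned}
\Delta m_{jk} &= \eta_m \sum_{l,t,p,q} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{pq}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} \\
&= \eta_m \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{jk}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \mu_{jk}(x_k)\, \frac{x_k - m_{jk}}{\sigma_{jk}^2}
\end{aligned}$$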

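16 FBF Networks: Delta rule
Moreover,
$$\begin{aligned}
\frac{\partial}{\partial \mu_{jk}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right]
&= \frac{1}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \frac{\partial}{\partial \mu_{jk}} \left[ \prod_\alpha \mu_{t\alpha}(x_\alpha) \right] + \left[ \prod_\alpha \mu_{t\alpha}(x_\alpha) \right] \frac{\partial}{\partial \mu_{jk}} \left[ \frac{1}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \\
&= \delta_{jt} \frac{\prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)} - \frac{\left[ \prod_\alpha \mu_{t\alpha}(x_\alpha) \right] \left[ \prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha) \right]}{\left[ \sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha) \right]^2}
\end{aligned}$$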

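17 FBF Networks: Delta rule
Then, since $\partial \left( \sum_r s_{lr} \psi_r \right) / \partial \psi_t = s_{lt}$,
$$\begin{aligned}
\Delta m_{jk} &= \eta_m \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t}\, \delta_{jt}\, \frac{\prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha)\, \mu_{jk}(x_k)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} \\
&\quad - \eta_m \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t}\, \frac{\left[ \prod_\alpha \mu_{t\alpha}(x_\alpha) \right] \left[ \prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha) \right] \mu_{jk}(x_k)}{\left[ \sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha) \right]^2}\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} \\
&= \eta_m \sum_l (t_l - y_l)\, s_{lj}\, \frac{\prod_\alpha \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} \\
&\quad - \eta_m \sum_l (t_l - y_l)\, \frac{\sum_t s_{lt} \prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)}\, \frac{\prod_\alpha \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{x_k - m_{jk}}{\sigma_{jk}^2}
\end{aligned}$$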

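18 FBF Networks: Delta rule
As
$$\frac{\sum_t s_{lt} \prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} = y_l \quad \text{and} \quad \frac{\prod_\alpha \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)} = \phi_j,$$
we have:
$$\begin{aligned}
\Delta m_{jk} &= \eta_m \sum_l (t_l - y_l)\, s_{lj}\, \phi_j\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} - \eta_m \sum_l (t_l - y_l)\, y_l\, \phi_j\, \frac{x_k - m_{jk}}{\sigma_{jk}^2} \\
&= \eta_m \sum_l (t_l - y_l)(s_{lj} - y_l)\, \phi_j\, \frac{x_k - m_{jk}}{\sigma_{jk}^2}
\end{aligned}$$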

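19 FBF Networks: Delta rule
Delta rule for $\sigma_{jk}$:
$$\begin{aligned}
\Delta \sigma_{jk} &= -\eta_\sigma \frac{\partial E}{\partial \sigma_{jk}} = -\eta_\sigma \sum_l (t_l - y_l)(-1) \frac{\partial y_l}{\partial \sigma_{jk}} \\
&= \eta_\sigma \sum_l (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \sigma_{jk}} \\
&= \eta_\sigma \sum_l (t_l - y_l) \sum_t \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial \psi_t}{\partial \sigma_{jk}} \\
&= \eta_\sigma \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \sigma_{jk}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \\
&= \eta_\sigma \sum_{l,t,p,q} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{pq}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \frac{\partial \mu_{pq}}{\partial \sigma_{jk}}
\end{aligned}$$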

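20 FBF Networks: Delta rule
As $\mu_{pq}(x_q) = \exp\left( -\frac{(x_q - m_{pq})^2}{2\sigma_{pq}^2} \right)$,
$$\frac{\partial \mu_{pq}}{\partial \sigma_{jk}} = \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\,(-1)\,(x_k - m_{jk})^2\, \frac{-4\sigma_{jk}}{4\sigma_{jk}^4} = \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3}$$
By substitution we have:
$$\begin{aligned}
\Delta \sigma_{jk} &= \eta_\sigma \sum_{l,t,p,q} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{pq}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \delta_{pj}\delta_{qk}\, \mu_{jk}(x_k)\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&= \eta_\sigma \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t} \frac{\partial}{\partial \mu_{jk}} \left[ \frac{\prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)} \right] \mu_{jk}(x_k)\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3}
\end{aligned}$$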

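21 FBF Networks: Delta rule
$$\begin{aligned}
\Delta \sigma_{jk} &= \eta_\sigma \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t}\, \delta_{jt}\, \frac{\prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha)\, \mu_{jk}(x_k)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&\quad - \eta_\sigma \sum_{l,t} (t_l - y_l) \frac{\partial \left( \sum_r s_{lr} \psi_r \right)}{\partial \psi_t}\, \frac{\left[ \prod_\alpha \mu_{t\alpha}(x_\alpha) \right] \left[ \prod_{\alpha \neq k} \mu_{j\alpha}(x_\alpha) \right] \mu_{jk}(x_k)}{\left[ \sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha) \right]^2}\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&= \eta_\sigma \sum_l (t_l - y_l)\, s_{lj}\, \frac{\prod_\alpha \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&\quad - \eta_\sigma \sum_l (t_l - y_l)\, \frac{\sum_t s_{lt} \prod_\alpha \mu_{t\alpha}(x_\alpha)}{\sum_{t'} \prod_\alpha \mu_{t'\alpha}(x_\alpha)}\, \frac{\prod_\alpha \mu_{j\alpha}(x_\alpha)}{\sum_{j'} \prod_\alpha \mu_{j'\alpha}(x_\alpha)}\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&= \eta_\sigma \sum_l (t_l - y_l)\, s_{lj}\, \phi_j\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} - \eta_\sigma \sum_l (t_l - y_l)\, y_l\, \phi_j\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3} \\
&= \eta_\sigma \sum_l (t_l - y_l)(s_{lj} - y_l)\, \phi_j\, \frac{(x_k - m_{jk})^2}{\sigma_{jk}^3}
\end{aligned}$$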

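22 FBF Networks: Delta rule - summary
In summary:
$$\begin{aligned}
\Delta s_{ij} &= \eta_s [t_i - y_i]\, \phi_j \\
\Delta m_{jk} &= \eta_m \sum_l [t_l - y_l][s_{lj} - y_l]\, \phi_j\, [x_k - m_{jk}] / \sigma_{jk}^2 \\
\Delta \sigma_{jk} &= \eta_\sigma \sum_l [t_l - y_l][s_{lj} - y_l]\, \phi_j\, [x_k - m_{jk}]^2 / \sigma_{jk}^3
\end{aligned}$$
where
$$\phi_j = \frac{\prod_k \mu_{jk}(x_k)}{\sum_{j'} \prod_k \mu_{j'k}(x_k)}$$
and $\eta_s$, $\eta_m$, $\eta_\sigma$ are the learning rates of $s_{ij}$, $m_{jk}$, $\sigma_{jk}$.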
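The three update rules can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it assumes Gaussian memberships with exponent $-(x_k - m_{jk})^2 / (2\sigma_{jk}^2)$ and on-line updates for a single pattern, with the cost summed over the outputs $l$. The names `delta_rule_step` and `numeric_grad` and all numerical values are hypothetical. With unit learning rates, each update should equal the negative gradient of $E$, which is compared against central finite differences.

```python
import numpy as np

def forward(x, m, sigma, s):
    mu = np.exp(-0.5 * ((x - m) / sigma) ** 2)  # (J, K) Gaussian memberships
    r = mu.prod(axis=1)                         # (J,) rule activations
    phi = r / r.sum()                           # (J,) normalized activations
    return s @ phi, phi                         # outputs (L,), phi (J,)

def cost(x, t, m, sigma, s):
    y, _ = forward(x, m, sigma, s)
    return 0.5 * np.sum((t - y) ** 2)

def delta_rule_step(x, t, m, sigma, s, eta_s, eta_m, eta_sigma):
    """One on-line delta-rule update for a single pattern (x, t)."""
    y, phi = forward(x, m, sigma, s)
    err = t - y                                                   # (L,)
    # common[j] = sum_l (t_l - y_l)(s_lj - y_l) * phi_j
    common = (err[:, None] * (s - y[:, None])).sum(axis=0) * phi  # (J,)
    ds = eta_s * np.outer(err, phi)
    dm = eta_m * common[:, None] * (x - m) / sigma ** 2
    dsigma = eta_sigma * common[:, None] * (x - m) ** 2 / sigma ** 3
    return s + ds, m + dm, sigma + dsigma

# Random small problem (hypothetical sizes): J rules, K inputs, L outputs.
rng = np.random.default_rng(0)
J, K, L = 3, 2, 2
x, t = rng.normal(size=K), rng.normal(size=L)
m = rng.normal(size=(J, K))
sigma = rng.uniform(0.5, 1.5, size=(J, K))
s = rng.normal(size=(L, J))

def numeric_grad(name, h=1e-6):
    """Central finite-difference gradient of the cost w.r.t. one parameter array."""
    params = {"m": m.copy(), "sigma": sigma.copy(), "s": s.copy()}
    p = params[name]
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + h
        up = cost(x, t, **params)
        p[idx] = old - h
        down = cost(x, t, **params)
        p[idx] = old
        g[idx] = (up - down) / (2 * h)
    return g

# With unit learning rates the parameter change is the negative gradient.
s1, m1, sigma1 = delta_rule_step(x, t, m, sigma, s, 1.0, 1.0, 1.0)
ok_s = np.allclose(s1 - s, -numeric_grad("s"), atol=1e-5)
ok_m = np.allclose(m1 - m, -numeric_grad("m"), atol=1e-5)
ok_sigma = np.allclose(sigma1 - sigma, -numeric_grad("sigma"), atol=1e-5)
```

A finite-difference check of this kind is a cheap way to catch sign or index mistakes before running a longer training loop.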

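23 Fuzzy Basis Function (FBF) Network - Fuzzy System
A fuzzy logic system can be expressed in the following form (FBF):
$$f(x) = \frac{\sum_{j=1}^{J} \bar{y}^j \left[ \prod_{i=1}^{I} a_i^j \exp\left( -\frac{1}{2} \left( \frac{x_i - m_i^j}{\sigma_i^j} \right)^2 \right) \right]}{\sum_{j=1}^{J} \left[ \prod_{i=1}^{I} a_i^j \exp\left( -\frac{1}{2} \left( \frac{x_i - m_i^j}{\sigma_i^j} \right)^2 \right) \right]} \qquad (1)$$
where $\bar{y}^j$ is the point where $\mu_{G^j}$ reaches its maximum value, which is assumed to be 1.
The adaptable parameters of the fuzzy logic system are $\bar{y}^j$, $a_i^j$, $m_i^j$, $\sigma_i^j$, with the constraints $\bar{y}^j \in V$, $a_i^j \in (0, 1]$ (usually we assume $a_i^j = 1$), $m_i^j \in U_i$ and $\sigma_i^j > 0$.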

24 Universal Approximation Property
The FBF-network fuzzy logic system can approximate any non-linear continuous function on a compact set $U \subset \mathbb{R}^n$ to any assigned degree of precision (Wang, 1994). In this regard, we show the following results.
Theorem (Universal approximation theorem)
Given a real continuous function $g$ on a compact set $U \subset \mathbb{R}^n$ and an arbitrarily small value $\epsilon > 0$, there exists a fuzzy function $f$ in the form of Eq. 1 such that
$$\sup_{x \in U} |f(x) - g(x)| < \epsilon.$$
This result can be extended to the case of discrete functions.
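The theorem is an existence result; the sketch below only illustrates its flavour in one dimension and proves nothing. It assumes a very simple, non-optimized construction that is not from the slides: evenly spaced Gaussian centers, a common width equal to the spacing, and consequents set to the target values at the centers. The function names `fbf_1d` and `sup_error` are hypothetical.

```python
import numpy as np

def fbf_1d(x, centers, width, s):
    """Evaluate a 1-D FBF expansion (form of Eq. 1 with a = 1) at points x."""
    mu = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)  # (N, J)
    phi = mu / mu.sum(axis=1, keepdims=True)                            # normalized
    return phi @ s

def sup_error(g, a, b, J, n_test=2001):
    """Sup-norm error on a dense grid for the naive choice s_j = g(m_j)."""
    centers = np.linspace(a, b, J)
    width = (b - a) / (J - 1)          # assumption: width equals center spacing
    s = g(centers)
    x = np.linspace(a, b, n_test)
    return np.max(np.abs(fbf_1d(x, centers, width, s) - g(x)))

# Approximating sin on the compact set [0, pi] with 5 and with 40 rules.
err_coarse = sup_error(np.sin, 0.0, np.pi, J=5)
err_fine = sup_error(np.sin, 0.0, np.pi, J=40)
```

Even with this crude parameter choice the error shrinks as rules are added; fitting $\bar{y}^j$, $m_i^j$ and $\sigma_i^j$ by gradient descent would normally do better for the same number of rules.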

25 Universal Approximation
Theorem (Stone - Weierstrass)
Let $Z$ be a set of continuous real functions on a compact set $U$. If the following conditions hold:
1. $Z$ is an algebra, i.e., $Z$ is closed under the operations of addition, multiplication and scalar multiplication;
2. $Z$ separates points in $U$, that is, for every $x, y \in U$ with $x \neq y$, there exists a function $f \in Z$ such that $f(x) \neq f(y)$;
3. $Z$ does not vanish at any point of $U$, that is, for every $x \in U$ there exists a function $f \in Z$ such that $f(x) \neq 0$;
then the uniform closure of $Z$ is composed of all real functions continuous on $U$, i.e., $(Z, d_\infty)$ is dense in $(C[U], d_\infty)$.

26 Universal Approximation
Let $Y$ be the set of all fuzzy logic systems of the form of Eq. 1.
To use the Stone - Weierstrass theorem to prove universal approximation, it is necessary to show that $Y$ is an algebra, that $Y$ separates points in $U$, and that $Y$ does not vanish at any point of $U$. This is done with three lemmas.

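27 Universal Approximation
Lemma A.1
$(Y, d_\infty)$ is an algebra.
Let $f_1, f_2 \in Y$ be expressed as
$$f_1(x) = \frac{\sum_{j=1}^{J_1} \left( s_{1j} \prod_{k=1}^{K} \mu_{A1_k^{j}}(x_k) \right)}{\sum_{j=1}^{J_1} \left( \prod_{k=1}^{K} \mu_{A1_k^{j}}(x_k) \right)}, \qquad f_2(x) = \frac{\sum_{j=1}^{J_2} \left( s_{2j} \prod_{k=1}^{K} \mu_{A2_k^{j}}(x_k) \right)}{\sum_{j=1}^{J_2} \left( \prod_{k=1}^{K} \mu_{A2_k^{j}}(x_k) \right)}.$$
Then
$$f_1(x) + f_2(x) = \frac{\sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} (s_{1j_1} + s_{2j_2}) \left( \prod_{k=1}^{K} \mu_{A1_k^{j_1}}(x_k)\, \mu_{A2_k^{j_2}}(x_k) \right)}{\sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} \left( \prod_{k=1}^{K} \mu_{A1_k^{j_1}}(x_k)\, \mu_{A2_k^{j_2}}(x_k) \right)}.$$
Since $\mu_{A1_k^{j_1}}$ and $\mu_{A2_k^{j_2}}$ are Gaussian functions, their product $\mu_{A1_k^{j_1}} \mu_{A2_k^{j_2}}$ is still a Gaussian function, and therefore $f_1 + f_2$ has the form of Eq. 1, i.e., $f_1 + f_2 \in Y$.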

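28 Universal Approximation
Similarly,
$$f_1(x) f_2(x) = \frac{\sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} (s_{1j_1} s_{2j_2}) \left( \prod_{k=1}^{K} \mu_{A1_k^{j_1}}(x_k)\, \mu_{A2_k^{j_2}}(x_k) \right)}{\sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} \left( \prod_{k=1}^{K} \mu_{A1_k^{j_1}}(x_k)\, \mu_{A2_k^{j_2}}(x_k) \right)}$$
has the form of Eq. 1, so that $f_1 f_2 \in Y$.
Finally, for any $c \in \mathbb{R}$,
$$c f_1(x) = \frac{\sum_{j=1}^{J_1} (c\, s_{1j}) \left( \prod_{k=1}^{K} \mu_{A1_k^{j}}(x_k) \right)}{\sum_{j=1}^{J_1} \left( \prod_{k=1}^{K} \mu_{A1_k^{j}}(x_k) \right)}$$
still has the form of Eq. 1, and therefore $c f_1 \in Y$.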
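The closure argument rests on the product of two Gaussian membership functions being again a (scaled) Gaussian. As a worked step, for two one-dimensional Gaussians with hypothetical parameters $(m_1, \sigma_1)$ and $(m_2, \sigma_2)$, completing the square gives:

```latex
\exp\!\left(-\frac{(x-m_1)^2}{2\sigma_1^2}\right)
\exp\!\left(-\frac{(x-m_2)^2}{2\sigma_2^2}\right)
= a \,\exp\!\left(-\frac{(x-m)^2}{2\sigma^2}\right),
\qquad
\begin{aligned}
\sigma^2 &= \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2+\sigma_2^2}, \\
m &= \frac{m_1\,\sigma_2^2 + m_2\,\sigma_1^2}{\sigma_1^2+\sigma_2^2}, \\
a &= \exp\!\left(-\frac{(m_1-m_2)^2}{2(\sigma_1^2+\sigma_2^2)}\right) \in (0,1].
\end{aligned}
```

The scale factor $a$ equals 1 only when $m_1 = m_2$, which is why the form of Eq. 1 carries the amplitudes $a_i^j$ and allows them to be smaller than 1.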

29 Universal Approximation
Lemma A.2
$(Y, d_\infty)$ separates points in $U$.
We now build the required $f$: we specify the number of fuzzy sets defined in $U$ and $\mathbb{R}$ (the input and output universes, respectively), the parameters of the Gaussian membership functions, and the number of fuzzy rules, so that the resulting function $f$, in the form of Eq. 1, has the property that $f(x^0) \neq f(y^0)$ for any $x^0 \neq y^0$.
Let $x^0 = (x_1^0, \dots, x_K^0)$ and $y^0 = (y_1^0, \dots, y_K^0)$. If $x_i^0 \neq y_i^0$, we can define two fuzzy sets $(A_i^1, \mu_{A_i^1})$ and $(A_i^2, \mu_{A_i^2})$ in the $i$-th subspace of $U$, where
$$\mu_{A_i^1}(x_i) = \exp\left[ -\frac{(x_i - x_i^0)^2}{2} \right]$$
and
$$\mu_{A_i^2}(x_i) = \exp\left[ -\frac{(x_i - y_i^0)^2}{2} \right].$$

30 Universal Approximation
If $x_i^0 = y_i^0$ we obtain $A_i^1 = A_i^2$ and $\mu_{A_i^1} = \mu_{A_i^2}$. Therefore, a single fuzzy set is defined in the $i$-th subspace of $U$.
We now define two fuzzy sets $(B^1, \mu_{B^1})$ and $(B^2, \mu_{B^2})$ in the output universe $\mathbb{R}$, where
$$\mu_{B^j}(z) = \exp\left[ -\frac{(z - s_j)^2}{2} \right],$$
with $j = 1, 2$, while $s_j$ is defined below.
Let us choose two fuzzy rules, that is, $J = 2$.
Since we have specified all the parameters of the structure except $s_j$, we have defined a function $f$ in the form of Eq. 1 with $J = 2$.

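31 Universal Approximation
Using this function $f$ we obtain
$$f(x^0) = \frac{s_1 + s_2 \prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right]}{1 + \prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right]} = \alpha s_1 + (1 - \alpha) s_2$$
and
$$f(y^0) = \frac{s_2 + s_1 \prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right]}{1 + \prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right]} = \alpha s_2 + (1 - \alpha) s_1,$$
where
$$\alpha = \frac{1}{1 + \prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right]}.$$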

32 Universal Approximation
Since $x^0 \neq y^0$, there must be some $i$ such that $x_i^0 \neq y_i^0$. Hence
$$\prod_{k=1}^{K} \exp\left[ -\frac{(x_k^0 - y_k^0)^2}{2} \right] \neq 1,$$
so that $\alpha \neq 1 - \alpha$. Choosing $s_1 = 0$ and $s_2 = 1$, we obtain
$$f(x^0) = 1 - \alpha \neq \alpha = f(y^0).$$
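The construction in Lemma A.2 is easy to reproduce numerically. The following sketch assumes the two-rule system described above, with unit-width Gaussians centered at $x^0$ and $y^0$ and consequents $s_1 = 0$, $s_2 = 1$; the function name `two_rule_system` and the sample points are hypothetical.

```python
import numpy as np

def two_rule_system(x, x0, y0, s1=0.0, s2=1.0):
    """Two-rule FBF system with unit-width Gaussians centered at x0 and y0."""
    r1 = np.exp(-0.5 * np.sum((x - x0) ** 2))  # activation of the rule centered at x0
    r2 = np.exp(-0.5 * np.sum((x - y0) ** 2))  # activation of the rule centered at y0
    return (s1 * r1 + s2 * r2) / (r1 + r2)

x0 = np.array([0.0, 1.0])
y0 = np.array([0.0, 2.0])  # differs from x0 only in the second component

P = np.exp(-0.5 * np.sum((x0 - y0) ** 2))  # product of the K exponentials
alpha = 1.0 / (1.0 + P)

f_x0 = two_rule_system(x0, x0, y0)  # expected: 1 - alpha
f_y0 = two_rule_system(y0, x0, y0)  # expected: alpha
```

Because $P < 1$ whenever the two points differ, $\alpha > 1/2$ and the two function values cannot coincide.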

33 Universal Approximation
Lemma A.3
$(Y, d_\infty)$ does not vanish at any point in $U$.
Consider Eq. 1 and simply require all $s_j > 0$, with $j = 1, \dots, J$: any $f \in Y$ with $s_j > 0$ can be taken as the desired $f$.
Proof of the Universal Approximation Theorem.
From Eq. 1 we obtain that $Y$ is a set of real functions continuous on $U$. The Universal Approximation Theorem is therefore a direct consequence of the Stone - Weierstrass theorem and Lemmas A.1, A.2 and A.3.

34 Universal Approximation
Corollary
Given a function $g \in L_2(U)$ and an arbitrary $\epsilon > 0$, there exists a fuzzy logic system $f$ in the form of Eq. 1 such that
$$\left( \int_U |f(x) - g(x)|^2\, dx \right)^{1/2} < \epsilon,$$
where $U \subset \mathbb{R}^n$ is compact,
$$L_2(U) = \left\{ g : U \to \mathbb{R} \;\middle|\; \int_U |g(x)|^2\, dx < \infty \right\},$$
and the integral is of Lebesgue type.

35 Universal Approximation
Proof of Corollary.
$U$ being compact, we have $\int_U dx = V < \infty$. Since continuous functions on $U$ form a dense subset of $L_2(U)$ (Rudin, 1960), for any $g \in L_2(U)$ there exists a continuous function $\bar{g}$ on $U$ such that
$$\left( \int_U |g(x) - \bar{g}(x)|^2\, dx \right)^{1/2} < \frac{\epsilon}{2}.$$
By the Universal Approximation Theorem, there exists a function $f \in Y$ such that
$$\sup_{x \in U} |f(x) - \bar{g}(x)| < \frac{\epsilon}{2 V^{1/2}}.$$

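36 Universal Approximation
Therefore we get
$$\begin{aligned}
\left( \int_U |f(x) - g(x)|^2\, dx \right)^{1/2} &\leq \left( \int_U |f(x) - \bar{g}(x)|^2\, dx \right)^{1/2} + \left( \int_U |\bar{g}(x) - g(x)|^2\, dx \right)^{1/2} \\
&< \left( \int_U \left( \sup_{x \in U} |f(x) - \bar{g}(x)| \right)^2 dx \right)^{1/2} + \frac{\epsilon}{2} \\
&< \left( \frac{\epsilon^2}{2^2 V}\, V \right)^{1/2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}$$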

37 Approximation of the optimal Bayes classifier
We will demonstrate, following Ruck et al. (1990), that a classifier based on a map able to perform function approximation (e.g., one obtained with a connectionist system) can approximate the Bayes discriminant function.
The demonstration refers to a classifier trained in supervised (One For Class) modality, in the limit of a large training set, and to the two-class case.

38 Approximation of the optimal Bayes classifier
Let $c_1$ and $c_2$ be two classes of patterns. Let $\chi_i$ be the set of all possible patterns $x$ belonging to class $c_i$; then $\chi = \chi_1 \cup \chi_2$ represents the set of all patterns.
The training set consists of a subset of the possible patterns belonging to the two classes: the finite set $X = X_1 \cup X_2$, where $X_1 = \{x_1, x_2, \dots, x_{n_1}\} \subset \chi_1$, $X_2 = \{x_{n_1+1}, \dots, x_{n_1+n_2}\} \subset \chi_2$, and $n_1 + n_2 = N$.

39 Approximation of the optimal Bayes classifier
The system $F$ is trained in such a way that:
$$F(x, w) = \begin{cases} +1 & \text{if } x \in c_1 \\ -1 & \text{if } x \in c_2 \end{cases}$$
where $w$ is the set of adaptive parameters of $F$.
This is accomplished by minimizing (e.g., with the error backpropagation technique (Rumelhart, 1986)) the sample data error function:
$$E_s(w) = \sum_{x \in X_1} [F(x, w) - 1]^2 + \sum_{x \in X_2} [F(x, w) + 1]^2.$$

40 Approximation of the optimal Bayes classifier
We shall demonstrate that, in the large $N$ limit, when $w$ minimizes $E_s(w)$, then $w$ also minimizes
$$\epsilon^2(w) = \int_\chi [F(x, w) - g(x)]^2\, p(x)\, dx,$$
i.e., $F(x, w)$ approximates $g(x)$.
Let us define the average error function as
$$E_a(w) = \lim_{N \to \infty} \frac{1}{N} E_s(w)$$
where $N$ is the total number of pattern vectors. $E_a(w)$ represents the error surface that is obtained when all possible vectors are used for the computation.

41 Approximation of the optimal Bayes classifier
As assumed, in the large $N$ limit we can suppose that the function $E_s$ is a reasonable approximation for $E_a$.
Let us rewrite the function $E_a$ as follows:
$$E_a(w) = \lim_{N \to \infty} \left[ \frac{n_1}{N} \frac{1}{n_1} \sum_{x \in X_1} [F(x, w) - 1]^2 + \frac{n_2}{N} \frac{1}{n_2} \sum_{x \in X_2} [F(x, w) + 1]^2 \right]$$
where $n_i$ is the number of vectors belonging to the class $c_i$, and $n_1 / N$ and $n_2 / N$ represent, in the large $N$ limit, the a-priori probabilities $P(c_1)$ and $P(c_2)$, respectively.

42 Approximation of the optimal Bayes classifier
By exploiting the strong law of large numbers (Pavlidis, 1982):
$$\begin{aligned}
E_a(w) &= P(c_1) \int_\chi [F(x, w) - 1]^2\, p(x \mid c_1)\, dx + P(c_2) \int_\chi [F(x, w) + 1]^2\, p(x \mid c_2)\, dx \\
&= \int_\chi [F^2(x, w) + 1]\,[p(x \mid c_1) P(c_1) + p(x \mid c_2) P(c_2)]\, dx \\
&\quad - 2 \int_\chi F(x, w)\,[p(x \mid c_1) P(c_1) - p(x \mid c_2) P(c_2)]\, dx \qquad (2)
\end{aligned}$$

43 Approximation of the optimal Bayes classifier
The probability density function of the input vectors can be expressed as:
$$p(x) = p(x \mid c_1) P(c_1) + p(x \mid c_2) P(c_2)$$
By using the Bayes rule:
$$\begin{aligned}
g(x)\, p(x) &= [P(c_1 \mid x) - P(c_2 \mid x)]\, p(x) \qquad (3) \\
&= P(c_1 \mid x)\, p(x) - P(c_2 \mid x)\, p(x) \\
&= p(x \mid c_1) P(c_1) - p(x \mid c_2) P(c_2)
\end{aligned}$$

44 Approximation of the optimal Bayes classifier
Therefore:
$$\begin{aligned}
E_a(w) &= \int_\chi [F^2(x, w) + 1]\, p(x)\, dx - 2 \int_\chi F(x, w)\, g(x)\, p(x)\, dx \qquad (4) \\
&= \int_\chi [F^2(x, w) - 2 F(x, w)\, g(x)]\, p(x)\, dx + \int_\chi p(x)\, dx \\
&= \int_\chi [F(x, w) - g(x)]^2\, p(x)\, dx - \int_\chi g^2(x)\, p(x)\, dx + \int_\chi p(x)\, dx \\
&= \epsilon^2(w) + \int_\chi [1 - g^2(x)]\, p(x)\, dx
\end{aligned}$$
The learning algorithm minimizes $E_s$ with respect to $w$ and, as we assumed $E_s(w)$ to be a reasonable approximation for $E_a(w)$, this algorithm also minimizes $E_a$ with respect to $w$.
Moreover, $\int_\chi [1 - g^2(x)]\, p(x)\, dx$ is a quantity that does not depend on $w$; hence the optimization algorithm minimizes $\epsilon^2(w)$ too, which was to be demonstrated.
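The decomposition $E_a(w) = \epsilon^2(w) + \int_\chi [1 - g^2(x)]\, p(x)\, dx$ can be verified on a toy case. The sketch below replaces the integrals by sums over a hypothetical pattern space with four patterns; all probabilities and classifier outputs are made-up illustrative values.

```python
import numpy as np

# Hypothetical finite pattern space with 4 patterns (illustration only).
p_x_c1 = np.array([0.4, 0.3, 0.2, 0.1])  # p(x | c1)
p_x_c2 = np.array([0.1, 0.2, 0.3, 0.4])  # p(x | c2)
P1, P2 = 0.6, 0.4                        # priors P(c1), P(c2)

p = p_x_c1 * P1 + p_x_c2 * P2            # p(x)
g = (p_x_c1 * P1 - p_x_c2 * P2) / p      # Bayes discriminant P(c1|x) - P(c2|x)

F = np.array([0.9, -0.2, 0.3, -1.5])     # arbitrary classifier outputs F(x, w)

# Average error with targets +1 for c1 and -1 for c2.
E_a = P1 * np.sum((F - 1) ** 2 * p_x_c1) + P2 * np.sum((F + 1) ** 2 * p_x_c2)

eps2 = np.sum((F - g) ** 2 * p)          # epsilon^2(w)
const = np.sum((1 - g ** 2) * p)         # term that does not depend on F

# The same error evaluated at the Bayes discriminant itself.
E_a_at_g = P1 * np.sum((g - 1) ** 2 * p_x_c1) + P2 * np.sum((g + 1) ** 2 * p_x_c2)
```

Here `E_a` equals `eps2 + const` for any choice of `F`, and the error at `F = g` reduces to `const`, the smallest value the average error can take.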

45 Approximation of the optimal Bayes classifier
It is possible to extend this result to the multiclass classification problem, in the limit of a large training set.
Consider the mean square error
$$MSE = \frac{1}{I N} \sum_{k,n} (y_k^n - t_k^n)^2$$
where $N$ is the cardinality of the training set, $I$ is the number of outputs, $y^n = (y_k^n)$ is the output of the system and $t^n = (t_k^n)$ is the label of the $n$-th associative couple, whose components are defined as follows:
$$t_j = \begin{cases} 1 & \text{if the pattern belongs to class } j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

46 Approximation of the optimal Bayes classifier
For $N$ tending to infinity, it can be shown that, if the MSE is taken as the cost function, when $w$ minimizes the MSE the system outputs $y_k$ approximate the optimal Bayes discriminant functions, that is, the a-posteriori class probabilities (Ruck, 1990).
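A small finite-sample analogue makes the multiclass statement concrete. In the sketch below, a table-lookup model with one free output vector per distinct pattern is fitted by least squares to one-hot targets; the MSE minimizer is then the relative class frequency of each pattern, which tends to the a-posteriori probability as $N$ grows. The counts and the lookup-table model are assumptions chosen for illustration, not part of the slides.

```python
import numpy as np

# Hypothetical counts: rows = 3 distinct patterns, columns = 2 classes.
counts = np.array([[8, 2],
                   [5, 5],
                   [1, 9]], dtype=float)

# Build the training set: one-hot pattern indicator as input,
# one-hot class label (Eq. 5) as target.
X, T = [], []
for i, row in enumerate(counts):
    for k, c in enumerate(row):
        X += [np.eye(3)[i]] * int(c)
        T += [np.eye(2)[k]] * int(c)
X, T = np.array(X), np.array(T)

# Least-squares fit of the lookup model y = X @ W, i.e. the MSE minimizer.
W, *_ = np.linalg.lstsq(X, T, rcond=None)

# Relative class frequencies per pattern (empirical posteriors).
posteriors = counts / counts.sum(axis=1, keepdims=True)
```

Each row of `W` matches the corresponding row of `posteriors`, so the MSE-optimal outputs are the empirical class frequencies for each pattern.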

APPENDIX A Some Linear Algebra
A.1 Matrices
The collection of $m \times n$ matrices
$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix}$$
with real elements $a_{i,j}$ is denoted by $\mathbb{R}^{m,n}$. If $n = 1$ then $A$ is called a column vector. Similarly,