Bogdan M. Wilamowski
Motivation for the topic of the seminar
Constraints: not to talk about AMNSTC (nano-micro); bring a new perspective to students; keep it to the state of the art.
The following topics were considered: Solving Engineering Problems with Computers; Advanced Network Programming; Analog Signal Processing; Computational Intelligence.
Keynotes:
AINA - 24th Conf. on Advanced Information Networking and Applications, Perth, Australia (April)
ICAISC - 10th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland (June)
ICIT - 2nd International Conf. on Information Technology, Gdansk, Poland (June)
ISIE - 19th International Symposium on Industrial Electronics, Bari, Italy (July)
ISRCS - 3rd International Symposium on Resilient Control Systems, Idaho Falls, USA (August)
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Challenges in Neural Networks; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
WTA - Winner Takes All: a Hamming layer (binary inputs, unipolar neurons) followed by a linear layer (summing circuits, the pattern retrieval layer) producing binary outputs.
The conclusion: a system of computational intelligence can be smarter than humans. Is this a new technological revolution? Years ago man power was replaced by machines (steam and electric); more recently a significant portion of human brain functions were replaced by computers (calculations, administrative functions, voice and image recognition, etc.). We are still claiming that we are the most intelligent creatures in the universe, but for how much longer?
Find clusters: find the number of clusters and their locations in 4-dimensional space.
[Figure: scatter of 4-dimensional sample patterns to be clustered.]
Adding neurons as needed, using the minimum-distance concept, is much simpler and more efficient than ART. The first pattern is applied and the first neuron is introduced. The next pattern is applied and then:
a) If the distance from all existing clusters is larger than the threshold, a new neuron is added.
b) Else the weights of the closest neuron are updated:
    w_k = (m * w_k + alpha * x) / (m + 1)
where m is the number of previous patterns of a given set which were used to update this particular neuron, and alpha is the learning constant.
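The add-or-update rule above can be sketched in a few lines; this is a minimal illustration (the function name, the example patterns, and the default alpha are assumptions, not the seminar's code):

```python
import math

def cluster(patterns, threshold, alpha=1.0):
    """Incremental minimum-distance clustering (hypothetical sketch).

    A new neuron (cluster centre) is added when a pattern is farther than
    `threshold` from every existing centre; otherwise the closest centre
    is updated as w <- (m*w + alpha*x) / (m + 1), where m counts the
    patterns already assigned to that centre.
    """
    centres, counts = [], []
    for x in patterns:
        if not centres:
            centres.append(list(x))
            counts.append(1)
            continue
        dists = [math.dist(x, c) for c in centres]
        k = dists.index(min(dists))
        if dists[k] > threshold:
            centres.append(list(x))          # distance too large: new neuron
            counts.append(1)
        else:
            m = counts[k]                    # running-average weight update
            centres[k] = [(m * w + alpha * xi) / (m + 1)
                          for w, xi in zip(centres[k], x)]
            counts[k] = m + 1
    return centres, counts
```

Unlike ART, no vigilance dynamics are needed: one pass over the patterns, one distance test per pattern.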
X, Y -> Fuzzifier -> MIN operators (fuzzy rules) -> MAX operators -> Defuzzifier
Fuzzy systems: fuzzy controllers - practical examples used from cameras to elevators.
Neural networks: diagnostics (medical, mechanical, etc.), modeling (natural phenomena or complex systems), business and military (various predictions).
Evolutionary computation: are they replacing design processes?
[Figure: block diagram of a nonlinear dynamic system built with neural networks or fuzzy systems - each state x_i passes through a neural network or fuzzy system and an integrator to produce y_i.]
Introduction
[Figure: block diagram of an arbitrary nonlinear dynamic system built from nonlinear terms and integrators.]
    dy_1/dt = f_1(x_1, ..., x_n, y_1, ..., y_n)
    dy_2/dt = f_2(x_1, ..., x_n, y_1, ..., y_n)
    ...
    dy_n/dt = f_n(x_1, ..., x_n, y_1, ..., y_n)
Another area with 4 partitions - neuron equations: each neuron of the first layer implements one separating line, a linear inequality of the form w1*x + w2*y > t (weights in the first layer); the second-layer weights combine the resulting half-planes into the desired area.
Another area with 4 partitions: first layer, second layer.
Design a neural network with unipolar McCulloch-Pitts neurons which has two inputs and three outputs. Each output responds to the patterns located in one of three areas, as shown in the figure below. Draw the neural network and indicate the value of each weight.
[Figure: three target areas in the x-y plane and the resulting unipolar network with its weights.]
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Neural networks as nonlinear elements: feedforward neural networks.
Soft activation functions
Hard threshold unipolar: o = f(net) = (sign(net) + 1) / 2, i.e. o = 1 if net > 0 and o = 0 if net < 0.
Hard threshold bipolar: o = f(net) = sgn(net) = +1 if net > 0, -1 if net < 0.
Soft unipolar: o = f(net) = 1 / (1 + exp(-lambda * net)), with derivative f' = lambda * o * (1 - o).
Soft bipolar: o = f(net) = tanh(0.5 * lambda * net), with derivative f' = 0.5 * lambda * (1 - o^2).
Neural Network Learning
Let us consider binary signals and weights such as
    x = [+1 -1 -1 +1 -1 +1 +1 -1]
If the weights w = x, then
    net = sum_i w_i * x_i = 8
This is the maximum value net can have; for any other combination net would be smaller.
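The soft activation functions and their derivatives above can be checked directly; this is a small sketch (function names are my own, not the slides'):

```python
import math

def unipolar(net, lam=1.0):
    """Unipolar sigmoid o = 1/(1 + exp(-lam*net)); derivative lam*o*(1-o)."""
    o = 1.0 / (1.0 + math.exp(-lam * net))
    return o, lam * o * (1.0 - o)

def bipolar(net, lam=1.0):
    """Bipolar sigmoid o = tanh(0.5*lam*net); derivative 0.5*lam*(1-o*o)."""
    o = math.tanh(0.5 * lam * net)
    return o, 0.5 * lam * (1.0 - o * o)
```

At net = 0 the unipolar function returns 0.5 with slope lambda/4, while the bipolar one returns 0 with slope lambda/2, matching the formulas above.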
For the same pattern x = [+1 -1 -1 +1 -1 +1 +1 -1] and slightly different weights w = [+1 +1 -1 +1 -1 -1 +1 -1]:
    net = sum_i w_i * x_i = 4
In general,
    net = sum_i w_i * x_i = n - 2 * HD
where HD is the Hamming distance between the weight vector and the pattern.
Supervised learning rules for a single neuron, with weight update Δw_i = c * δ * x_i:
- correlation rule (supervised): δ = d
- perceptron fixed rule: δ = d - o
- perceptron adjustable rule: as above, but the learning constant is scaled in proportion to net / (xᵀx)
- LMS (Widrow-Hoff) rule: δ = d - net
- delta rule: δ = (d - o) * f'
- pseudoinverse rule (the same solution as LMS): w = (xᵀx)⁻¹ xᵀ d
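The relation net = n - 2*HD can be verified numerically; the helper names and the second weight vector (two flipped bits) are illustrative assumptions:

```python
def net_value(weights, pattern):
    """Weighted sum net = sum_i w_i * x_i for bipolar (+1/-1) vectors."""
    return sum(w * x for w, x in zip(weights, pattern))

def hamming_distance(a, b):
    """Number of positions where two bipolar vectors differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

x = [1, -1, -1, 1, -1, 1, 1, -1]        # stored pattern, n = 8
w_same = list(x)                         # weights equal to the pattern
w_diff = [1, 1, -1, 1, -1, -1, 1, -1]   # two positions flipped (HD = 2)
```

With w = x the net reaches its maximum n = 8; each flipped bit costs 2, so two flips give net = 8 - 2*2 = 4.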
EBP - Error Back Propagation algorithm for a single pattern.
[Figure: two weight layers (j and k) of neurons, with the input vector feeding forward and the error backpropagating from the output.]
Feedforward: y = f(net_j), o = f(net_k).
Backpropagation: δ_o = (d_k - o_k) * f'(net_k); δ_y = Σ_k w_jk * δ_o * f'(net_j).
Weight updates, with learning constant η: ΔW = η * δ_o * y; ΔV = η * δ_y * z.
[Figure: illustration of the concept of gain computation in neural networks, where the gain from net_j to output o_k is gain_kj = f_j'(net_j) * F_k'{z_j}.]
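The feedforward and backpropagation passes above can be sketched for a tiny 2-input, 2-hidden network; as a simplifying assumption (not in the slides) the output neuron is linear, so f'(net_k) = 1:

```python
import math

def forward(x, w1, w2):
    """Feedforward: tanh hidden layer, linear output neuron."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    o = sum(w * hi for w, hi in zip(w2, h))
    return h, o

def ebp_gradient(x, d, w1, w2):
    """One backpropagation pass for E = 0.5*(d - o)^2; returns dE/dw."""
    h, o = forward(x, w1, w2)
    delta_o = d - o                               # output error signal
    g2 = [-delta_o * hi for hi in h]              # gradient for output weights
    g1 = [[-delta_o * w2[j] * (1 - h[j] ** 2) * xi for xi in x]
          for j in range(len(h))]                 # backpropagated to layer 1
    return g1, g2
```

A weight update is then w <- w - eta * g, and the gradients can be checked against finite differences.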
Steepest descent method: w_{k+1} = w_k - α * g
Newton method: w_{k+1} = w_k - A_k⁻¹ * g_k, where A_k is the Hessian and g is the gradient vector.
If the error is defined as E = Σ_{p=1..P} Σ_{m=1..M} (d_pm - o_pm)², then A ≈ 2·JᵀJ and g = 2·Jᵀe, where J is the Jacobian and e is the error vector (N is the number of weights, M the number of outputs, P the number of patterns).
LM or NBN algorithm.
Advantages of the NBN algorithm over the LM algorithm:
- Both LM and NBN are very fast.
- NBN does not calculate the Jacobian matrix, so it can handle problems with a basically unlimited number of patterns, while LM can be used only for small problems.
- LM needs the forward and backpropagation processes to calculate the Jacobian, while NBN uses only forward calculation, so it is faster (especially for networks with multiple outputs).
- NBN can handle arbitrarily connected neural networks, while LM was developed only for MLP.
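The Levenberg-Marquardt update w <- w + (JᵀJ + μI)⁻¹ Jᵀe can be shown in its simplest form, a one-parameter model y = w*x, where all the matrix algebra collapses to scalars (the model and data are illustrative assumptions):

```python
def lm_step(w, patterns, mu):
    """One Levenberg-Marquardt update for the 1-parameter model y = w*x.

    With residuals e_p = d_p - w*x_p and Jacobian entries de/dw = -x_p,
    J^T J = sum(x^2) and -J^T e = sum(x*e), so the damped Gauss-Newton
    step is w <- w + sum(x*e) / (sum(x^2) + mu).
    """
    jtj = sum(x * x for x, _ in patterns)           # J^T J (scalar)
    jte = sum(x * (d - w * x) for x, d in patterns)  # -J^T e (scalar)
    return w + jte / (jtj + mu)
```

With μ = 0 this is a pure Gauss-Newton step (one jump to the least-squares solution for a linear model); a larger μ damps the step toward small steepest-descent moves, which is exactly the LM trade-off.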
Result of Parity-4 training using the EBP algorithm: 796 iterations, 397 ms training time, 46% success rate, averaged over multiple runs. [Figure: sum of squared errors as a function of the number of iterations for the Parity-4 problem using the EBP algorithm.]
Result of Parity-4 training using the NBN algorithm with a 4-3-3-1 architecture: 7 iterations, a training time of a few ms, 69% success rate.
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Functional Link Networks: inputs -> nonlinear elements -> outputs. Genetic algorithms?
Polynomial Networks: inputs -> polynomial terms (xy, x², y², x²y, ...) -> outputs. Fourier or other series? Nonlinear regression?
The cascade correlation architecture: hidden neurons and output neurons; weights are adjusted at every step and, once adjusted, frozen.
The counterpropagation networks (ROM): a Hamming layer of unipolar neurons followed by summing circuits, mapping inputs to outputs. [Figure: weight matrices of the ROM-style counterpropagation network.]
The counterpropagation networks (analog memory): a Hamming layer with binary inputs, unipolar neurons, and summing circuits producing analog outputs. [Figure: weight matrices of the analog-memory counterpropagation network.]
Analog memory with analog address: unipolar neurons and summing circuits. Consider using it as an alternative to fuzzy systems. Number of neurons = number of predefined values. Easy implementation of systems with multiple inputs.
RBF - Radial Basis Function networks: a minimum-distance classifier. Hidden "neurons" store patterns s_1 ... s_4; each computes
    out = exp( - ||x - s||² / σ² )
so a hidden neuron responds strongly when x is close to its stored s. A summing circuit and output normalization produce the outputs.
LVQ - Learning Vector Quantization (a counterpropagation-like network): the first (competitive) layer detects subclasses; the second (linear) layer combines subclasses into a single class. The first layer computes the Euclidean distances between the input pattern and the stored patterns; the winning neuron is the one with the minimum distance.
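The RBF activation and the output normalization above can be sketched directly; the stored patterns and σ in the example are assumptions:

```python
import math

def rbf_output(x, stored, sigma):
    """Radial-basis activations out_k = exp(-||x - s_k||^2 / sigma^2),
    followed by output normalization so the activations sum to 1."""
    acts = [math.exp(-math.dist(x, s) ** 2 / sigma ** 2) for s in stored]
    total = sum(acts)
    return [a / total for a in acts]
```

The normalized activations behave like soft nearest-neighbor memberships: the stored pattern closest to x dominates, which is why the network acts as a minimum-distance classifier.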
Input pattern transformation on a sphere - a fix to the Kohonen network deficiency: an n-dimensional input (x_1, ..., x_n) is augmented with an extra coordinate z_{n+1} so that the transformed pattern (z_1, ..., z_n, z_{n+1}) lies on a sphere of radius R. [Figure: example patterns before and after the spherical transformation.]
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Challenges in Neural Networks; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Fundamentals of Fuzzy Systems - comparison of Boolean algebra with fuzzy logic.
Fuzzy systems: inputs can be any value from 0 to 1. The basic fuzzy principle is similar to Boolean logic: MIN and MAX operators are used instead of AND and OR, and the NOT operator becomes 1 - A.
    A AND B AND C -> min{A, B, C} (smallest value of A, B, or C)
    A OR B OR C -> max{A, B, C} (largest value of A, B, or C)
    NOT A -> 1 - A (one minus A, the complement)
Example, with union as the MAX operator and intersection as the MIN operator:
    A = .2, B = .3:  A ∪ B = .3,  A ∩ B = .2
    A = .2, B = .8:  A ∪ B = .8,  A ∩ B = .2
    A = .7, B = .3:  A ∪ B = .7,  A ∩ B = .3
    A = .7, B = .8:  A ∪ B = .8,  A ∩ B = .7
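The three fuzzy operators reduce to ordinary min, max, and complement; a minimal sketch (the function names are my own):

```python
def fuzzy_and(*vals):
    """Fuzzy intersection: the MIN operator replaces Boolean AND."""
    return min(vals)

def fuzzy_or(*vals):
    """Fuzzy union: the MAX operator replaces Boolean OR."""
    return max(vals)

def fuzzy_not(a):
    """Fuzzy complement: NOT becomes 1 - a."""
    return 1.0 - a
```

Note that on the crisp values 0 and 1 these operators reproduce Boolean logic exactly, which is why fuzzy logic is a generalization rather than a replacement.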
Boolean or fuzzy systems
Fuzzy systems: entropy as a measure of uncertainty,
    E(A) = a / b = l(A, A_near) / l(A, A_far)
When a = b, then E(A) = 1.
Fuzzy entropy theorem:
    E(A) = M(A ∩ A') / M(A ∪ A')
Example: for A = (.2, .7) the complement is A' = (.8, .3).
Fuzzy systems: m_A(x_i) is the degree of association of variable x_i with fuzzy set A.
Size of a fuzzy set:
    M(A) = Σ_{i=1..n} m_A(x_i)
Distance between fuzzy sets A and B:
    l_p(A, B) = ( Σ_{i=1..n} |m_A(x_i) - m_B(x_i)|^p )^(1/p)
where p is the distance order. If p = 1 this is the fuzzy Hamming distance, and l_1(A, 0) = M(A). If p = 2 this is the Euclidean distance.
Example: A = (.2, .7), B = (.6, .3):
    l_1 = .4 + .4 = .8
    l_2 = sqrt(.4² + .4²) ≈ .57
Design Example of a Mamdani Fuzzy Controller: membership functions cold, cool, norm, warm, hot over the x input (temperature), and wet, moist, norm, dry over the y input (humidity); the membership functions are designed so that they sum to 1 everywhere; defuzzification by centroid.
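The size, distance, and entropy definitions above fit in a few lines; a sketch assuming fuzzy sets are given as tuples of membership values:

```python
def size(a):
    """M(A) = sum of memberships."""
    return sum(a)

def distance(a, b, p=1):
    """l_p(A, B); p=1 is the fuzzy Hamming distance, p=2 Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def entropy(a):
    """Fuzzy entropy theorem: E(A) = M(A ∩ A') / M(A ∪ A'),
    with intersection = min, union = max, complement = 1 - m."""
    comp = [1.0 - x for x in a]
    inter = sum(min(x, y) for x, y in zip(a, comp))
    union = sum(max(x, y) for x, y in zip(a, comp))
    return inter / union
```

For A = (.2, .7) this reproduces the worked numbers: l_1(A, B) = .8 and l_2 ≈ .57 against B = (.6, .3), and E(A) = .5/1.5 = 1/3; a maximally uncertain set such as (.5, .5) gives E = 1.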
Fuzzy systems
Block diagram for the Zadeh fuzzy controller: X, Y -> Fuzzifier -> MIN operators (fuzzy rules) -> MAX operators -> Defuzzifier -> out.
Takagi-Sugeno type defuzzifier: the k fuzzy rule outputs are normalized and combined by a weighted sum into the analog output.
Fundamentals of Fuzzy Systems - fuzzy controllers:
Block diagram of a Mamdani-type fuzzy controller: X, Y -> Fuzzifier -> rule selection cells -> min-max operations -> defuzzification -> out.
Block diagram of a TSK (Takagi-Sugeno-Kang) fuzzy controller: X, Y -> Fuzzifier -> voltages -> array of cluster cells -> weighted currents -> out.
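The Takagi-Sugeno defuzzifier described above (normalization plus weighted sum) is just a weighted average of the rule consequents; a minimal sketch, assuming rule firing strengths are precomputed and not all zero:

```python
def tsk_output(firing, consequents):
    """Takagi-Sugeno defuzzification: normalize the rule firing strengths
    w_k and take the weighted sum of the rule consequents f_k,
    out = sum(w_k * f_k) / sum(w_k)."""
    total = sum(firing)          # normalization term (assumed nonzero)
    return sum(w * f for w, f in zip(firing, consequents)) / total
```

Because the consequents are crisp values (or crisp functions of the inputs), no rule table of output membership functions is needed, which is the practical advantage of TSK over Mamdani noted later in these slides.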
[Figure: rule tables and the resulting control surfaces over the X-Y grid. The Mamdani fuzzy controller uses a table of linguistic labels (A through E); the TSK fuzzy controller uses a table of numeric consequents.]
Mamdani controller vs. TSK controller: control surfaces for triangular and for trapezoidal membership functions.
Fuzzy neural networks: x, y, z -> fuzzification (Fuzzifier) -> multiplication (Π operators) -> sum (all weights equal the expected values) -> division -> out.
TSK fuzzy controller vs. a fuzzy controller with sigmoidal membership functions: sigmoidal pairs (a through f over the inputs u, v, x, y, z) replace the usual triangular membership functions.
[Figure: a fuzzy controller with sigmoidal membership functions - sigmoidal pairs a through f over the inputs u, v, x, y, z select the regions A, B, C.]
Neural Networks or Fuzzy Systems?
                          Fuzzy   Neural
Number of inputs            -       +
Analog implementation       -       +
Digital implementation      -       +
Speed                       -       +
Smoothness of the surface   -       +
Design complexity           +       +
So, most researchers use FUZZY. Why are researchers frustrated with neural networks?
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Challenges in Neural Networks; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Best neural network architectures:
- Layered bipolar neural network with one hidden layer for the parity-8 problem: hidden thresholds -7, -5, -3, -1, 1, 3, 5, 7; all weights +1 or -1.
- Parity implemented in fully connected bipolar neural networks with five neurons in the hidden layer.
- Parity implemented with 4 neurons in one cascade (unipolar).
[Figure: the three parity networks with their weights and thresholds.]
Solution of the two-spiral problem using the MLP architecture with 3 neurons (-4-4-4-).
Solution of the two-spiral problem using the BMLP architecture with 7 neurons.
Solution of the two-spiral problem using the FCC architecture with 6 neurons.
Best neural network architectures: most software can train only MLP; exceptions are SNNS and NBN.
Performance of Neural Networks
[Figure: control surface of a TSK fuzzy controller - (a) required control surface; (b) surface obtained with 8*6 = 48 defuzzification rules.]
Performance of Neural Networks
[Figure: control surfaces obtained with neural networks - (a) 3 neurons in cascade, Training Error = .49; (b) 4 neurons in cascade, Training Error = .496.]
Performance of Neural Networks
[Figure: control surfaces obtained with neural networks - (a) neurons in cascade, Training Error = .3973; (b) 8 neurons in cascade, with a training error orders of magnitude smaller.]
EBP is not able to train optimal architectures.
[Figure: comparison between the EBP algorithm and the NBN algorithm for different numbers of neurons in fully connected cascade networks - (a) average training time; (b) success rate.]
Common Mistakes:
- Researchers are using wrong architectures.
- Researchers are using an excessive number of neurons.
- A first-order algorithm such as EBP is not able to train optimal networks.
- Second-order algorithms such as LM can train only MLP networks.
[Figure: success rate versus number of neurons for the EBP and NBN algorithms.]
The newly developed NBN algorithm is not only very fast, but it can train all neural network architectures and can find solutions for optimal neural network architectures.
Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Challenges in Neural Networks; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Digital implementation: neural network implementations usually require computation of the sigmoidal function
    f(net) = 1 / (1 + exp(-net))
for unipolar neurons, or
    f(net) = tanh(net)
for bipolar neurons. These functions are relatively difficult to compute, making implementation on a microprocessor difficult. If the Elliott function
    f(net) = net / (1 + |net|)
is used instead of the sigmoidal, then the computations are relatively simple and the results are almost as good as in the case of the sigmoidal function.
[Figure: unipolar and bipolar Elliott functions compared with the corresponding sigmoidal functions.]
[Figure: the required surface and the surfaces of a Mamdani fuzzy controller with MIN inference for triangular, trapezoidal, and Gaussian membership functions.]
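The point about microprocessor-friendly activation can be seen side by side; the bipolar Elliott function needs only one addition, one absolute value, and one division, no exp():

```python
import math

def elliott(net):
    """Bipolar Elliott function net/(1 + |net|): sigmoid-like shape,
    bounded in (-1, 1), but computable without exp()."""
    return net / (1.0 + abs(net))

def bipolar_sigmoid(net):
    """Reference bipolar sigmoid tanh(net), for comparison."""
    return math.tanh(net)
```

Both functions are odd, monotone, and saturate toward ±1; the Elliott function approaches its limits more slowly, which in practice mainly changes the effective gain, not the trainability.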
[Figure: the required surface and the surface of a neural network with one hidden layer.]
Two-inputs case:
Fuzzy with 6 membership functions. Mamdani fuzzy controllers require 2*6 + 6 = 18 analog values, plus a rule table of 36*3 = 108 bits. TSK fuzzy controllers require 2*6 + 6*6 = 48 analog values; no rule table has to be stored.
Neural networks: 2 hidden neurons need 2*3 + 5 = 11 analog values; 3 hidden neurons, 3*3 + 5 = 14 analog values; 4 hidden neurons, 4*3 + 6 = 18 analog values.
Comparison of various fuzzy and neural controllers:

Type of controller                              | Length of code | Processing time (ms) | Error MSE
Mamdani with trapezoidal                        | 34             | .9                   | .94
Mamdani with triangular                         | 34             | .9                   | .67
Mamdani with Gaussian                           | 34             | 39.8                 | .8
Takagi-Sugeno with trapezoidal                  | 8              | .                    | .39
Takagi-Sugeno with triangular                   | 8              | .                    | .9
Takagi-Sugeno with Gaussian                     | 84             | .3                   | .36
Neural network with 3 neurons in cascade        | 68             | .7                   | .7
Neural network with neurons in cascade          | 7              | 3.3                  | .9
Neural network with 6 neurons in one hidden layer | 66           | 3.8                  | .3

Bogdan M. Wilamowski
Problems with computational intelligence: Introduction; Neural Network Learning; Neural Networks Architectures; Challenges in Neural Networks; Fuzzy Systems; Comparison of Neural and Fuzzy Systems; Evolutionary Computation
Genetic Algorithms
Genetic algorithms follow the evolution process in nature to find better solutions to complicated problems. The foundations of genetic algorithms are given in the books by Holland (1975) and Goldberg (1989). Genetic algorithms consist of the following steps:
- Initialization
- Selection
- Reproduction with crossover and mutation
Selection and reproduction are repeated for each generation until a solution is reached. During this procedure, certain strings of symbols, known as chromosomes, evolve toward a better solution.
Genetic Algorithms
All significant steps of the genetic algorithm will be explained using a simple example: finding the maximum of the function sin(4x) - 0.2x in the range of x from 0 to 1.6. Note that in this range the function has a global maximum at x = .39 and a local maximum at x = 1.6.
[Figure: plot of the function over 0 ≤ x ≤ 1.6, showing the global and local maxima.]
Genetic Algorithms
Coding and initialization: first, the variable x has to be represented as a string of symbols. The process usually converges faster with shorter strings, so the fewer symbols used for one string field, the better. While this string may be a sequence of any symbols, the binary symbols "0" and "1" are usually used. In our example, let us use for coding six-bit binary numbers having a decimal value of 40x. The process starts with a random generation of the initial population given in the table.
Genetic Algorithms
[Table: initial population of eight random strings, listing each string, its decimal value, variable value, function value, and fraction of the total fitness.]
Genetic Algorithms
Selection and reproduction: selection of the best members of the population is an important step in the genetic algorithm. Many different approaches can be used to rank individuals; in our example the ranking function is given. Member number 6 has the highest rank and member number 3 the lowest. Members with higher rank should have higher chances to reproduce. The probability of reproduction for each member can be obtained as its fraction of the sum of all objective function values; this fraction is shown in the last column of the table. Using a random reproduction process, a population arranged in pairs could be generated from the strings with decimal values 4, 49, 37, 49, 39, 4, 49, 4.
Genetic Algorithms
Reproduction: if the size of the population is the same from one generation to another, then two parents should generate two children, i.e. by combining two strings, two other strings should be generated. The simplest way to do this is to split each parent string in half and exchange the substrings between the parents. From two parent strings, two child strings are generated in this way. This process is known as crossover, and the resulting children have decimal values 47, 3, 33, 48.
Genetic Algorithms
Mutation: on top of the properties inherited from parents, children acquire some new random properties. This process is known as mutation. In most cases mutation generates low-ranked children, which are eliminated in the reproduction process. Sometimes, however, mutation may introduce a better individual with a new property into the population. This prevents the process of reproduction from degeneration. In genetic algorithms mutation usually plays a secondary role; the mutation rate is usually assumed at a level well below 1%. In our example mutation is equivalent to a random bit change of a given pattern. In this simple example, with short strings and a small population and a typical sub-1% mutation rate, our patterns remain practically unchanged by the mutation process. The second generation for our example is shown in the table.
Genetic Algorithms
[Table: population of the second generation - each string with its decimal value, variable value, function value, and fraction of the total fitness.]
Genetic Algorithms
Note that the two identical highest-ranking members of the second generation are very close to the solution x = .39. The randomly chosen parents for the third generation produce children whose best result is the same as in the second generation. By careful inspection of all strings from the second or third generation, one may conclude that using crossover where strings are always split in half, the best solution will never be reached, no matter how many generations are created.
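The whole walkthrough above (binary coding, fitness-proportional selection, half-split crossover, rare mutation) can be sketched end to end. This is an illustrative implementation, not the seminar's code; the test function, population size, and rates are assumptions:

```python
import random

def ga_maximize(f, bits=6, pop_size=8, generations=30, xmax=1.6,
                mutation_rate=0.005, seed=1):
    """Minimal genetic algorithm for 1-D maximization (hypothetical sketch).

    Chromosomes are `bits`-bit strings decoding to x in [0, xmax];
    selection is fitness-proportional (roulette wheel), crossover splits
    parents in half, and mutation flips bits with small probability.
    """
    rng = random.Random(seed)
    scale = xmax / (2 ** bits - 1)

    def decode(s):
        return int(s, 2) * scale

    pop = [''.join(rng.choice('01') for _ in range(bits))
           for _ in range(pop_size)]
    best = max(pop, key=lambda s: f(decode(s)))
    for _ in range(generations):
        fit = [max(f(decode(s)), 1e-12) for s in pop]   # keep weights positive
        total = sum(fit)

        def pick():                                      # roulette-wheel pick
            r, acc = rng.uniform(0, total), 0.0
            for s, w in zip(pop, fit):
                acc += w
                if acc >= r:
                    return s
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            cut = bits // 2                              # split in half
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                nxt.append(''.join(
                    c if rng.random() > mutation_rate else '10'[int(c)]
                    for c in child))
        pop = nxt[:pop_size]
        cand = max(pop, key=lambda s: f(decode(s)))      # track best so far
        if f(decode(cand)) > f(decode(best)):
            best = cand
    return decode(best)
```

Note that with the crossover point fixed at the middle, some bit combinations can never be assembled from a given gene pool, which is exactly the limitation the slides point out; varying the crossover point (or raising the mutation rate) removes it.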
Pulse Coded Neural Networks
Pulse Coded Neural Networks
[Figure: a neural cell (NC) built from transistors M1-M4, MP, MN with RC pairs, producing weighted output currents at the input node; transient graph of the capacitor voltages VC1 and VC2 over time.]
[Figure: two neural cells, X and Y, coupled through R and C.]
Pulse Coded Neural Networks
Mutually coupled neurons (each with an input, V_DD supply, RC coupling with its neighbors, and an output) map a spatial image into a temporal pattern of spikes.
[Figure: "Burning Fire" - a complete image encoded as a temporal pattern; histogram of the number of events over time.]