Plan for Today
- Organization (Lectures / Tutorials)
- Overview CI
- Introduction to ANN
  - McCulloch-Pitts Neuron (MCP)
  - Minsky/Papert Perceptron (MPP)

Computational Intelligence, Winter Term 2009/10
Prof. Dr. Günter Rudolph
Lehrstuhl für Algorithm Engineering (LS 11)
Fakultät für Informatik, TU Dortmund

Organizational Issues

Who are you?
either
- studying "Automation and Robotics" (Master of Science), Module "Optimization"
or
- studying "Informatik"
  - BA-Modul "Einführung in die Computational Intelligence"
  - Hauptdiplom-Wahlvorlesung (SPG 6 & 7)

Who am I?
Günter Rudolph
Fakultät für Informatik, LS 11
Guenter.Rudolph@tu-dortmund.de   <- best way to contact me
OH-14, R. 232                    <- if you want to see me
office hours: Tuesday, 10:30-11:30 am, and by appointment
Organizational Issues

Lectures:  Wednesday, 10:15-11:45, OH-14, R. E23
Tutorials: Wednesday, 16:15-17:00, OH-14, R. 34 (group 1)
           Thursday,  16:15-17:00, OH-14, R. 34 (group 2)
Tutor:     Nicola Beume, LS 11

Information:
http://ls11-www.cs.uni-dortmund.de/people/rudolph/teaching/lectures/ci/ws2009/lecture.jsp
Slides: see web
Literature: see web

Prerequisites
Knowledge about mathematics, programming, and logic is helpful.
But what if something is unknown to me?
- covered in the lecture
- pointers to literature
... and don't hesitate to ask!

Overview Computational Intelligence

What is CI?
- umbrella term for computational methods inspired by nature
- backbone:
  - artificial neural networks
  - evolutionary algorithms
  - fuzzy systems
- new developments:
  - swarm intelligence
  - artificial immune systems
  - growth processes in trees, ...

- the term "computational intelligence" was coined by John Bezdek (FL, USA)
- originally intended as a demarcation line: establish a border between artificial and computational intelligence
- nowadays: blurring border

our goals:
1. know what CI methods are good for!
2. know when to refrain from CI methods!
3. know why they work at all!
4. know how to apply and adjust CI methods to your problem!
Biological Prototype

Neuron
- information gathering (D)
- information processing (C)
- information propagation (A / S)

human being: approx. 10^12 neurons
electricity in mV range
speed: approx. 120 m/s

[figure: biological neuron with cell body (C), nucleus, dendrites (D), axon (A), synapses (S)]

Abstraction
- dendrites (D): signal input
- nucleus / cell body (C): signal processing
- axon (A) / synapse (S): signal output

Model
1943: Warren McCulloch / Walter Pitts
- description of neurological networks
- model: McCulloch-Pitts neuron (MCP)
- basic idea:
  - a neuron is either active or inactive
  - skills result from connecting neurons
- considered static networks (i.e. connections had been constructed, not learnt)

McCulloch-Pitts neuron (1943):
- inputs x_i ∈ {0, 1} =: B
- computes a function f: B^n → B, output f(x_1, ..., x_n)
McCulloch-Pitts Neuron

n binary input signals x_1, ..., x_n
threshold θ > 0

boolean OR and boolean AND can be realized:
- OR: threshold θ = 1 (fires if at least one input is 1)
- AND: threshold θ = n (fires only if all inputs are 1)

in addition: m binary inhibitory signals y_1, ..., y_m
- if at least one y_j = 1, then output = 0
- otherwise:
  - sum of inputs ≥ threshold θ → output = 1
  - else → output = 0

NOT can be realized with a single inhibitory input and threshold θ = 0.
(A sketch of all three gates follows below.)

Analogies
- neuron: simple MISO processor (with parameters: e.g. threshold)
- synapse: connection between neurons (with parameters: synaptic weight)
- topology: interconnection structure of the net

Propagation: working phase of an ANN, processes input to output
Training / Learning: adaptation of the ANN to certain data

Assumption: inputs are also available in inverted form, i.e. ∃ inverted inputs.

Theorem: Every logical function F: B^n → B can be simulated with a two-layered McCulloch-Pitts net.

Example: [diagram of a two-layered MCP net with thresholds; figure not recoverable]
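A minimal sketch of the MCP neuron in Python, assuming the absolute-inhibition semantics defined above; the function and gate names are mine, not from the lecture:

```python
# McCulloch-Pitts neuron: any inhibitory signal y_j = 1 forces output 0;
# otherwise it fires iff the sum of the excitatory inputs reaches theta.

def mcp_neuron(x, theta, y=()):
    if any(y):                      # at least one inhibitory signal = 1
        return 0
    return 1 if sum(x) >= theta else 0

# the three gates from the slide:
OR  = lambda *x: mcp_neuron(x, theta=1)           # fires if any input is 1
AND = lambda *x: mcp_neuron(x, theta=len(x))      # fires only if all inputs are 1
NOT = lambda x:  mcp_neuron((), theta=0, y=(x,))  # one inhibitory input, theta = 0

assert OR(0, 1) == AND(1, 1) == NOT(0) == 1
assert OR(0, 0) == AND(0, 1) == NOT(1) == 0
```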
Proof: (by construction)
Every boolean function F can be transformed into disjunctive normal form → 2 layers (AND - OR):
1. Every clause gets a decoding neuron with θ = n → output = 1 only if the clause is satisfied (AND gate).
2. All outputs of the decoding neurons are inputs of a neuron with θ = 1 (OR gate).
q.e.d.
(A sketch of this construction follows below.)

Generalization: inputs with weights
Example: a neuron with weights 0.2, 0.4, 0.3 and threshold 0.7 fires if
  0.2 x_1 + 0.4 x_2 + 0.3 x_3 ≥ 0.7
multiplication with 10 yields
  2 x_1 + 4 x_2 + 3 x_3 ≥ 7
→ duplicate inputs (x_1 twice, x_2 four times, x_3 three times), set θ = 7
→ equivalent unweighted MCP neuron!

Theorem: Weighted and unweighted MCP-nets are equivalent for weights ∈ Q^+.

Proof:
Let the weighted neuron fire if
  ∑_{i=1..n} (a_i / b_i) x_i ≥ a_0 / b_0   with a_i, b_i ∈ N.
Multiplication with ∏_{i=0..n} b_i yields an inequality with coefficients in N.
Duplicate input x_i such that we get a_i b_0 b_1 ... b_{i-1} b_{i+1} ... b_n inputs.
Threshold θ = a_0 b_1 ... b_n. Set all weights to 1.
q.e.d.

Conclusion for MCP nets
+ feed-forward: able to compute any Boolean function
+ recursive: able to simulate a DFA
- very similar to conventional logical circuits
- difficult to construct
- no good learning algorithm available
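As an illustration of the proof's construction (my own code, not from the lecture), the following builds the two-layered net for an arbitrary F and tests it on 3-bit parity, one of the functions later shown to be hard for a single perceptron:

```python
# DNF construction: one decoding AND-neuron (theta = n) per satisfying
# input vector, collected by an OR-neuron with theta = 1.

from itertools import product

def mcp(x, theta):
    """Unweighted MCP neuron without inhibitory inputs."""
    return 1 if sum(x) >= theta else 0

def two_layer_net(F, n):
    """Build a two-layered MCP net simulating F: B^n -> B."""
    clauses = [c for c in product((0, 1), repeat=n) if F(*c) == 1]

    def net(*x):
        inv = [1 - xi for xi in x]  # assumption from the slide: inverted inputs exist
        # layer 1: decoding neurons, AND over the clause's literals
        layer1 = tuple(
            mcp(tuple(x[i] if c[i] == 1 else inv[i] for i in range(n)), n)
            for c in clauses
        )
        # layer 2: OR over all decoding neurons
        return mcp(layer1, 1)

    return net

parity = two_layer_net(lambda a, b, c: (a + b + c) % 2, n=3)
assert all(parity(*x) == sum(x) % 2 for x in product((0, 1), repeat=3))
```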
Perceptron (Rosenblatt 1958)
- complex model → reduced by Minsky & Papert to what is necessary
- Minsky-Papert perceptron (MPP), 1969

What can a single MPP do?
AND, OR, NAND, NOR can all be realized (they are linearly separable).

Example: [figure: points labelled J (yes) and N (no) in the plane, separated by a line]
From w_1 x_1 + w_2 x_2 ≥ θ, isolation of x_2 yields a separating line
  x_2 = (θ - w_1 x_1) / w_2
which separates R^2 into 2 classes.

XOR?
  x_1  x_2  xor    required:
   0    0    0     0 < θ
   0    1    1     w_2 ≥ θ
   1    0    1     w_1 ≥ θ
   1    1    0     w_1 + w_2 < θ

From w_1, w_2 ≥ θ > 0 it follows that w_1 + w_2 ≥ 2θ > θ → contradiction!

1969: Marvin Minsky / Seymour Papert
- book "Perceptrons": analysis of mathematical properties of perceptrons
- disillusioning result: perceptrons fail to solve a number of trivial problems!
  - XOR problem
  - parity problem
  - connectivity problem
- "conclusion": All artificial neurons have this kind of weakness! Research in this field is a scientific dead end!
- consequence: research funding for ANN cut down extremely (for about 15 years)

How to leave the "dead end":
1. Multilayer perceptrons: a two-layered net of MPPs realizes XOR.
   [diagram of the two-layered net; see the sketch below]
2. Nonlinear separating functions:
   g(x_1, x_2) = 2 x_1 + 2 x_2 - 4 x_1 x_2 - 1 with θ = 0 realizes XOR:
   g(0,0) = -1, g(0,1) = +1, g(1,0) = +1, g(1,1) = -1
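Both escape routes can be checked in a few lines. This is a hedged sketch with first-layer weights of my own choosing (the slide's diagram is not recoverable, so the exact weights are an assumption):

```python
# Two ways out of the XOR dead end.

def mpp(x, w, theta):
    """Minsky-Papert perceptron: fires iff w'x >= theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# 1. Multilayer: hidden units compute (x1 AND NOT x2) and (NOT x1 AND x2),
#    the output unit ORs them.
def xor_two_layer(x1, x2):
    h1 = mpp((x1, x2), (1, -1), 1)
    h2 = mpp((x1, x2), (-1, 1), 1)
    return mpp((h1, h2), (1, 1), 1)

# 2. Nonlinear separating function from the slide, threshold theta = 0:
def xor_nonlinear(x1, x2):
    g = 2*x1 + 2*x2 - 4*x1*x2 - 1
    return 1 if g >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_two_layer(x1, x2) == xor_nonlinear(x1, x2) == (x1 ^ x2)
```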
How to obtain weights w_i and threshold θ?

as yet: by construction

example: NAND gate
  x_1  x_2  NAND    required:
   0    0    1      0 ≥ θ
   0    1    1      w_2 ≥ θ
   1    0    1      w_1 ≥ θ
   1    1    0      w_1 + w_2 < θ

requires solution of a system of linear inequalities (∈ P)
(e.g.: w_1 = w_2 = -2, θ = -3)

now: by learning / training

Perceptron Learning
Assumption: test examples with correct I/O behavior available
Principle:
(1) choose initial weights in arbitrary manner
(2) feed in test pattern
(3) if output of perceptron is wrong, then change weights
(4) goto (2) until correct output for all test patterns

graphically: translation and rotation of separating lines

threshold as a weight: extend each input to x = (1, x_1, x_2) and set w = (-θ, w_1, w_2);
then w'x ≥ 0 ⇔ w_1 x_1 + w_2 x_2 ≥ θ.

Perceptron learning algorithm (see the sketch below)
P: set of positive examples, N: set of negative examples
1. choose w_0 at random, t = 0
2. choose arbitrary x ∈ P ∪ N
3. if x ∈ P and w_t' x > 0 then goto 2 (I/O correct!)
   if x ∈ N and w_t' x ≤ 0 then goto 2 (I/O correct!)
4. if x ∈ P and w_t' x ≤ 0 then w_{t+1} = w_t + x; t++; goto 2
   (w'x ≤ 0, but should be > 0: (w + x)'x = w'x + x'x > w'x)
5. if x ∈ N and w_t' x > 0 then w_{t+1} = w_t - x; t++; goto 2
   (w'x > 0, but should be ≤ 0: (w - x)'x = w'x - x'x < w'x)
6. stop, if I/O behavior is correct for all examples

Example: suppose the initial vector of weights is w^(0) = (1, -1, 1).

remark: the algorithm converges and is finite; worst case: exponential runtime
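A minimal sketch of the perceptron learning algorithm in Python, with the threshold folded into the weight vector as just described; the helper names and the NAND training data are my choices, not from the slides:

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def perceptron_learn(P, N, dim, max_updates=10_000):
    """P, N: extended input vectors (leading 1 already prepended)."""
    w = [random.uniform(-1, 1) for _ in range(dim)]   # step 1: random init
    for _ in range(max_updates):
        wrong = [(x, +1) for x in P if dot(w, x) <= 0] + \
                [(x, -1) for x in N if dot(w, x) > 0]
        if not wrong:                                 # step 6: all I/O correct
            return w
        x, sign = random.choice(wrong)                # steps 2-3: misclassified x
        w = [wi + sign * xi for wi, xi in zip(w, x)]  # steps 4-5: w <- w +/- x
    raise RuntimeError("no separator found (data may not be separable)")

# NAND: positive examples (output 1) and negative examples (output 0)
P = [(1, 0, 0), (1, 0, 1), (1, 1, 0)]
N = [(1, 1, 1)]
w = perceptron_learn(P, N, dim=3)   # w = (-theta, w1, w2)
assert all(dot(w, x) > 0 for x in P) and all(dot(w, x) <= 0 for x in N)
```

The constructed solution w_1 = w_2 = -2, θ = -3 from the slide satisfies the same assertions; learning just finds some separating hyperplane, not a unique one.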
We know what a single MPP can do.
What can be achieved with many MPPs?

- A single MPP separates the plane into two half planes.
- Many MPPs in 2 layers can identify convex sets.
  1. How? 2 layers: the first layer cuts out half planes, the second layer ANDs them.
  2. Convex? ∀ a, b ∈ X: λa + (1 - λ)b ∈ X for λ ∈ (0, 1)
  [figure: two convex sets A and B]
- Many MPPs in 3 layers can identify arbitrary sets.
- Many MPPs in more than 3 layers: not really necessary!

arbitrary sets (see the sketch below):
1. partitioning of the nonconvex set into several convex sets
2. a two-layered subnet for each convex set
3. feed the outputs of the two-layered subnets into an OR gate (third layer)
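The following illustrates this recipe; the coordinates and shapes are my own assumptions, not from the slides. Three MPPs cut out half planes, an AND neuron intersects them into a convex triangle, and a third OR layer unites two such triangles into a nonconvex set:

```python
def mpp(x, w, theta):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def in_triangle(p):
    """2 layers: one MPP per edge (half plane), then AND (theta = 3)."""
    h = (mpp(p, (1, 0), 0),     # x >= 0
         mpp(p, (0, 1), 0),     # y >= 0
         mpp(p, (-1, -1), -1))  # x + y <= 1
    return mpp(h, (1, 1, 1), 3)

def in_union(p):
    """3rd layer: OR (theta = 1) over one two-layer subnet per convex part."""
    shifted = (p[0] - 2, p[1])  # second triangle, translated by (2, 0)
    return mpp((in_triangle(p), in_triangle(shifted)), (1, 1), 1)

assert in_triangle((0.2, 0.2)) == 1 and in_triangle((0.9, 0.9)) == 0
assert in_union((2.2, 0.2)) == 1 and in_union((1.5, 0.5)) == 0
```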