COMP304 Introduction to Neural Networks, based on slides by Christian Borgelt (http://www.borgelt.net/)



Motivation: Why (Artificial) Neural Networks?

(Neuro-)Biology / (Neuro-)Physiology / Psychology:
- Exploit the similarity to real (biological) neural networks.
- Build models to understand nerve and brain operation by simulation.

Computer Science / Engineering / Economics:
- Mimic certain cognitive capabilities of human beings.
- Solve learning/adaptation, prediction, and optimization problems.

Physics / Chemistry:
- Use neural network models to describe physical phenomena.
- Special case: spin glasses (alloys of magnetic and non-magnetic metals).

Motivation: Why Neural Networks in AI?

Physical-Symbol System Hypothesis [Newell and Simon 1976]: A physical-symbol system has the necessary and sufficient means for general intelligent action.

Neural networks process simple signals, not symbols. So why study neural networks in Artificial Intelligence?

- Symbol-based representations work well for inference tasks, but are fairly bad for perception tasks.
- Symbol-based expert systems tend to get slower with growing knowledge, while human experts tend to get faster.
- Neural networks allow for highly parallel information processing.
- There are several successful applications in industry and finance.

Biological Background

Structure of a prototypical biological neuron (schematic figure), with the following parts labeled: terminal button, synapse, dendrites, cell core, cell body (soma), axon, myelin sheath.

Biological Background

(Very) simplified description of neural information processing:

- The axon terminal releases chemicals, called neurotransmitters.
- These act on the membrane of the receptor dendrite to change its polarization. (The inside is usually about 70 mV more negative than the outside.)
- Decrease in potential difference: excitatory synapse. Increase in potential difference: inhibitory synapse.
- If there is enough net excitatory input, the axon is depolarized.
- The resulting action potential travels along the axon. (Its speed depends on the degree to which the axon is covered with myelin.)
- When the action potential reaches the terminal buttons, it triggers the release of neurotransmitters.

Threshold Logic Units

Threshold Logic Units

A Threshold Logic Unit (TLU) is a processing unit for numbers with n inputs x_1, ..., x_n and one output y. The unit has a threshold θ, and each input x_i is associated with a weight w_i. A threshold logic unit computes the function

y = 1, if Σ_{i=1}^{n} w_i x_i ≥ θ,
y = 0, otherwise.

(Figure: the inputs x_1, ..., x_n enter the unit with weights w_1, ..., w_n; the unit labeled with the threshold θ produces the output y.)
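This computation is short enough to state as code. The following is a minimal sketch, assuming binary inputs; the class and method names are illustrative, not taken from the slides.

# Minimal sketch of a threshold logic unit (TLU).
class ThresholdLogicUnit:
    def __init__(self, weights, theta):
        self.weights = list(weights)   # one weight w_i per input x_i
        self.theta = theta             # threshold

    def fire(self, inputs):
        # y = 1 if the weighted sum of the inputs reaches the threshold, else y = 0.
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if weighted_sum >= self.theta else 0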

Threshold Logic Units: Examples

Threshold logic unit for the conjunction x_1 ∧ x_2: weights w_1 = 3, w_2 = 2, threshold θ = 4.

x_1  x_2  3x_1 + 2x_2  y
 0    0        0       0
 1    0        3       0
 0    1        2       0
 1    1        5       1

Threshold logic unit for the implication x_2 → x_1: weights w_1 = 2, w_2 = −2, threshold θ = −1.

x_1  x_2  2x_1 − 2x_2  y
 0    0        0       1
 1    0        2       1
 0    1       −2       0
 1    1        0       1
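Such an example can be checked by simply enumerating all inputs. The small sketch below verifies the conjunction unit above (weights 3 and 2, threshold 4):

# Verify that the TLU with weights (3, 2) and threshold 4 computes x1 AND x2.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if 3 * x1 + 2 * x2 >= 4 else 0
        assert y == (x1 and x2)
        print(x1, x2, "->", y)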

Threshold Logic Units: Examples

Threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3): weights w_1 = 2, w_2 = −2, w_3 = 2, threshold θ = 1.

x_1  x_2  x_3  Σ_i w_i x_i  y
 0    0    0        0       0
 1    0    0        2       1
 0    1    0       −2       0
 1    1    0        0       0
 0    0    1        2       1
 1    0    1        4       1
 0    1    1        0       0
 1    1    1        2       1

Threshold Logic Units: Geometric Interpretation

Review of line representations. Straight lines are usually represented in one of the following forms:

Explicit form:         g ≡ x_2 = b x_1 + c
Implicit form:         g ≡ a_1 x_1 + a_2 x_2 + d = 0
Point-direction form:  g ≡ x = p + k r
Normal form:           g ≡ (x − p) · n = 0

with the parameters:

b: gradient (slope) of the line
c: section of the x_2 axis (intercept)
p: position vector of a point of the line (base vector)
r: direction vector of the line
n: normal vector of the line

Threshold Logic Units: Geometric Interpretation

Figure: a straight line g and its defining parameters: the base vector p, the direction vector r, the normal vector n = (a_1, a_2), the angle ϕ, and the distance d = p · n of the line from the origin O, marked by the point q = (d / |n|²) n on the line.

Threshold Logic Units: Geometric Interpretation

Figure: how to determine the side of the line g on which a point x lies. Project x onto the normal direction, z = (x · n) / |n|, and compare z with the distance q = d / |n| of the line from the origin O: if z > q, the point lies on the side to which the normal vector points; if z < q, it lies on the other side; if z = q, it lies on the line.
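In code this side test is a single signed expression. A minimal sketch (the function name is chosen here for illustration), assuming the line is given in the implicit/normal form n · x = d:

import math

def side_of_line(point, normal, d):
    # Line given by normal vector n and offset d:  n . x = d.
    # Returns the signed distance of the point from the line: positive on the
    # side the normal vector points to, negative on the other side, zero on the line.
    n_len = math.hypot(normal[0], normal[1])
    return (normal[0] * point[0] + normal[1] * point[1] - d) / n_len

# Example: the separating line 3*x1 + 2*x2 = 4 of the conjunction unit above.
print(side_of_line((1, 1), (3, 2), 4))   # positive: on the output-1 side
print(side_of_line((0, 0), (3, 2), 4))   # negative: on the output-0 side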

Threshold Logic Units: Geometric Interpretation

Figures: the threshold logic unit for the conjunction x_1 ∧ x_2 (weights 3 and 2, threshold 4) and the threshold logic unit for the implication x_2 → x_1 (weights 2 and −2, threshold −1), each shown together with the separating line it defines in the input space.

Threshold Logic Units: Geometric Interpretation

Visualization of 3-dimensional Boolean functions: the input space is the unit cube with corners (0, 0, 0) and (1, 1, 1).

Figure: the threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3) and the separating plane it defines in this cube.

Threshold Logic Units: Limitations

The biimplication problem x_1 ↔ x_2: there is no separating line.

x_1  x_2  y
 0    0   1
 1    0   0
 0    1   0
 1    1   1

Formal proof by reductio ad absurdum: assume a threshold logic unit with weights w_1, w_2 and threshold θ computes the biimplication. Then

since (0, 0) ↦ 1:   0 ≥ θ,            (1)
since (1, 0) ↦ 0:   w_1 < θ,          (2)
since (0, 1) ↦ 0:   w_2 < θ,          (3)
since (1, 1) ↦ 1:   w_1 + w_2 ≥ θ.    (4)

(2) and (3): w_1 + w_2 < 2θ. With (4): 2θ > θ, or θ > 0. Contradiction to (1).

Threshold Logic Units: Limitations

Total number and number of linearly separable Boolean functions ([Widner 1960] as cited in [Zell 1994]):

inputs  Boolean functions  linearly separable functions
  1              4                        4
  2             16                       14
  3            256                      104
  4          65536                     1774
  5      ≈ 4.3 · 10^9                 94572
  6      ≈ 1.8 · 10^19          ≈ 5.0 · 10^6

For many inputs a threshold logic unit can compute almost no functions. Networks of threshold logic units are needed to overcome the limitations.
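The row for two inputs can be checked by brute force: enumerate all 16 Boolean functions of two variables and test which of them some threshold logic unit realizes. Restricting the search to a small grid of integer weights and half-integer thresholds is an assumption made here, but such a grid suffices for two inputs:

from itertools import product

inputs = list(product((0, 1), repeat=2))        # (0,0), (0,1), (1,0), (1,1)
weights = range(-2, 3)                          # integer weights -2 .. 2
thresholds = [t / 2 for t in range(-6, 7)]      # half-integer thresholds -3 .. 3

separable = set()
for w1, w2, theta in product(weights, weights, thresholds):
    # Truth table realized by the TLU with weights (w1, w2) and threshold theta.
    table = tuple(1 if w1 * x1 + w2 * x2 >= theta else 0 for x1, x2 in inputs)
    separable.add(table)

print("Boolean functions of 2 inputs:", 2 ** len(inputs))   # 16
print("linearly separable functions: ", len(separable))     # 14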

Networks of Threshold Logic Units

Solving the biimplication problem with a network.

Idea: logical decomposition   x_1 ↔ x_2  ≡  (x_1 → x_2) ∧ (x_2 → x_1)

Figure: a two-layer network in which the first hidden unit computes y_1 = x_1 → x_2, the second hidden unit computes y_2 = x_2 → x_1, and the output unit computes y = y_1 ∧ y_2 = x_1 ↔ x_2.
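The weight values in the original figure did not survive the transcription; one concrete choice that implements the decomposition (assumed here for illustration) gives each hidden unit weights ±2 with threshold −1, so that it computes an implication, and gives the output unit weights 2 and 2 with threshold 3, so that it computes the conjunction of the hidden outputs. The sketch below checks this choice:

def tlu(weights, theta, inputs):
    # Output of a threshold logic unit: 1 if the weighted sum reaches theta.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def biimplication_net(x1, x2):
    y1 = tlu((-2, 2), -1, (x1, x2))   # hidden unit 1: x1 -> x2
    y2 = tlu((2, -2), -1, (x1, x2))   # hidden unit 2: x2 -> x1
    return tlu((2, 2), 3, (y1, y2))   # output unit:   y1 AND y2

for x1 in (0, 1):
    for x2 in (0, 1):
        assert biimplication_net(x1, x2) == (1 if x1 == x2 else 0)
print("the network computes x1 <-> x2")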

Networks of Threshold Logic Units

Solving the biimplication problem: geometric interpretation.

Figure: the two separating lines g_1 and g_2 of the hidden units in the input space (x_1, x_2), and the images of the four input points in the space (y_1, y_2) of the hidden-unit outputs, where a single line g_3 separates them.

The first layer computes new Boolean coordinates for the points. After the coordinate transformation the problem is linearly separable.

Representing Arbitrary Boolean Functions

Let y = f(x_1, ..., x_n) be a Boolean function of n variables.

(i) Represent f(x_1, ..., x_n) in disjunctive normal form. That is, determine D_f = K_1 ∨ ... ∨ K_m, where all K_j are conjunctions of n literals, i.e., K_j = l_{j1} ∧ ... ∧ l_{jn} with l_{ji} = x_i (positive literal) or l_{ji} = ¬x_i (negative literal).

(ii) Create a neuron for each conjunction K_j of the disjunctive normal form (having n inputs, one input for each variable), where

w_{ji} = +2, if l_{ji} = x_i,
w_{ji} = −2, if l_{ji} = ¬x_i,
and θ_j = n − 1 + (1/2) Σ_{i=1}^{n} w_{ji}.

(iii) Create an output neuron (having m inputs, one input for each neuron that was created in step (ii)), where

w_{(n+1)k} = 2, k = 1, ..., m, and θ_{n+1} = 1.
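A direct translation of this construction can make it concrete. In the sketch below (function names chosen here for illustration) a conjunction K_j is represented as a tuple of signs, +1 for the positive literal x_i and −1 for the negative literal ¬x_i:

def conjunction_unit(signs):
    # Step (ii): weights +2 / -2 for positive / negative literals,
    # threshold theta_j = n - 1 + (1/2) * sum of the weights.
    weights = [2 * s for s in signs]
    theta = len(signs) - 1 + 0.5 * sum(weights)
    return weights, theta

def dnf_network(conjunctions, x):
    # Evaluate the two-layer network of steps (ii) and (iii) for the input vector x.
    hidden = []
    for signs in conjunctions:
        weights, theta = conjunction_unit(signs)
        s = sum(w * xi for w, xi in zip(weights, x))
        hidden.append(1 if s >= theta else 0)
    # Step (iii): output neuron with all weights 2 and threshold 1 (an OR).
    return 1 if sum(2 * h for h in hidden) >= 1 else 0

# Example: the biimplication x1 <-> x2 has the DNF (x1 AND x2) OR (NOT x1 AND NOT x2).
dnf = [(+1, +1), (-1, -1)]
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", dnf_network(dnf, (x1, x2)))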

Training Threshold Logic Units

Training Threshold Logic Units

The geometric interpretation provides a way to construct threshold logic units with 2 and 3 inputs, but:
- it is not an automatic method (human visualization is needed), and
- it is not feasible for more than 3 inputs.

General idea of automatic training:
- Start with random values for the weights and the threshold.
- Determine the error of the output for a set of training patterns. The error is a function of the weights and the threshold: e = e(w_1, ..., w_n, θ).
- Adapt the weights and the threshold so that the error gets smaller.
- Iterate the adaptation until the error vanishes.

Training Threshold Logic Units

A single-input threshold logic unit for the negation ¬x: input x with weight w, threshold θ, output y; the desired behaviour is x = 0 ↦ y = 1 and x = 1 ↦ y = 0.

Figure: the output error as a function of the weight w and the threshold θ, shown separately for the pattern x = 0, for the pattern x = 1, and as the sum of both errors.

Training Threshold Logic Units

The error function cannot be used directly, because it consists of plateaus.

Solution: if the computed output is wrong, take into account how far the weighted sum is from the threshold.

Figure: the modified output error as a function of the weight w and the threshold θ, again shown for the pattern x = 0, for the pattern x = 1, and as the sum of both errors.

Training Threshold Logic Units

Example training procedure: online and batch training.

Figure: the paths taken in the (w, θ)-plane by online training and by batch training for the negation task, and the batch-training path shown on the modified error surface; the resulting unit computes the negation ¬x.

Training Threshold Logic Units: Delta Rule

Formal training rule: Let x = (x_1, ..., x_n) be an input vector of a threshold logic unit, o the desired output for this input vector, and y the actual output of the threshold logic unit. If y ≠ o, then the threshold θ and the weight vector w = (w_1, ..., w_n) are adapted as follows in order to reduce the error:

θ^(new) = θ^(old) + Δθ   with   Δθ = −η (o − y),
∀i ∈ {1, ..., n}:   w_i^(new) = w_i^(old) + Δw_i   with   Δw_i = η (o − y) x_i,

where η is a parameter that is called the learning rate. It determines the severity of the weight changes.

This procedure is called the Delta Rule or Widrow-Hoff Procedure [Widrow and Hoff 1960].

Online training: adapt the parameters after each training pattern.
Batch training: adapt the parameters only at the end of each epoch, i.e., after a traversal of all training patterns.

Training Threshold Logic Units: Delta Rule

Turning the threshold value into a weight: add an extra input x_0 that is fixed to the value 1 and give it the weight w_0 = −θ. Then

Σ_{i=1}^{n} w_i x_i ≥ θ   ⟺   Σ_{i=1}^{n} w_i x_i − θ ≥ 0,

so a unit with inputs x_1, ..., x_n, weights w_1, ..., w_n and threshold θ is equivalent to a unit with the additional input x_0 = 1, weights w_0 = −θ, w_1, ..., w_n and threshold 0.

Training Threshold Logic Units: Delta Rule

procedure online_training (var w, var θ, L, η);
var y, e;                            (* output, sum of errors *)
begin
  repeat
    e := 0;                          (* initialize the error sum *)
    for all (x, o) ∈ L do begin      (* traverse the patterns *)
      if (w · x ≥ θ) then y := 1;    (* compute the output *)
                     else y := 0;    (* of the threshold logic unit *)
      if (y ≠ o) then begin          (* if the output is wrong *)
        θ := θ − η(o − y);           (* adapt the threshold *)
        w := w + η(o − y)x;          (* and the weights *)
        e := e + |o − y|;            (* sum the errors *)
      end;
    end;
  until (e ≤ 0);                     (* repeat the computations *)
end;                                 (* until the error vanishes *)
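A runnable Python counterpart of this pseudocode is given below as a minimal sketch; the function name, the use of plain lists, and the starting values in the example (w = 2, θ = 1.5, η = 1) are choices made here for illustration:

def online_training(w, theta, patterns, eta):
    # Delta-rule online training of a threshold logic unit.
    # patterns: list of (x, o) pairs, x a tuple of inputs, o in {0, 1}.
    # Caution: the loop terminates only for linearly separable training sets.
    while True:
        error_sum = 0
        for x, o in patterns:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if y != o:
                theta -= eta * (o - y)                                  # adapt the threshold
                w = [wi + eta * (o - y) * xi for wi, xi in zip(w, x)]   # and the weights
                error_sum += abs(o - y)
        if error_sum == 0:
            return w, theta

# Example: the negation task (x = 0 -> 1, x = 1 -> 0) discussed on the next slide.
patterns = [((0,), 1), ((1,), 0)]
w, theta = online_training([2.0], 1.5, patterns, 1.0)
print("w =", w, "theta =", theta)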

Training Threshold Logic Units: Negation

Table: epoch-by-epoch trace of online training for the negation task over six epochs, with one row per training pattern and columns for the epoch, the input x, the desired output o, the weighted sum x·w, the actual output y, the error e, the parameter changes Δθ and Δw, and the resulting values of θ and w. After a few epochs the error vanishes, and training stops with a threshold and weight that compute the negation ¬x.

Training Threshold Logic Units: Conjunction

A threshold logic unit with two inputs x_1, x_2 (weights w_1, w_2, threshold θ) is to be trained to compute the conjunction y = x_1 ∧ x_2.

Figure: the unit, the truth table of the conjunction, and the geometric interpretation of the trained unit (the separating line in the input space).

Training Threshold Logic Units: Conjunction

Table: epoch-by-epoch trace of online training for the conjunction task, with one row per training pattern and columns for the epoch, the inputs x_1, x_2, the desired output o, the weighted sum x·w, the actual output y, the error e, the parameter changes Δθ, Δw_1, Δw_2, and the resulting values of θ, w_1, w_2. After a few epochs the error vanishes, and training stops with parameters that compute the conjunction.

Training Threshold Logic Units: Biimplication

Table: epoch-by-epoch trace of online training for the biimplication task, with the same columns as before. Because the biimplication is not linearly separable, the error never vanishes: the parameters keep changing from epoch to epoch and training does not terminate.

Training Threshold Logic Units: Convergence

Convergence Theorem: Let L = {(x_1, o_1), ..., (x_m, o_m)} be a set of training patterns, each consisting of an input vector x_i ∈ IR^n and a desired output o_i ∈ {0, 1}. Furthermore, let L_0 = {(x, o) ∈ L | o = 0} and L_1 = {(x, o) ∈ L | o = 1}. If L_0 and L_1 are linearly separable, i.e., if w ∈ IR^n and θ ∈ IR exist such that

∀(x, 0) ∈ L_0:  w · x < θ   and   ∀(x, 1) ∈ L_1:  w · x ≥ θ,

then online as well as batch training terminate.

The algorithms terminate only when the error vanishes. Therefore the resulting threshold and weights must solve the problem. For problems that are not linearly separable the algorithms do not terminate.

Training Networks of Threshold Logic Units

Single threshold logic units have strong limitations: they can only compute linearly separable functions. Networks of threshold logic units can compute arbitrary Boolean functions.

Training single threshold logic units with the delta rule is fast and guaranteed to find a solution if one exists. Networks of threshold logic units cannot be trained this way, because
- there are no desired values for the neurons of the first layer, and
- the problem can usually be solved with several different functions computed by the neurons of the first layer.

When this situation became clear, neural networks were seen as a research dead end.