INTRODUCTION TO ARTIFICIAL INTELLIGENCE DATA15001 EPISODE 8: NEURAL NETWORKS
TODAY'S MENU 1. NEURAL COMPUTATION 2. FEEDFORWARD NETWORKS (PERCEPTRON) 3. RECURRENT NETWORKS 4. SOM
NEURAL COMPUTATION The traditional model of computation based on Turing machines is only one of many frameworks. Computation in natural nervous systems is quite different from the Turing machine model: parallel, stochastic, and adaptive. Neural computation is one of the key approaches to AI. The 1960s saw a decline in interest ("AI winter"), but currently there's a new wave ("deep learning").
NEURAL COMPUTATION
NEURAL COMPUTATION In our dichotomy of AI approaches, neural networks belong to "modern AI" rather than GOFAI: SUBSYMBOLIC/DIGITAL VS SYMBOLIC. Some NNs are probabilistic (but not all), so probabilistic methods can be applied. Perhaps the bigger question mark: an NN is a "black box", i.e., it is very hard to interpret and say why a given output is produced; many learning algorithms are poorly understood ("optimal brain damage", "Cuckoo search", ...)
NEURAL COMPUTATION Source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
NEURAL COMPUTATION NATURAL NEURAL NETWORK Source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
NEURAL COMPUTATION ARTIFICIAL NEURAL NETWORK COPY JUST THE IDEA: A NUMBER OF SIMPLE PROCESSING UNITS CONNECTED TOGETHER AS A LARGE NETWORK Source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
NATURAL VS ARTIFICIAL NATURAL NEURAL NETWORKS are usually: asynchronous, binary (spike or not), recurrent (feedback), massive. ARTIFICIAL NEURAL NETWORKS are usually: synchronous, continuous-valued, feedforward (no feedback), large but not massive.
BASIC NEURON Real-valued or binary inputs x_1, ..., x_n. Real-valued weights w_i1, ..., w_in. Activation function f => output x_i. Note: one neuron's output is another's input. Image source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
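As a minimal sketch of this unit (the function and variable names below are mine, not from the slides), the neuron computes a weighted sum of its inputs and passes it through the activation function:

import operator  # not required; plain sums suffice

def neuron_output(weights, inputs, f):
    # weighted sum of the inputs, passed through the activation function f
    z = sum(w * x for w, x in zip(weights, inputs))
    return f(z)

# example with a step activation: +1 if z >= 0, otherwise -1
step = lambda z: 1 if z >= 0 else -1
print(neuron_output([0.5, -0.2], [1.0, 2.0], step))   # z = 0.1 -> prints 1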
CASE: PERCEPTRON A perceptron neuron can be used in isolation (a single neuron). The activation function is the step function: f(z) = -1 if z < 0, and +1 otherwise, where z = Σ_{j=1}^{n} w_ij x_j. Image source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
CASE: PERCEPTRON The output is binary {-1, +1}. The goal is to adjust the weights so that the output of the neuron (from the activation function) is as we like. The input can be, for example, a set of pixels in an image. The goal would be to recognize whether the image represents a given pattern (e.g., the number '3'). Adjusting the weights by hand can be hard. Solution: machine learning on some training data!
PERCEPTRON: THE MATH PART The effect of the weights on the argument of the activation function, z, is linear. Therefore, the decision boundary is a "hyperplane" (the generalization of a plane to high dimensions). For example, if the input is two-dimensional (x_1, x_2): f(z) = 1 iff the point (x_1, x_2) is on the "right" side of a straight line that passes through the origin. DECISION BOUNDARY: w_1 x_1 + w_2 x_2 = 0
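As a small worked check of the boundary (the weights here are invented for illustration): with w_1 = 1 and w_2 = -1, the boundary w_1 x_1 + w_2 x_2 = 0 is the line x_1 = x_2, and the two sides of that line get different outputs:

w1, w2 = 1.0, -1.0                     # boundary: x1 - x2 = 0, i.e. the line x1 = x2
f = lambda x1, x2: 1 if w1 * x1 + w2 * x2 >= 0 else -1
print(f(2.0, 1.0))   # x1 > x2: one side of the line -> 1
print(f(1.0, 2.0))   # x1 < x2: the other side      -> -1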
PERCEPTRON ALGORITHM (ROSENBLATT, 1958) The following simple algorithm finds a separating hyperplane (weights)... ...if the data are "linearly separable". A runnable version of the pseudocode (the helper logic is inlined):

import random

def perceptron(data):
    # data: list of (x, y) pairs; x is a vector of length p, y is -1 or +1
    p = len(data[0][0])
    w = [0.0] * p
    z = lambda x: sum(w[j] * x[j] for j in range(p))       # weighted sum w . x
    error = lambda: sum(1 for (x, y) in data
                        if (z(x) >= 0) != (y == 1))        # number misclassified
    while error() > 0:
        (x, y) = random.choice(data)
        if z(x) >= 0 and y == -1:                          # -1 classified as +1
            w = [w[j] - x[j] for j in range(p)]            # subtract vector x
        if z(x) < 0 and y == 1:                            # +1 classified as -1
            w = [w[j] + x[j] for j in range(p)]            # add vector x
    return w
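A quick usage sketch on a tiny, linearly separable data set (the values are invented for illustration):

data = [([1.0, 1.0],   1), ([1.0, 0.5],   1),
        ([-1.0, -1.0], -1), ([-0.5, -1.0], -1)]
w = perceptron(data)
print(w)   # some separating weight vector, e.g. one pointing toward the +1 class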
PERCEPTRON ALGORITHM (ROSENBLATT, 1958) An illustration in 2D. Image source: Ertel: Introduction to Artificial Intelligence, Springer, 2011.
PERCEPTRON ALGORITHM The problem is that usually the data are not linearly separable; then the algorithm keeps updating the weights forever. Variants exist for: finding the weights with minimal error; finding the hyperplane that maximizes the margin between the two classes: "support vector machine (SVM)". In any case, the linearity of the classifier is a severe restriction. Two approaches for constructing non-linear classifiers: multilayer perceptron; kernel trick (similar idea as in non-linear regression; see the sketch below)
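A sketch of the feature-map idea behind the kernel trick (the map phi below is chosen for illustration, not taken from the slides): data that is not linearly separable in the original coordinates can become separable after a non-linear mapping.

# XOR-like labels (+1 when the signs of x1 and x2 agree) are not linearly
# separable in (x1, x2), but adding the product feature x1*x2 fixes that:
def phi(x1, x2):
    return [x1, x2, x1 * x2]   # non-linear feature map

for (x1, x2) in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(phi(x1, x2))   # in 3D, the plane "third coordinate = 0" separates the classes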
MULTILAYER PERCEPTRON (MLP) Connecting many perceptron units together; the activation function is usually a sigmoid. The output is differentiable w.r.t. the parameters, which means that it is easier to optimize the weights. Rule for learning MLPs: backpropagation. OUTSIDE THE SCOPE OF THIS COURSE
MULTILAYER PERCEPTRON (MLP) The MLP can represent "anything" (with enough hidden layers). The backpropagation algorithm doesn't guarantee the optimal weights, only a local optimum. Restarting from a different starting point may give a different (better or worse) solution.
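A minimal forward-pass sketch of an MLP with one hidden layer and sigmoid activations (the sizes and weights below are invented for illustration; learning the weights is what backpropagation does, and that is outside the scope of this course):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W_hidden, w_output):
    # hidden layer: each row of W_hidden is one hidden unit's weight vector
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    # output unit: weighted sum of the hidden activations, through the sigmoid
    return sigmoid(sum(w * hi for w, hi in zip(w_output, h)))

# example: 2 inputs, 2 hidden units, 1 output (weights chosen arbitrarily)
print(mlp_forward([1.0, 0.5], [[0.3, -0.8], [1.2, 0.4]], [0.7, -1.5]))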
RECURRENT NEURAL NETWORKS The MLP is a feedforward network because the information always flows in one direction (towards the output layer). In recurrent networks, this is not the case, and there can be feedback loops. This can cause complex dynamic interactions which are often harder to model.
HOPFIELD NETWORK EVERYTHING IS CONNECTED TO EVERYTHING
HOPFIELD NETWORK Learning occurs when a series of input configurations (each neuron's value) are presented. Weights will measure how often two neurons are in the same state (either both on, or both off). After learning: 1. the network is initialized in a new input configuration 2. each neuron may change its state according to the states of the other neurons (inputs) and the weights 3. the new states then become the input 4. this is repeated until convergence
HOPFIELD NETWORK The learning rule: w_ij = (1/n) Σ_{k=1}^{n} q_ik q_jk, where q_ik = +1 if neuron i is on in the k'th training sample, and -1 otherwise. The activation rule is the same as in a perceptron: x_i = +1 if Σ_{j≠i} w_ij x_j > 0, and -1 otherwise.
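A sketch of both rules in code (a minimal, unoptimized version; here all neurons are updated at once, whereas step 2 on the previous slide can also be done one neuron at a time):

def hopfield_weights(samples):
    # samples: list of n training configurations q_k with entries +1/-1
    n, m = len(samples), len(samples[0])
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:                     # no self-connections
                W[i][j] = sum(q[i] * q[j] for q in samples) / n
    return W

def hopfield_step(W, x):
    # each neuron looks at the states of the others, weighted by W
    m = len(x)
    return [1 if sum(W[i][j] * x[j] for j in range(m) if j != i) > 0 else -1
            for i in range(m)]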
BOLTZMANN MACHINE Another example of a recurrent network. A probabilistic version of the Hopfield network. The input is usually a subset of the neurons. "Restricted Boltzmann machine": not everything is connected to everything. A somewhat different learning rule.
BOLTZMANN MACHINE
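A sketch of the probabilistic state update that makes the Boltzmann machine a "probabilistic Hopfield network" (this is the standard logistic formulation; the temperature parameter T is my addition, not given in the slides, and constant scale factors are absorbed into it):

import math, random

def boltzmann_step(W, x, T=1.0):
    # instead of the deterministic threshold, each neuron turns on (+1)
    # with probability given by the logistic function of its net input
    new_x = []
    for i in range(len(x)):
        net = sum(W[i][j] * x[j] for j in range(len(x)) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-net / T))
        new_x.append(1 if random.random() < p_on else -1)
    return new_x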
SELF-ORGANIZING MAP (SOM) (KOHONEN, 1982)
SELF-ORGANIZING MAP (SOM) The neurons form a two-dimensional grid. Each neuron has a state vector. The input vector activates the neuron whose state vector is nearest to the input vector: the "winner". The state of the winner is updated to be more similar to the input.
SELF-ORGANIZING MAP (SOM) The neighbors of the winner are also updated. The neighbors gradually become more and more similar. As the learning proceeds, the size of the neighborhood is made smaller. The updates also get smaller, and the network converges.
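A sketch of one SOM update step (the learning rate, the grid distance, and the hard neighborhood cutoff below are illustrative choices; as described above, both the neighborhood radius and the learning rate shrink over time):

def som_step(grid, x, lr, radius):
    # grid: dict mapping (row, col) -> state vector; x: input vector
    dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # the "winner" is the unit whose state vector is nearest to the input
    winner = min(grid, key=lambda pos: dist2(grid[pos], x))
    for pos, state in grid.items():
        # move the winner and its grid neighbors toward the input
        if abs(pos[0] - winner[0]) + abs(pos[1] - winner[1]) <= radius:
            grid[pos] = [s + lr * (xi - s) for s, xi in zip(state, x)]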
SELF-ORGANIZING MAP (SOM) The input can be any vector. Examples: speech recognition: input = audio recording; process control: input = status of a machine (e.g., a paper machine); information retrieval: input = word occurrences in a document
SELF-ORGANIZING MAP (SOM)
SUMMARY ON NEURAL NETWORKS Feedforward networks: perceptron, multilayer perceptron (MLP), ... Recurrent networks: Hopfield network, Boltzmann machine, ... Self-organizing map
SUMMARY ON NEURAL NETWORKS Different network types have different applications. Feedforward networks can be used for supervised machine learning and as function approximators; deep learning is typically based on feedforward networks with "convolutional" layers (recognizing image patterns). Recurrent networks can be used as a model of associative memory. Self-organizing maps can be used for visualization.