Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5
"Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand connections between each neuron and its neighbors, we have about 100 trillion connections, each capable of a simultaneous calculation... (but) only 200 calculations per second... With 100 trillion connections, each computing at 200 calculations per second, we get 20 million billion calculations per second. This is a conservatively high estimate... by the year 2020, (a massively parallel neural net computer) will have doubled about 23 times (from 1997's $2,000 modestly parallel computer that could perform around 2 billion connection calculations per second)... resulting in a speed of about 20 million billion neural connection calculations per second, which is equal to the human brain. Ray Kurzweil, "The Age of Spiritual Machines", 1999
Biologically Inspired Models
We want to add biological constraints to our model as well as behavioral (output data) constraints. Our molar level tends to be the neuron. If neurons are the hardware that the system we're trying to model is operating with, let's consider what we know about them.
Basic Structure of a Neuron
We typically think of the neuron as the basic unit of thought:
- Dendrite (input)
- Cell body/soma (integrator)
- Axon (communication line)
- Terminal (output)
- Synapses (connections)
- Neurotransmitters (messages)
Some Structural Facts
There is competition for connections among neurons, and unconnected cells die (50% before birth). About 10^11 neurons are left in the brain after birth (with no new generation), with roughly 10^15 neural connections. Each neuron connects to only a small fraction of the others (a few hundred or thousand). Unused or unconnected neurons die. Excitatory connections increase a cell's firing potential; inhibitory connections reduce it. The electrical response of a neuron is an all-or-none potential, or spike.
Some Structural Facts
A single cell can have several thousand synapses on it. The inputs from different synapses approximately add at the cell body. The function relating integrated activation to firing rate is a nonlinear sigmoidal function. OK, now let's incorporate some of these constraints into our models (starting with simple tasks).
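The sigmoidal activation function mentioned above can be sketched directly; this is a minimal illustration, not code from the lecture:

```python
import math

def sigmoid(net):
    """Logistic squashing function: maps summed input to a firing rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

# Weak input produces a low rate; strong input saturates near 1
print(sigmoid(-4.0))  # ~0.018
print(sigmoid(0.0))   # 0.5
print(sigmoid(4.0))   # ~0.982
```

Note the S-shape: near zero net input the function is nearly linear, while extreme inputs are squashed toward 0 or 1, matching the saturating firing rates described above.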
Types of Neural Nets:
1. Feedforward -- Perceptrons, linear associators
2. Recurrent -- The hidden layer's state depends on its state at a previous time; SRNs, Hopfield nets
3. Stochastic -- Boltzmann machines (noisy networks)
Types of Training:
1. Supervised -- Give data and the correct response
2. Unsupervised -- Give data only
3. Reinforcement -- Data are not provided, but are determined by the model's interaction with the environment. The goal is to discover a policy for selecting actions that minimizes long-term costs.
Autoassociative vs. heteroassociative
Learning rules: delta rule, backprop, gradient descent, evolutionary, expectation maximization
Datasets online:
ANNs are very good at pattern recognition/classification. On our website, you'll find some sets of training and testing exemplars to try the algorithms out on:
- Feature lists for capital letters (from Rumelhart & McClelland)
- Handwritten letters, numbers, and math symbols (NIST and some of my own)
- Fingerprint vectors (NIST)
- Elman's "cat chase mouse" artificial language
- VSM vectors on Reuters documents
Rumelhart & McClelland (1981, Psychological Review)
[Figure: the 14 line-segment features, numbered 1-14.]
Uppercase letters are represented by binary vectors of these 14 features.
[Figure: the letter E drawn from its active features.]
E = [1 0 0 1 1 1 0 0 1 0 0 0 1 0]
[Figure: the letter X drawn from its active features.]
X = [0 0 0 0 0 0 0 1 0 1 0 1 0 1]
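These binary codes make letters easy to compare; as a quick illustration (the Hamming-distance comparison is my addition, not from the slides):

```python
# Binary 14-feature codes for two uppercase letters, from the slides
E = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
X = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Hamming distance: number of features on which the two letters differ
hamming = sum(a != b for a, b in zip(E, X))
print(hamming)  # 10 -- E and X share almost no features
```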
[Figure: a binary input vector fed through the network produces graded output activations (e.g., .95, .85, .79, .23, .04, .01).]
Single-Layer Perceptron
[Figure: inputs x_1 ... x_n with weights w_1 ... w_n feeding a summation unit Σ.]
"The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less." --Piet Hein
Δw_ij = α(t_i - y_i)x_j
Single-Layer Perceptron
[Figure: artificial retina input nodes x_1 ... x_n with weights w_1 ... w_n, input summation Σ, and a threshold (THD) outputting -1 or +1 (McCulloch-Pitts neuron).]
If y = t, do nothing.
If y ≠ t, then delta update: Δw_ij = α(t_i - y_i)x_j
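The delta-rule update can be sketched in a few lines; the toy data and function names below are my assumptions, not code from the lecture:

```python
# Minimal sketch of a threshold perceptron trained with the delta rule.

def predict(w, x, theta=0.0):
    """McCulloch-Pitts style unit: +1 if the weighted sum exceeds threshold, else -1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > theta else -1

def train(patterns, alpha=0.1, epochs=20):
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for x, t in patterns:
            y = predict(w, x)
            if y != t:  # if y = t, do nothing; otherwise delta update
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable task: respond +1 iff the first input is on
data = [([1, 0], 1), ([1, 1], 1), ([0, 1], -1), ([0, 0], -1)]
w = train(data)
print([predict(w, x) for x, _ in data])  # [1, 1, -1, -1], matching the targets
```

Each weight moves in proportion to the error (t - y) and its input x_j, so weights attached to inactive inputs are never blamed for a mistake.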
Multiclass Single-Layer Perceptron
[Figure: inputs x_1 ... x_n fully connected to several summation units Σ, one per category.]
SLP Classification Model:
[Figure: inputs x_1 ... x_n connected to output nodes A and B.]
o_i = Σ_{j=1..N} x_j w_ij
Δw_ij = α(t_i - o_i)x_j
How do we decide which output node to choose?
- Choose the highest
- Choose the highest in a field of Gaussian decision noise
- Luce's choice axiom (separate decision process and classification process)
Shepard-Luce Choice Rule
A more realistic decision rule: the probability of choosing category A is based on a ratio of the strengths of the output activations. The ratio rule (from Luce, 1959):
p(A|x) = e^(λo_A) / (e^(λo_A) + e^(λo_B)), where λ > 0
Lambda is a sensitivity parameter that determines how sensitive choice probability is to the activation of each category. Reducing λ makes responding more random, and increasing λ makes responding more deterministic.
Shepard-Luce Choice Rule
A more realistic decision rule: the probability of choosing category A is based on a ratio of the strengths of the output activations. The ratio rule (from Luce, 1959), generalized to any number of categories:
p(A|x) = e^(λo_A) / Σ_i e^(λo_i)
Lambda is a sensitivity parameter that determines how sensitive choice probability is to the activation of each category. Reducing λ makes responding more random, and increasing λ makes responding more deterministic.
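The ratio rule is just a softmax over the output activations; a minimal sketch (the activation values are made up for illustration):

```python
import math

def choice_probs(outputs, lam=1.0):
    """Luce/Shepard ratio rule: p(i) = exp(lam*o_i) / sum_k exp(lam*o_k)."""
    exps = [math.exp(lam * o) for o in outputs]
    z = sum(exps)
    return [e / z for e in exps]

acts = [2.0, 1.0, 0.5]
print(choice_probs(acts, lam=1.0))   # graded probabilities, highest for o = 2.0
print(choice_probs(acts, lam=0.1))   # low lambda: near-uniform, more random
print(choice_probs(acts, lam=10.0))  # high lambda: nearly deterministic
```

Varying lam directly shows the sensitivity effect described above: the same activations yield anything from guessing to an all-or-none choice.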
Our SLP Classification Model:
[Figure: inputs x_1 ... x_n connected to output nodes A and B.]
o_i = Σ_{j=1..N} x_j w_ij
o_i = 1 / (1 + exp(-o_i))
p(A|x) = e^(λo_A) / (e^(λo_A) + e^(λo_B))
Δw_ij = α(t_i - o_i)x_j
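Putting the pieces together, here is a minimal sketch of this model; the two-feature toy data and the parameter values are assumptions for illustration:

```python
import math

LAM, ALPHA = 2.0, 0.5  # choice sensitivity and learning rate (assumed values)

def forward(W, x):
    """Linear sums squashed by the logistic: o_i = 1/(1 + exp(-net_i))."""
    nets = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
    return [1.0 / (1.0 + math.exp(-n)) for n in nets]

def p_choose(outs):
    """Luce ratio rule over the output activations."""
    exps = [math.exp(LAM * o) for o in outs]
    z = sum(exps)
    return [e / z for e in exps]

# Toy task: category A iff feature 0 is on, category B iff feature 1 is on
data = [([1, 0], [1, 0]), ([0, 1], [0, 1])]
W = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    for x, t in data:
        o = forward(W, x)
        for i in range(2):      # delta rule: w_ij += alpha * (t_i - o_i) * x_j
            for j in range(2):
                W[i][j] += ALPHA * (t[i] - o[i]) * x[j]

print(p_choose(forward(W, [1, 0])))  # p(A) clearly dominates for an A exemplar
```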
The Death of Perceptrons
Minsky & Papert (1969), Perceptrons:
- They can only learn categories that are linearly separable
- They cannot do XOR
This led to a decline in funding, and it took more than a decade for neural nets to make a comeback with Grossberg's work.
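A two-layer network with hand-set threshold units shows why adding a hidden layer fixes XOR; this is a standard construction with weights chosen by hand, not from the slides:

```python
# XOR is not linearly separable, so no single threshold unit can compute it,
# but two hidden threshold units plus an output unit can.

def step(net):
    return 1 if net > 0 else 0

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1 computes OR
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2 computes AND
    return step(h_or - h_and - 0.5)  # output: OR and not AND = XOR

print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

The hidden units re-represent the input so that the final unit faces a linearly separable problem; the hard part, which backpropagation solves, is learning such hidden weights rather than setting them by hand.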
Multilayer Perceptrons
Multilayer Perceptrons
But let's use a smarter update rule based on gradient descent and assigning blame where blame is due, a.k.a. error backpropagation.
O'Reilly: Fitting behavioral data without biological constraints is of questionable value in understanding how the brain actually subserves behavior.
Ratcliff: ...but few neurally plausible models fit data as well as global memory models.