Artificial Intelligence
Jeff Clune, Assistant Professor, Evolving Artificial Intelligence Laboratory
Announcements
- Be making progress on your projects!
Three Types of Learning
- Unsupervised
- Supervised
- Reinforcement
Perceptron Learning Rule
- Typically applied one example at a time, choosing samples at random
- Guaranteed to find a perfect boundary if data are linearly separable
(Figure: example run)
Perceptron Learning Rule
- Not guaranteed to find an optimal solution if data are not linearly separable: below, it reaches the minimal error many times but keeps changing the weights
- A decreasing learning rate (setting alpha to 1/numIterations) will settle on a minimum-error solution
(Figures: with non-separable data; with cooling)
Perceptron Learning Rule
(Figures: with linearly separable data; with non-separable data; with cooling)
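The rule above fits in a few lines. A minimal sketch, assuming the standard update w_i <- w_i + alpha * (y - prediction) * x_i; the toy OR dataset, the fixed alpha of 0.1, and the epoch count are our own choices, not from the slides:

```python
import random

def perceptron_train(data, epochs=100, cooling=False):
    """Perceptron learning rule: w_i += alpha * (y - prediction) * x_i.

    data: list of (x, y) pairs where x includes a leading 1 as the bias
    input and y is 0 or 1. Examples are presented one at a time in random
    order. With cooling=True, alpha decays as 1/t, which lets the weights
    settle to a minimum-error solution even on non-separable data.
    """
    n = len(data[0][0])
    w = [0.0] * n
    t = 0
    for _ in range(epochs):
        for x, y in random.sample(data, len(data)):
            t += 1
            alpha = 1.0 / t if cooling else 0.1
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0  # step activation
            for i in range(n):
                w[i] += alpha * (y - pred) * x[i]
    return w

# Linearly separable toy data (logical OR); first input is the bias, fixed at 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = perceptron_train(data)
```

Because this data is linearly separable, the rule is guaranteed to stop making mistakes after finitely many updates, whatever the presentation order.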
Logistic Regression (in contrast to linear regression)
- Threshold (step function) has drawbacks
  - non-differentiable
  - always produces a 1 or 0 classification, even near the boundary
- Solution: the differentiable logistic function
  - output is a probability of the class: 0.5 near the boundary, toward 0 or 1 farther away
Logistic Regression
(Figure: sigmoid for earthquakes vs. explosions)
Logistic Regression
- No closed-form solution
- Weight update rule (see book for derivation): w_i <- w_i + alpha * (y - h_w(x)) * h_w(x) * (1 - h_w(x)) * x_i
- Linearly separable data: a bit slower than linear regression, but more predictable
- Not linearly separable, fixed learning rate: faster than linear regression
- Not linearly separable, decaying learning rate: faster than linear regression
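The update rule can be sketched directly. A minimal illustration of gradient descent on the squared loss with a sigmoid, using the book's update w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i; the toy dataset, alpha, and epoch count here are assumptions of ours:

```python
import math

def sigmoid(z):
    """The logistic function: differentiable, outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_train(data, alpha=0.5, epochs=2000):
    """Gradient-descent updates for logistic regression:
    w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i, where h = sigmoid(w . x)."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in data:
            h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            g = (y - h) * h * (1 - h)  # derivative of squared loss through the sigmoid
            for i in range(n):
                w[i] += alpha * g * x[i]
    return w

# Toy separable data (logical OR) with a leading bias input of 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = logistic_train(data)
```

Unlike the step unit, the trained model outputs a probability: near 0.5 close to the boundary and toward 0 or 1 farther away.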
Neural Networks
- Your brain is a neural network
Neural Networks
- We create computational brains called Artificial Neural Networks (ANNs)
Neural Networks
- Nodes
  - feedforward: layered
  - recurrent: allows memory
Neural Networks
- Connections
  - excitatory or inhibitory
  - different weights (magnitudes), typically real-valued
Neural Networks
- Nodes: often layered; output is a function of inputs
  - step/threshold
  - sigmoid
  - other...
- Biases: often treated as another input set to 1
  - always use them! otherwise the decision boundary has to pass through the origin
  - also good when the default output is non-zero, such as a robot that moves forward unless X
  - make sure they are on for your projects!
Neural Networks
- Simple example: kiwi classifier
  - binary inputs: roundness, furriness; bias input = 1
  - weights = ??
  - threshold T = 0; step activation f(x) = 1 if x >= 0, 0 otherwise
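The slide leaves the weights as an exercise; here is a minimal sketch with one possible weight setting (our assumption, not the slide's answer), requiring both features to fire:

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def kiwi_classifier(roundness, furriness):
    """Single-unit kiwi classifier with binary inputs.
    The weights below are one possible setting (assumed): the bias input,
    fixed at 1, shifts the decision boundary away from the origin so the
    unit only fires when an object is both round and furry."""
    w_round, w_furry, w_bias = 1, 1, -2
    return step(w_round * roundness + w_furry * furriness + w_bias * 1)

print(kiwi_classifier(1, 1))  # round and furry: 1 (kiwi)
print(kiwi_classifier(1, 0))  # round but not furry: 0
```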
Non-Linear Activation Functions Combining many of them builds an ANN that can represent non-linear functions - because activation functions are nonlinear
Outputs Can have single or multiple outputs e.g. one output for each class
Perceptron
- Perceptron: single unit; default activation function: step
- With a sigmoid activation: sigmoid perceptron
- Perceptron network: single-layer network
- McCulloch & Pitts (1943)
Perceptron Training Rule
- Training
  - step activation function: perceptron learning rule
  - logistic activation function: gradient descent
- Both work for linearly separable data: we already knew that; it's just a new metaphor now
Neural Networks
- 2D to 3D non-linear mapping: each node outputs 1 if its input >= the number in the node
- Outputs of each node:

  x  y | Left Center Right | Output
  1  1 |  1     1     1    |   0
  1  0 |  1     0     0    |   1
  0  1 |  0     0     1    |   1
  0  0 |  0     0     0    |   0

(Figure: neuron1 (left), neuron2 (right), neuron3 (center); 3D plot)
Neural Networks
- Cover's theorem: "The probability that classes are linearly separable increases when the features are nonlinearly mapped to a higher dimensional feature space." [Cover 1965]
- The output layer requires linear separability. The purpose of the hidden layers is to make the problem linearly separable!
From: http://130.236.96.13/edu/courses/tbmi26/pdfs/lectures/le5.pdf
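The table of hidden-node outputs is XOR, the classic case where a hidden layer is needed. A tiny sketch of the idea; the specific weights are an illustrative choice of ours, not taken from the slides:

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_net(x, y):
    """Two-layer step network computing XOR. The hidden layer nonlinearly
    maps (x, y) into a space where a single linear output unit can
    separate the classes -- Cover's theorem in miniature."""
    h1 = step(x + y - 1)          # OR: fires when at least one input is on
    h2 = step(x + y - 2)          # AND: fires only when both inputs are on
    return step(h1 - 2 * h2 - 1)  # fires only when h1 = 1 and h2 = 0
```

No single linear unit can compute XOR from raw (x, y), but in the (h1, h2) space the two classes become linearly separable.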
Neural Networks
- Multi-layer networks thus allow non-linear regression
- Single hidden layer (often very large): can represent any continuous function
- Two hidden layers: can represent any discontinuous function
- But how do we train them?
Training Multi-Layer Neural Networks
- General idea: propagate the error backwards
- Called backpropagation
Backpropagation
- Can still write the equation of a neuron, e.g. a_5 = g(w_{3,5} a_3 + w_{4,5} a_4 + w_{0,5}), where w_{0,5} is the bias weight (activation of neuron 5)
- Allows us to calculate the loss and take its derivative
- At the output, the loss equation is the same as before: the previous gradient-descent equation, now generalized to multiple output nodes
Backpropagation
- But what is the error at hidden nodes? We don't know what the right answer is for hidden nodes
Backpropagation
- Detailed pseudocode in book
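A minimal sketch of the algorithm for a single-hidden-layer sigmoid network with squared loss: the output delta is the familiar gradient-descent rule generalized to multiple output nodes, and each hidden node's "error" is the weighted sum of the output deltas it feeds into. The variable names, the XOR training demo, and the hyperparameters are our own assumptions, not the book's:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.5):
    """One backpropagation update. W1/b1: hidden-layer weights and biases,
    W2/b2: output-layer weights and biases. Returns the output activations."""
    # Forward pass
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(W1))]
    o = [sigmoid(sum(W2[k][j] * h[j] for j in range(len(h))) + b2[k])
         for k in range(len(W2))]
    # Output deltas: same rule as logistic regression, one per output node
    d_o = [(y[k] - o[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # Hidden deltas: propagate the output errors backwards through W2
    d_h = [h[j] * (1 - h[j]) * sum(W2[k][j] * d_o[k] for k in range(len(o)))
           for j in range(len(h))]
    # Weight updates
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] += alpha * d_o[k] * h[j]
        b2[k] += alpha * d_o[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += alpha * d_h[j] * x[i]
        b1[j] += alpha * d_h[j]
    return o

# Demo: train a 2-3-1 network on XOR and watch the squared error fall
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [random.uniform(-1, 1) for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)]]
b2 = [random.uniform(-1, 1)]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

def total_error():
    # alpha=0 gives a pure forward pass with no weight change
    return sum((y[0] - backprop_step(x, y, W1, b1, W2, b2, alpha=0)[0]) ** 2
               for x, y in data)

err_before = total_error()
for _ in range(5000):
    for x, y in data:
        backprop_step(x, y, W1, b1, W2, b2)
err_after = total_error()
```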
ANNs
- How do we choose the topology?
  - number of layers
  - number of neurons per layer
- Too many neurons = memorization/overfitting
- No perfect answer
  - try different ones and cross-validate
  - evolve (take my class in the Spring!)
  - many others
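The "try different ones and cross-validate" option can be sketched generically. A minimal k-fold splitter (our own sketch, not from the slides): for each candidate topology, train on each fold's training part, average the validation error, and keep the topology that generalizes best rather than the one that memorizes:

```python
def k_fold_splits(data, k=5):
    """Yield k (train, validation) splits of data for cross-validation.
    Fold i holds out every k-th example starting at i; the rest train."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, valid

# Example: 10 items, 5 folds -> each split has 8 training and 2 validation items
splits = list(k_fold_splits(list(range(10)), k=5))
```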
Deep Learning
- Hierarchically composed feature representations
- Deep neural networks: ~6+ layers
- Each layer transforms the data to a new space: could be higher-dimensional, lower, or just different
- Learning features relevant to the data (Lehman, Clune, & Risi 2015; Lee et al. 2007)
Deep Learning
- Follows biology closely: V1, V2, etc. in the visual cortex go from edges to more abstract features, eventually to a "Halle Berry neuron" that responded to
  - her picture
  - a line drawing of her
  - her picture in her Catwoman costume
  - her name (typed out)
- Quiroga et al. Nature 2005. For more, see: http://goo.gl/zcrgic