Artificial Intelligence
Jeff Clune, Assistant Professor, Evolving Artificial Intelligence Laboratory
Announcements
- Be making progress on your projects!
Three Types of Learning
- Unsupervised
- Supervised
- Reinforcement
Perceptron Learning Rule
- Typically applied one example at a time, choosing samples at random
- Guaranteed to find a perfect boundary if data are linearly separable
(Figure: example run)
Perceptron Learning Rule
- Not guaranteed to find an optimal solution if data are not linearly separable: below, it reaches the minimal error many times but keeps changing the weights
- A decreasing learning rate (setting alpha to 1/numIterations) will settle on a minimum-error solution
(Figures: with non-separable data; with cooling)
Perceptron Learning Rule
(Figures: with linearly separable data; with non-separable data; with cooling)
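The rule above fits in a few lines. A minimal sketch, assuming the standard update w_i <- w_i + alpha * (y - prediction) * x_i; the toy OR dataset, the fixed alpha of 0.1, and the epoch count are our own choices, not from the slides:

```python
import random

def perceptron_train(data, epochs=100, cooling=False):
    """Perceptron learning rule: w_i += alpha * (y - prediction) * x_i.

    data: list of (x, y) pairs where x includes a leading 1 as the bias
    input and y is 0 or 1. Examples are presented one at a time in random
    order. With cooling=True, alpha decays as 1/t, which lets the weights
    settle to a minimum-error solution even on non-separable data.
    """
    n = len(data[0][0])
    w = [0.0] * n
    t = 0
    for _ in range(epochs):
        for x, y in random.sample(data, len(data)):
            t += 1
            alpha = 1.0 / t if cooling else 0.1
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0  # step activation
            for i in range(n):
                w[i] += alpha * (y - pred) * x[i]
    return w

# Linearly separable toy data (logical OR); first input is the bias, fixed at 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = perceptron_train(data)
```

Because this data is linearly separable, the rule is guaranteed to stop making mistakes after finitely many updates, whatever the presentation order.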
Logistic Regression (in contrast to linear regression)
- Threshold (step function) has drawbacks
  - non-differentiable
  - always produces a 1 or 0 classification, even near the boundary
- Solution: the differentiable logistic function
  - output is a probability of the class: 0.5 near the boundary, toward 0 or 1 farther away
Logistic Regression
(Figure: sigmoid for earthquakes vs. explosions)
Logistic Regression
- No closed-form solution
- Weight update rule (see book for derivation): w_i <- w_i + alpha * (y - h_w(x)) * h_w(x) * (1 - h_w(x)) * x_i
- Linearly separable data: a bit slower than linear regression, but more predictable
- Not linearly separable, fixed learning rate: faster than linear regression
- Not linearly separable, decaying learning rate: faster than linear regression
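The update rule can be sketched directly. A minimal illustration of gradient descent on the squared loss with a sigmoid, using the book's update w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i; the toy dataset, alpha, and epoch count here are assumptions of ours:

```python
import math

def sigmoid(z):
    """The logistic function: differentiable, outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_train(data, alpha=0.5, epochs=2000):
    """Gradient-descent updates for logistic regression:
    w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i, where h = sigmoid(w . x)."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in data:
            h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            g = (y - h) * h * (1 - h)  # derivative of squared loss through the sigmoid
            for i in range(n):
                w[i] += alpha * g * x[i]
    return w

# Toy separable data (logical OR) with a leading bias input of 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = logistic_train(data)
```

Unlike the step unit, the trained model outputs a probability: near 0.5 close to the boundary and toward 0 or 1 farther away.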
Neural Networks
- Your brain is a neural network
Neural Networks
- We create computational brains called Artificial Neural Networks (ANNs)
Neural Networks
- Nodes
  - feedforward: layered
  - recurrent: allows memory
Neural Networks
- Connections
  - excitatory or inhibitory
  - different weights (magnitudes), typically real-valued
Neural Networks
- Nodes: often layered; output is a function of inputs
  - step/threshold
  - sigmoid
  - other...
- Biases: often treated as another input set to 1
  - always use them! otherwise the decision boundary has to pass through the origin
  - also good when the default output is non-zero, such as a robot that moves forward unless X
  - make sure they are on for your projects!
Neural Networks
- Simple example: kiwi classifier
  - binary inputs: roundness, furriness; bias input = 1
  - weights = ??
  - threshold T = 0; step activation f(x) = 1 if x >= 0, 0 otherwise
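The slide leaves the weights as an exercise; here is a minimal sketch with one possible weight setting (our assumption, not the slide's answer), requiring both features to fire:

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def kiwi_classifier(roundness, furriness):
    """Single-unit kiwi classifier with binary inputs.
    The weights below are one possible setting (assumed): the bias input,
    fixed at 1, shifts the decision boundary away from the origin so the
    unit only fires when an object is both round and furry."""
    w_round, w_furry, w_bias = 1, 1, -2
    return step(w_round * roundness + w_furry * furriness + w_bias * 1)

print(kiwi_classifier(1, 1))  # round and furry: 1 (kiwi)
print(kiwi_classifier(1, 0))  # round but not furry: 0
```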
Non-Linear Activation Functions Combining many of them builds an ANN that can represent non-linear functions - because activation functions are nonlinear
Outputs Can have single or multiple outputs e.g. one output for each class
Perceptron
- Perceptron: single unit; default activation function: step
- With a sigmoid activation: sigmoid perceptron
- Perceptron network: single-layer network
- McCulloch & Pitts (1943)
Perceptron Training Rule
- Training
  - step activation function: perceptron learning rule
  - logistic activation function: gradient descent
- Both work for linearly separable data: we already knew that; it's just a new metaphor now
Neural Networks
- 2D to 3D non-linear mapping: each node outputs 1 if its input >= the number in the node
- Outputs of each node:

  x  y | Left Center Right | Output
  1  1 |  1     1     1    |   0
  1  0 |  1     0     0    |   1
  0  1 |  0     0     1    |   1
  0  0 |  0     0     0    |   0

(Figure: neuron1 (left), neuron2 (right), neuron3 (center); 3D plot)
Neural Networks
- Cover's theorem: "The probability that classes are linearly separable increases when the features are nonlinearly mapped to a higher dimensional feature space." [Cover 1965]
- The output layer requires linear separability. The purpose of the hidden layers is to make the problem linearly separable!
From: http://130.236.96.13/edu/courses/tbmi26/pdfs/lectures/le5.pdf
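The table of hidden-node outputs is XOR, the classic case where a hidden layer is needed. A tiny sketch of the idea; the specific weights are an illustrative choice of ours, not taken from the slides:

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_net(x, y):
    """Two-layer step network computing XOR. The hidden layer nonlinearly
    maps (x, y) into a space where a single linear output unit can
    separate the classes -- Cover's theorem in miniature."""
    h1 = step(x + y - 1)          # OR: fires when at least one input is on
    h2 = step(x + y - 2)          # AND: fires only when both inputs are on
    return step(h1 - 2 * h2 - 1)  # fires only when h1 = 1 and h2 = 0
```

No single linear unit can compute XOR from raw (x, y), but in the (h1, h2) space the two classes become linearly separable.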
Neural Networks
- Multi-layer networks thus allow non-linear regression
- Single hidden layer (often very large): can represent any continuous function
- Two hidden layers: can represent any discontinuous function
- But how do we train them?
Training Multi-Layer Neural Networks
- General idea: propagate the error backwards
- Called backpropagation
Backpropagation
- Can still write the equation of a neuron, e.g. a_5 = g(w_{3,5} a_3 + w_{4,5} a_4 + w_{0,5}), where w_{0,5} is the bias weight (activation of neuron 5)
- Allows us to calculate the loss and take its derivative
- At the output, the loss equation is the same as before: the previous gradient-descent equation, now generalized to multiple output nodes
Backpropagation
- But what is the error at hidden nodes? We don't know what the right answer is for hidden nodes
Backpropagation
- Detailed pseudocode in book
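A minimal sketch of the algorithm for a single-hidden-layer sigmoid network with squared loss: the output delta is the familiar gradient-descent rule generalized to multiple output nodes, and each hidden node's "error" is the weighted sum of the output deltas it feeds into. The variable names, the XOR training demo, and the hyperparameters are our own assumptions, not the book's:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.5):
    """One backpropagation update. W1/b1: hidden-layer weights and biases,
    W2/b2: output-layer weights and biases. Returns the output activations."""
    # Forward pass
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(W1))]
    o = [sigmoid(sum(W2[k][j] * h[j] for j in range(len(h))) + b2[k])
         for k in range(len(W2))]
    # Output deltas: same rule as logistic regression, one per output node
    d_o = [(y[k] - o[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # Hidden deltas: propagate the output errors backwards through W2
    d_h = [h[j] * (1 - h[j]) * sum(W2[k][j] * d_o[k] for k in range(len(o)))
           for j in range(len(h))]
    # Weight updates
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] += alpha * d_o[k] * h[j]
        b2[k] += alpha * d_o[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += alpha * d_h[j] * x[i]
        b1[j] += alpha * d_h[j]
    return o

# Demo: train a 2-3-1 network on XOR and watch the squared error fall
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [random.uniform(-1, 1) for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)]]
b2 = [random.uniform(-1, 1)]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

def total_error():
    # alpha=0 gives a pure forward pass with no weight change
    return sum((y[0] - backprop_step(x, y, W1, b1, W2, b2, alpha=0)[0]) ** 2
               for x, y in data)

err_before = total_error()
for _ in range(5000):
    for x, y in data:
        backprop_step(x, y, W1, b1, W2, b2)
err_after = total_error()
```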
ANNs
- How do we choose the topology?
  - number of layers
  - number of neurons per layer
- Too many neurons = memorization/overfitting
- No perfect answer
  - try different ones and cross-validate
  - evolve (take my class in the Spring!)
  - many others
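The "try different ones and cross-validate" option can be sketched generically. A minimal k-fold splitter (our own sketch, not from the slides): for each candidate topology, train on each fold's training part, average the validation error, and keep the topology that generalizes best rather than the one that memorizes:

```python
def k_fold_splits(data, k=5):
    """Yield k (train, validation) splits of data for cross-validation.
    Fold i holds out every k-th example starting at i; the rest train."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, valid

# Example: 10 items, 5 folds -> each split has 8 training and 2 validation items
splits = list(k_fold_splits(list(range(10)), k=5))
```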
Deep Learning
- Hierarchically composed feature representations
- Deep neural networks: ~6+ layers
- Each layer transforms the data to a new space: could be higher-dimensional, lower, or just different
- Learning features relevant to the data (Lehman, Clune, & Risi 2015; Lee et al. 2007)
Deep Learning
- Follows biology closely: V1, V2, etc. in the visual cortex go from edges to more abstract features, eventually to a "Halle Berry neuron" that responded to
  - her picture
  - a line drawing of her
  - her picture in her Catwoman costume
  - her name (typed out)
- Quiroga et al. Nature 2005. For more, see: http://goo.gl/zcrgic