Artificial Intelligence

1 Artificial Intelligence Jeff Clune, Assistant Professor, Evolving Artificial Intelligence Laboratory

2 Announcements Be making progress on your projects!

3 Three Types of Learning Unsupervised, Supervised, Reinforcement

4 Perceptron Learning Rule Typically applied one example at a time, choosing samples at random. Guaranteed to find a perfect boundary if the data are linearly separable. [Figure: example run]
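
For reference (the formula itself is not reproduced in this transcription), the perceptron learning rule updates each weight after seeing one example (x, y), with h_w(x) the thresholded output and alpha the learning rate:

w_i \leftarrow w_i + \alpha \, \big(y - h_{\mathbf{w}}(\mathbf{x})\big)\, x_i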

5 Perceptron Learning Rule Not guaranteed to find an optimal solution if the data are not linearly separable: below, it reaches the minimal error many times but keeps changing the weights. [Figure: With Non-Separable Data]

6 Perceptron Learning Rule Not guaranteed to find an optimal solution if the data are not linearly separable: it reaches the minimal error many times but keeps changing the weights. A decreasing learning rate (e.g. setting alpha to 1/numIterations) will converge to a minimum-error solution; see the sketch below. [Figures: With Non-Separable Data, With Cooling]
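
A minimal Python sketch of the rule under discussion, assuming binary 0/1 labels and a step activation; the 1/t decay is the simple cooling schedule the slide mentions, and the fixed rate of 0.1 is an arbitrary placeholder:

import random
import numpy as np

def perceptron_train(X, y, epochs=1000, decay=False):
    # Perceptron learning rule, applied to one randomly chosen example per step.
    # X: (n_examples, n_features) array; y: 0/1 labels.
    # A constant-1 input is appended so the last weight acts as the bias.
    X = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(X.shape[1])
    for t in range(1, epochs + 1):
        alpha = 1.0 / t if decay else 0.1      # "cooling" vs. fixed learning rate
        i = random.randrange(len(X))           # choose a sample at random
        h = 1.0 if np.dot(w, X[i]) >= 0 else 0.0
        w = w + alpha * (y[i] - h) * X[i]      # no change when the prediction is right
    return w

With linearly separable data both variants find a separating boundary; with non-separable data the fixed-rate version keeps jumping between solutions, while the cooled version settles on a minimum-error one.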

7 Perceptron Learning Rule [Figures: training curves With Linearly Separable Data, With Non-Separable Data, and With Cooling]

8 Logistic Regression (in contrast to linear regression) The threshold (step) function has drawbacks: it is non-differentiable and always produces a hard 1-or-0 classification, even near the boundary

9 Logistic Regression The threshold (step) function has drawbacks: non-differentiable, and always produces a hard 1-or-0 classification, even near the boundary. Solution: the differentiable logistic function. Its output is a probability of the class - near 0.5 close to the boundary, moving toward 0 or 1 farther away
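
Stated explicitly (the transcription omits the formula), the logistic function applied to the weighted input is:

h_{\mathbf{w}}(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}\cdot\mathbf{x}}}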

10 Logistic Regression Sigmoid for Earthquakes vs. Explosions

11 Logistic Regression No closed-form solution; weights are found with a gradient-descent weight update rule (see book for derivation). [Figures: Linearly Separable Data - a bit slower than linear regression, but more predictable; Not linearly separable, fixed learning rate - faster than linear regression; Not linearly separable, decaying learning rate - faster than linear regression]
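
A reconstruction of the update rule being referred to: applying gradient descent to the squared error of the logistic model gives, for one example (x, y),

w_i \leftarrow w_i + \alpha \,\big(y - h_{\mathbf{w}}(\mathbf{x})\big)\, h_{\mathbf{w}}(\mathbf{x})\,\big(1 - h_{\mathbf{w}}(\mathbf{x})\big)\, x_i

The extra h_w(x)(1 - h_w(x)) factor is the derivative of the logistic function, which is exactly why the activation needs to be differentiable.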

12 Logistic Regression

13 Neural Networks Your brain is a neural network

14

15 Neural Networks We create computational brains called Artificial Neural Networks or ANNs

16 Neural Networks Nodes: feedforward - layered

17 Neural Networks Nodes: feedforward - layered; recurrent - allows memory

18 Neural Networks Connections: excitatory or inhibitory, with different weights (magnitudes), typically real-valued

19 Neural Networks Nodes: often layered; each computes a function of its inputs - step/threshold - sigmoid - other...

20 Neural Networks Nodes: often layered; each computes a function of its inputs - step/threshold - sigmoid - other... Biases: often treated as another input fixed to 1 - always use them! - otherwise the decision boundary has to pass through the origin - also useful when the default output is non-zero, such as a robot that moves forward unless X - make sure they are on for your projects!

21 Neural Networks Simple example: a kiwi classifier. Binary inputs (Roundness, Furriness), a bias input fixed to 1, weights = ??, step activation f(x) = 1 if x >= 0, 0 otherwise (threshold T = 0). A possible implementation is sketched below.
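
The slide leaves the weights as an exercise ("weights = ??"); the values below are one assumed assignment (not the lecture's answer) under which the neuron fires only for inputs that are both round and furry:

def kiwi_classifier(roundness, furriness, w=(1.0, 1.0), bias_weight=-1.5):
    # Weighted sum of the binary inputs plus the bias input (fixed to 1).
    total = w[0] * roundness + w[1] * furriness + bias_weight * 1
    return 1 if total >= 0 else 0              # step activation with threshold T = 0

kiwi_classifier(1, 1) returns 1, while (1, 0), (0, 1), and (0, 0) return 0; without the negative bias weight, the neuron would fire for every input.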

22 Non-Linear Activation Functions Combining many such units builds an ANN that can represent non-linear functions - because the activation functions are nonlinear

23 Outputs Can have single or multiple outputs e.g. one output for each class

24 Perceptron Perceptron: a single unit. Default activation function: step. With a sigmoid activation: a sigmoid perceptron. Perceptron Network: a single-layer network. McCulloch & Pitts (1943)

25 Perceptron Training Rule Training: with a step activation function, use the perceptron learning rule; with a logistic activation function, use gradient descent. Both work for linearly separable data - we already knew that; it is just a new metaphor now

26 Neural Networks [Figure: 2D non-linear mapping built from three hidden units - neuron1 (left), neuron2 (right), neuron3 (center) - each outputting 1 if its weighted input >= the number in the node; panels show the outputs of each node over x, y (Left, Center, Right, Output) and the combined output in 3D]

27 Neural Networks [Figure: 3D view of the combined outputs of neuron1 (left), neuron2 (right), and neuron3 (center)]

28 Neural Networks Cover's theorem: The probability that classes are linearly separable increases when the features are nonlinearly mapped to a higher-dimensional feature space. [Cover 1965] The output layer requires linear separability. The purpose of the hidden layers is to make the problem linearly separable!

29 Neural Networks Multi-layer networks thus allow non-linear regression

30 Neural Networks Multi-layer networks thus allow non-linear regression Single hidden layer (often very large): can represent any continuous function Two hidden layers: can represent any discontinuous function

31 Neural Networks Multi-layer networks thus allow non-linear regression Single hidden layer (often very large): can represent any continuous function Two hidden layers: can represent any discontinuous function But how do we train them?
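
A minimal sketch of the kind of multi-layer network being described - one hidden layer of sigmoid units feeding an output unit; the layer sizes and random weights are placeholders, not values from the lecture:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # One forward pass: input -> hidden layer -> output layer.
    h = sigmoid(W_hidden @ x + b_hidden)   # hidden activations: a non-linear re-mapping of the input
    return sigmoid(W_out @ h + b_out)      # output activation(s)

# 2 inputs, 3 hidden units, 1 output; placeholder random weights
rng = np.random.default_rng(0)
y = forward(rng.normal(size=2),
            rng.normal(size=(3, 2)), rng.normal(size=3),
            rng.normal(size=(1, 3)), rng.normal(size=1))

How to learn those weights, rather than leaving them random, is the question the next slides answer.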

32 Training Multi-Layer Neural Networks General Idea: Propagate the error backwards Called Backpropagation

33 Backpropagation Can still write the equation of a neuron: a_5 = g( sum_j w_{j,5} a_j + b_5 ) (the activation of neuron 5, with b_5 its bias)

34 Backpropagation Can still write the equation of a neuron: a_5 = g( sum_j w_{j,5} a_j + b_5 ) Allows us to calculate the loss and take its derivative

35 Backpropagation Can still write the equation of a neuron Allows us to calculate the loss and take its derivative At the output, the loss equation is the same as before: the previous gradient descent equation, now generalized to multiple output nodes
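
Written out for reference (the transcription omits the slide's equations), the output-layer rule in the usual textbook notation: for each output node k with error Err_k = y_k - a_k,

\Delta_k = \mathrm{Err}_k \cdot g'(in_k), \qquad w_{j,k} \leftarrow w_{j,k} + \alpha \, a_j \, \Delta_k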

36 Backpropagation But what is the error at hidden nodes? We don't know what the right answer is for hidden nodes
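
Backpropagation's answer, stated here for reference: each hidden node j is assigned a share of the error of the output nodes it feeds, in proportion to the connection weights,

\Delta_j = g'(in_j) \sum_k w_{j,k}\, \Delta_k, \qquad w_{i,j} \leftarrow w_{i,j} + \alpha \, a_i \, \Delta_j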

37 Backpropagation

38 Backpropagation Detailed Pseudocode in Book
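
The detailed pseudocode itself is in the book and not reproduced here; below is a compact Python sketch (an illustration of the idea, not the textbook's exact algorithm) of stochastic backpropagation for a network with one hidden layer of sigmoid units:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_train(X, Y, n_hidden=4, alpha=0.1, epochs=1000, seed=0):
    # Stochastic backpropagation: one example per update, sigmoid units throughout.
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in));  b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        i = rng.integers(len(X))                       # pick one example at random
        x, y = X[i], Y[i]
        h = sigmoid(W1 @ x + b1)                       # forward pass: hidden layer
        o = sigmoid(W2 @ h + b2)                       # forward pass: output layer
        delta_out = (y - o) * o * (1 - o)              # output-layer error terms
        delta_hid = (W2.T @ delta_out) * h * (1 - h)   # propagate the error backwards
        W2 += alpha * np.outer(delta_out, h);  b2 += alpha * delta_out
        W1 += alpha * np.outer(delta_hid, x);  b1 += alpha * delta_hid
    return W1, b1, W2, b2

# Example: learn XOR, which is not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
params = backprop_train(X, Y, epochs=20000, alpha=0.5)

XOR is a function no single-layer perceptron can represent, which is exactly why the hidden layer and backpropagation are needed.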

39 ANNs How do we choose the topology? Number of layers, number of neurons per layer. Too many neurons = memorization/overfitting. No perfect answer: try different ones and cross-validate, evolve them (take my class in the Spring!), many others
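
One concrete way to "try different ones and cross-validate" (a sketch using scikit-learn, which the lecture itself does not mention; the candidate topologies and dataset are arbitrary):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
for hidden in [(2,), (8,), (32,), (8, 8)]:                   # candidate topologies
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    print(hidden, cross_val_score(net, X, y, cv=5).mean())   # 5-fold cross-validation accuracy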

40 Deep Learning Hierarchically composed feature representations. Deep Neural Networks: roughly 6+ layers. Each layer transforms the data to a new space - which could be higher-dimensional, lower-dimensional, or just different

41 Hierarchically composed feature representations

42 Learning features relevant to the data Lehman, Clune, & Risi 2015 Lee et al. 2007

43 Deep Learning Follows biology closely: V1, V2, etc. in the visual cortex go from edges to more abstract features - eventually to a "Halle Berry neuron" that responded to her picture, a line drawing of her, her picture in her Catwoman costume, and her name (typed out). Quiroga et al. Nature 2005