Feedforward Neural Nets and Backpropagation


Feedforward Neural Nets and Backpropagation
Julie Nutini, University of British Columbia
MLRG, September 28th

Supervised Learning Roadmap

Supervised learning: assume that we are given the features $x_i$. Could also use basis functions or kernels.
Unsupervised learning: learn a representation $z_i$ based on the features $x_i$. Also used for supervised learning: use the $z_i$ as features.
Supervised learning: learn features $z_i$ that are good for supervised learning.

Supervised Learning Roadmap

[Figure: four diagrams. (1) Linear model: $y_i$ predicted directly from $x_{i1}, \dots, x_{id}$ with weights $w_1, \dots, w_d$. (2) Change of basis: $y_i$ predicted from basis features $z_{i1}, \dots, z_{ik}$. (3) Basis from a latent-factor model: $z_i$ computed from $x_i$ via weights $W$, with $w$ and $W$ trained separately. (4) Simultaneously learn the features $z_i$ and the regression model.]

These are all examples of feedforward neural networks.

Feedforward Neural Network

Information always moves in one direction: no loops, never goes backwards. Forms a directed acyclic graph.
Each node receives input only from the immediately preceding layer.
Simplest type of artificial neural network.

Neural Networks: Historical Background

1943: McCulloch and Pitts proposed the first computational model of a neuron.
1949: Hebb proposed the first learning rule.
1958: Rosenblatt's work on perceptrons.
1969: Minsky and Papert's paper exposed the limits of the theory.
1970s: Decade of dormancy for neural networks.
1980s: Return of neural networks (self-organization, back-propagation algorithms, etc.).

Model of a Single Neuron

McCulloch and Pitts (1943): integrate-and-fire model (no hidden layers).
Denote the $d$ input values for sample $i$ by $x_{i1}, x_{i2}, \dots, x_{id}$.
Each of the $d$ inputs has a weight $w_1, w_2, \dots, w_d$.
Compute the prediction as a weighted sum,
$$\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_d x_{id} = \sum_{j=1}^d w_j x_{ij}.$$
Use $\hat{y}_i$ in some loss function, e.g. $\tfrac{1}{2}(y_i - \hat{y}_i)^2$.
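A minimal NumPy sketch of this single-neuron prediction and squared loss, using made-up data for illustration:

```python
import numpy as np

# Single-neuron ("integrate and fire") model: weighted sum of d inputs, squared loss.
d = 4
rng = np.random.default_rng(0)
x_i = rng.normal(size=d)          # one sample's features x_{i1}, ..., x_{id}
y_i = 1.0                         # its target
w = rng.normal(size=d)            # weights w_1, ..., w_d

y_hat = w @ x_i                   # weighted sum: sum_j w_j x_{ij}
loss = 0.5 * (y_i - y_hat) ** 2   # squared loss (1/2)(y_i - y_hat)^2
print(y_hat, loss)
```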

Incorporating Hidden Layers

Consider more than one neuron: a hidden layer of units $z_{i1}, \dots, z_{ik}$ sits between the inputs $x_{i1}, \dots, x_{id}$ and the output $\hat{y}_i$.
Input to the hidden layer: a function $h$ of the features from the latent-factor model, $z_i = h(Wx_i)$.
Each neuron has a directed connection to ALL neurons of the subsequent layer.
The function $h$ is often called the activation function; each unit/node applies an activation function.

Linear Activation Function

A linear activation function has the form
$$h(Wx_i) = a + Wx_i,$$
where $a$ is called the bias (intercept).

Example: linear regression with a linear activation (linear-linear model).
Representation: $z_i = h(Wx_i)$ (from the latent-factor model).
Prediction: $\hat{y}_i = w^T z_i$.
Loss: $\tfrac{1}{2}(y_i - \hat{y}_i)^2$.

To train this model, we solve
$$\operatorname*{argmin}_{w \in \mathbb{R}^k,\; W \in \mathbb{R}^{k \times d}} \frac{1}{2}\sum_{i=1}^n (y_i - w^T z_i)^2 = \frac{1}{2}\sum_{i=1}^n \big(y_i - w^T h(Wx_i)\big)^2,$$
which is just linear regression with the $z_i$ as features.

But this is just a linear model:
$$w^T(Wx_i) = (W^T w)^T x_i = \tilde{w}^T x_i, \quad \text{with } \tilde{w} = W^T w.$$
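A quick NumPy check of this collapse, assuming a zero bias ($a = 0$) and random made-up $W$, $w$, and $x_i$:

```python
import numpy as np

# With a linear activation, the two-layer model w^T (W x_i) equals a single
# linear model with effective weights w_tilde = W^T w.
rng = np.random.default_rng(1)
d, k = 5, 3
x_i = rng.normal(size=d)
W = rng.normal(size=(k, d))
w = rng.normal(size=k)

two_layer = w @ (W @ x_i)               # prediction of the linear-linear model
w_tilde = W.T @ w                       # collapsed weight vector
one_layer = w_tilde @ x_i               # equivalent single linear model
print(np.isclose(two_layer, one_layer)) # True
```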

Binary Activation Function

To increase flexibility, something needs to be non-linear.
A Heaviside step function has the form
$$h(v) = \begin{cases} 1 & \text{if } v \ge a, \\ 0 & \text{otherwise,} \end{cases}$$
where $a$ is the threshold.

Example: let $a = 0$. This yields a binary $z_i = h(Wx_i)$.
$Wx_i$ has a concept encoded by each of its $2^k$ possible sign patterns.
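A small sketch of the Heaviside activation with threshold $a = 0$, using random made-up weights:

```python
import numpy as np

# Binary activation: z_i = h(W x_i) is a vector of k bits, one per hidden unit
# (one of the 2^k possible sign patterns of W x_i).
rng = np.random.default_rng(2)
d, k = 5, 3
x_i = rng.normal(size=d)
W = rng.normal(size=(k, d))

z_i = (W @ x_i >= 0).astype(float)   # Heaviside step applied element-wise
print(z_i)                           # e.g. [1. 0. 1.]
```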

Perceptrons

Rosenblatt (late 1950s).
Algorithm for supervised learning of binary classifiers: decides whether an input belongs to a specific class or not.
Uses a binary activation function.
Can learn AND, OR, and NOT functions.
Perceptrons are only capable of learning linearly separable patterns.

Learning Algorithm for Perceptrons

The perceptron learning rule is given as follows:

1. For each $x_i$ and desired output $y_i$ in the training set,
   1. Calculate the output: $\hat{y}_i^{(t)} = h\big((w^{(t)})^T x_i\big)$.
   2. Update the weights: $w_j^{(t+1)} = w_j^{(t)} + (y_i - \hat{y}_i^{(t)})\, x_{ij}$, for $1 \le j \le d$.
2. Continue these steps until $\frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i^{(t)}| < \gamma$ (a threshold).

If the training set is linearly separable, the algorithm is guaranteed to converge (Rosenblatt, 1962).
An upper bound exists on the number of times the weights are adjusted during training.
Can also use gradient descent if the function is differentiable.

A minimal sketch of this rule appears after this list.
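The sketch below runs the rule on a made-up, linearly separable dataset (labels in {0, 1}, threshold $a = 0$, tolerance $\gamma$ chosen arbitrarily):

```python
import numpy as np

# Perceptron learning rule on a toy linearly separable problem.
rng = np.random.default_rng(3)
n, d = 50, 2
X = rng.normal(size=(n, d))
y = (X @ np.array([2.0, -1.0]) >= 0).astype(float)   # separable labels

w = np.zeros(d)
gamma = 1e-3
for t in range(100):                      # passes over the data
    for i in range(n):
        y_hat = float(X[i] @ w >= 0)      # step 1.1: compute output
        w += (y[i] - y_hat) * X[i]        # step 1.2: update the weights
    errors = np.abs(y - (X @ w >= 0)).mean()
    if errors < gamma:                    # step 2: stopping criterion
        break
print(t, errors)
```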

Non-Linear Continuous Activation Function

What about a smooth approximation to the step function?
A sigmoid function has the form
$$h(v) = \frac{1}{1 + e^{-v}}.$$
Applying the sigmoid function element-wise,
$$z_{ic} = \frac{1}{1 + e^{-W_c^T x_i}}.$$
This is called a multi-layer perceptron or neural network.
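A minimal sketch of the forward pass of this one-hidden-layer sigmoid network, with random made-up weights and input:

```python
import numpy as np

# Forward pass: sigmoid applied element-wise to W x_i gives z_i,
# then a linear output w^T z_i.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(4)
d, k = 5, 3
x_i = rng.normal(size=d)
W = rng.normal(size=(k, d))
w = rng.normal(size=k)

z_i = sigmoid(W @ x_i)   # hidden units: z_ic = 1 / (1 + exp(-W_c^T x_i))
y_hat = w @ z_i          # prediction
print(z_i, y_hat)
```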

Why "Neural" Network?

A typical neuron: a neuron has many dendrites, and each dendrite takes input. A neuron has a single axon, and the axon sends output.
With the right input to the dendrites: an action potential travels along the axon.

Why "Neural" Network?

First neuron:
Each dendrite receives: $x_{i1}, x_{i2}, \dots, x_{id}$.
Nucleus: computes $W_c^T x_i$.
Axon: sends the signal $\frac{1}{1 + e^{-W_c^T x_i}}$ (a smoothed binary signal).
Axon terminal: neurotransmitter, synapse.

Second neuron:
Each dendrite receives: $z_{i1}, z_{i2}, \dots, z_{ik}$.
Nucleus: computes $w^T z_i$.
Axon: sends the signal $\hat{y}_i$.

This describes a neural network with one hidden layer (2 neurons).
The human brain has approximately $10^{11}$ neurons, and each neuron is connected to approximately $10^4$ other neurons.

Artificial Neural Nets vs. Real Neural Nets

Artificial neural network:
$x_i$ is a measurement of the world.
$z_i$ is an internal representation of the world.
$\hat{y}_i$ is the output of a neuron, used for classification/regression.

Real neural networks are more complicated:
Timing of action potentials seems to be important. Rate coding: the frequency of action potentials simulates continuous output.
Artificial neural networks don't reflect the sparsity of action potentials.
How much computation is done inside a neuron?
The brain is highly organized (e.g., substructures and cortical columns), and the connection structure changes.
Different types of neurotransmitters.

The Universal Approximation Theorem

One of the first versions was proven by George Cybenko (1989).
For feedforward neural networks, the universal approximation theorem claims that every continuous function defined on a compact set can be arbitrarily well approximated by a neural network with one hidden layer.
Thus, a simple neural network is capable of representing a wide variety of functions when given appropriate parameters.
But what about algorithmic learnability of those parameters?

Artificial Neural Networks

With the squared loss, our objective function is
$$\operatorname*{argmin}_{w \in \mathbb{R}^k,\; W \in \mathbb{R}^{k \times d}} \frac{1}{2}\sum_{i=1}^n \big(y_i - w^T h(Wx_i)\big)^2.$$
Usual training procedure: stochastic gradient. Compute the gradient for a random example $i$, then update $w$ and $W$.
Computing the gradient is known as backpropagation.
Adding regularization to $w$ and/or $W$ is known as weight decay.

Backpropagation

The output values $\hat{y}_i$ are compared to the correct values $y_i$ to compute the error.
The error is fed back through the network (backpropagation of errors).
The weights are adjusted to reduce the value of the error function by a small amount.
After a number of cycles, the network converges to a state where the error is small.
To adjust the weights, we use a non-linear optimization method: gradient descent.
Backpropagation can only be applied to differentiable activation functions.

Backpropagation Algorithm

    for each weight w_{i,j} in the network do
        w_{i,j} ← a small random number
    repeat
        for each example (x, y) do
            /* Propagate the inputs forward to compute the outputs */
            for each node i in the input layer do
                a_i ← x_i
            for l = 2 to L do
                for each node j in layer l do
                    v_j ← Σ_i w_{i,j} a_i
                    a_j ← h(v_j)
            /* Propagate the deltas backwards from output layer to input layer */
            for each node j in the output layer do
                Δ[j] ← h′(v_j) (y_j − a_j)
            for l = L − 1 to 1 do
                for each node i in layer l do
                    Δ[i] ← h′(v_i) Σ_j w_{i,j} Δ[j]
            /* Update every weight in the network using the deltas */
            for each weight w_{i,j} in the network do
                w_{i,j} ← w_{i,j} + α a_i Δ[j]
    until some stopping criterion is satisfied
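A minimal NumPy sketch of this procedure for the one-hidden-layer sigmoid network used in these slides, trained by stochastic gradient descent on made-up regression data (learning rate, sizes, and data are arbitrary choices, not from the slides):

```python
import numpy as np

# One hidden layer, sigmoid activation, squared loss, stochastic gradient descent.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(5)
n, d, k, alpha = 200, 3, 10, 0.1
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # toy targets

W = 0.1 * rng.normal(size=(k, d))   # initialize once, before the loop
w = 0.1 * rng.normal(size=k)
for epoch in range(100):
    for i in rng.permutation(n):
        z = sigmoid(W @ X[i])                           # forward pass
        r = y[i] - w @ z                                # residual y_i - y_hat_i
        grad_w = -r * z                                 # d loss / d w
        grad_W = -r * np.outer(w * z * (1 - z), X[i])   # d loss / d W (chain rule)
        w -= alpha * grad_w
        W -= alpha * grad_W
print(0.5 * np.mean((y - sigmoid(X @ W.T) @ w) ** 2))   # final training loss
```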

Backpropagation

Derivations of the back-propagation equations are very similar to the gradient calculation for logistic regression; they use the multivariable chain rule more than once.

Consider the loss for a single example:
$$f(w, W) = \frac{1}{2}\Big(y_i - \sum_{j=1}^k w_j h(W_j x_i)\Big)^2.$$

Derivative with respect to $w_j$ (weights connecting the hidden layer to the output layer):
$$\frac{\partial f(w, W)}{\partial w_j} = -\Big(y_i - \sum_{j'=1}^k w_{j'} h(W_{j'} x_i)\Big)\, h(W_j x_i).$$

Derivative with respect to $W_{jc}$ (weights connecting the input layer to the hidden layer):
$$\frac{\partial f(w, W)}{\partial W_{jc}} = -\Big(y_i - \sum_{j'=1}^k w_{j'} h(W_{j'} x_i)\Big)\, w_j h'(W_j x_i)\, x_{ic}.$$

Backpropagation

Notice the repeated calculations in the gradients:
$$\frac{\partial f(w, W)}{\partial w_j} = -\Big(y_i - \sum_{j'=1}^k w_{j'} h(W_{j'} x_i)\Big)\, h(W_j x_i),$$
$$\frac{\partial f(w, W)}{\partial W_{jc}} = -\Big(y_i - \sum_{j'=1}^k w_{j'} h(W_{j'} x_i)\Big)\, w_j h'(W_j x_i)\, x_{ic}.$$
The residual $\big(y_i - \sum_{j'} w_{j'} h(W_{j'} x_i)\big)$ is the same for every $w_j$ and every $W_{jc}$, and within a row the factor $w_j h'(W_j x_i)$ is the same for every input index $c$. Backpropagation computes these shared quantities once and reuses them, as sketched below.
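A small sketch that computes both gradients from the shared residual once and checks one entry against a finite-difference approximation (shapes and data made up for illustration):

```python
import numpy as np

# Reuse the shared residual r in both gradients, then verify numerically.
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def loss(w, W, x, y):
    return 0.5 * (y - w @ sigmoid(W @ x)) ** 2

rng = np.random.default_rng(6)
d, k = 3, 4
x, y = rng.normal(size=d), 0.5
W, w = rng.normal(size=(k, d)), rng.normal(size=k)

z = sigmoid(W @ x)
r = y - w @ z                                  # residual, shared by all partials
grad_w = -r * z                                # df/dw_j
grad_W = -r * np.outer(w * z * (1 - z), x)     # df/dW_jc

eps = 1e-6                                     # finite-difference check of one entry
e_w = np.zeros(k); e_w[0] = eps
num_w = (loss(w + e_w, W, x, y) - loss(w - e_w, W, x, y)) / (2 * eps)
print(np.isclose(grad_w[0], num_w))            # True
```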

Disadvantages of Neural Networks

Cannot be used if very little data is available.
Many free parameters (e.g., number of hidden nodes, learning rate, minimal error, etc.).
Not great for precision calculations.
Do not provide explanations (unlike slopes in linear models that represent correlations).
The network can overfit the training data and fail to capture the true statistical process behind the data; early stopping can be used to avoid this.
Speed of convergence.
Local minima.

Advantages of Neural Networks

Easy to maintain. Versatile. Used on problems for which analytical methods do not yet exist.
Model non-linear dependencies.
Quickly work out patterns, even with noisy data.
Always output an answer, even when the input is incomplete.
