Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

1 Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1

2 Perceptrons x_0 = 1, x_1, x_2, ..., x_D → z = h(w^T x). Output: z. A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension to every input vector that is always equal to 1. x_0 is called the bias input; it is always equal to 1. x = (1, x_1, x_2, ..., x_D). w_0 is called the bias weight. It is optimized during training. 2

3 Perceptrons x_0 = 1, x_1, x_2, ..., x_D → z = h(w^T x). Output: z. A perceptron computes its output z in two steps: First step: a = w^T x = Σ_{i=0}^{D} w_i x_i. Second step: z = h(a). In a single formula: z = h(Σ_{i=0}^{D} w_i x_i). 3

4 Perceptrons x_0 = 1, x_1, x_2, ..., x_D → z = h(w^T x). Output: z. A perceptron computes its output z in two steps: First step: a = w^T x = Σ_{i=0}^{D} w_i x_i. Second step: z = h(a). h is called an activation function. For example, h could be the sigmoidal function σ(a) = 1/(1 + e^{-a}). 4

5 Perceptrons x_0 = 1, x_1, x_2, ..., x_D → z = h(w^T x). Output: z. We have seen perceptrons before, we just did not call them perceptrons. For example, logistic regression produces a classifier function y(x) = σ(w^T φ(x)). If we set φ(x) = x and h = σ, then y(x) is a perceptron. 5

6 Perceptrons and Neurons x_0 = 1, x_1, x_2, ..., x_D → z = h(w^T x). Output: z. Perceptrons are inspired by neurons. Neurons are the cells forming the nervous system, and the brain. Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire". Since brains are "intelligent", computer scientists have been hoping that perceptron-based systems can be used to model intelligence. 6

7 Activation Functions A perceptron produces output z = h(w^T x). One choice for the activation function h: the step function. h(a) = 0 if a < 0, and h(a) = 1 if a ≥ 0. The step function is useful for providing some intuitive examples. It is not useful for actual real-world systems. Reason: it is not differentiable, so it does not allow optimization via gradient descent. 7

8 Activation Functions A perceptron produces output z = h(w^T x). Another choice for the activation function h(a): the sigmoidal function σ(a) = 1/(1 + e^{-a}). The sigmoidal function is often used in real-world systems. It is a differentiable function, so it allows the use of gradient descent. 8
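
To make the two activation choices concrete, here is a minimal Python sketch (not part of the slides) of the step function and the sigmoidal function:

import math

def step(a):
    # Step activation: 0 for negative inputs, 1 otherwise.
    return 0.0 if a < 0 else 1.0

def sigmoid(a):
    # Sigmoidal activation: sigma(a) = 1 / (1 + e^(-a)).
    return 1.0 / (1.0 + math.exp(-a))

print(step(-0.5), step(0.5))        # 0.0 1.0
print(sigmoid(-0.5), sigmoid(0.5))  # roughly 0.378 and 0.622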

9 Example: The AND Perceptron Suppose we use the step function for activation. Suppose boolean value false is represented as number 0. Suppose boolean value true is represented as number 1. Then, the perceptron below computes the boolean AND function: false AND false = false, false AND true = false, true AND false = false, true AND true = true. x_0 = 1, x_1, x_2 → z = h(w^T x), with weights w_0 = -1.5, w_1 = 1, w_2 = 1. Output: z. 9

10 Example: The AND Perceptron Verification: If x_1 = 0 and x_2 = 0: w^T x = -1.5 + 1·0 + 1·0 = -1.5. h(w^T x) = h(-1.5) = 0. Corresponds to case false AND false = false. 10

11 Example: The AND Perceptron Verification: If x_1 = 0 and x_2 = 1: w^T x = -1.5 + 1·0 + 1·1 = -0.5. h(w^T x) = h(-0.5) = 0. Corresponds to case false AND true = false. 11

12 Example: The AND Perceptron Verification: If x_1 = 1 and x_2 = 0: w^T x = -1.5 + 1·1 + 1·0 = -0.5. h(w^T x) = h(-0.5) = 0. Corresponds to case true AND false = false. 12

13 Example: The AND Perceptron Verification: If x_1 = 1 and x_2 = 1: w^T x = -1.5 + 1·1 + 1·1 = 0.5. h(w^T x) = h(0.5) = 1. Corresponds to case true AND true = true. 13

14 Example: The OR Perceptron Suppose we use the step function for activation. Suppose boolean value false is represented as number 0. Suppose boolean value true is represented as number 1. Then, the perceptron below computes the boolean OR function: false OR false = false, false OR true = true, true OR false = true, true OR true = true. x_0 = 1, x_1, x_2 → z = h(w^T x), with weights w_0 = -0.5, w_1 = 1, w_2 = 1. Output: z. 14

15 Example: The OR Perceptron Verification: If x_1 = 0 and x_2 = 0: w^T x = -0.5 + 1·0 + 1·0 = -0.5. h(w^T x) = h(-0.5) = 0. Corresponds to case false OR false = false. 15

16 Example: The OR Perceptron Verification: If x_1 = 0 and x_2 = 1: w^T x = -0.5 + 1·0 + 1·1 = 0.5. h(w^T x) = h(0.5) = 1. Corresponds to case false OR true = true. 16

17 Example: The OR Perceptron Verification: If x_1 = 1 and x_2 = 0: w^T x = -0.5 + 1·1 + 1·0 = 0.5. h(w^T x) = h(0.5) = 1. Corresponds to case true OR false = true. 17

18 Example: The OR Perceptron Verification: If x_1 = 1 and x_2 = 1: w^T x = -0.5 + 1·1 + 1·1 = 1.5. h(w^T x) = h(1.5) = 1. Corresponds to case true OR true = true. 18

19 Example: The NOT Perceptron Suppose we use the step function for activation. Suppose boolean value false is represented as number 0. Suppose boolean value true is represented as number 1. Then, the perceptron below computes the boolean NOT function: NOT(false) = true, NOT(true) = false. x_0 = 1, x_1 → z = h(w^T x), with weights w_0 = 0.5, w_1 = -1. Output: z. 19

20 Example: The NOT Perceptron Verification: If x_1 = 0: w^T x = 0.5 + (-1)·0 = 0.5. h(w^T x) = h(0.5) = 1. Corresponds to case NOT(false) = true. 20

21 Example: The NOT Perceptron Verification: If x_1 = 1: w^T x = 0.5 + (-1)·1 = -0.5. h(w^T x) = h(-0.5) = 0. Corresponds to case NOT(true) = false. 21
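
The AND, OR, and NOT perceptrons above can be checked with a short Python sketch; the weight values below are the ones implied by the verification computations:

def step(a):
    # Step activation: 0 if a < 0, 1 if a >= 0.
    return 0 if a < 0 else 1

def perceptron(weights, inputs):
    # weights[0] is the bias weight; the bias input 1 is prepended to the inputs.
    x = [1] + list(inputs)
    return step(sum(w * xi for w, xi in zip(weights, x)))

AND_W = [-1.5, 1, 1]   # bias weight -1.5, w1 = w2 = 1
OR_W  = [-0.5, 1, 1]   # bias weight -0.5, w1 = w2 = 1
NOT_W = [0.5, -1]      # bias weight  0.5, w1 = -1

for x1 in (0, 1):
    print("NOT", x1, "->", perceptron(NOT_W, (x1,)))
    for x2 in (0, 1):
        print(x1, "AND", x2, "->", perceptron(AND_W, (x1, x2)),
              "|", x1, "OR", x2, "->", perceptron(OR_W, (x1, x2)))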

22 The XOR Function false XOR false = false, false XOR true = true, true XOR false = true, true XOR true = false. As before, we represent false with 0 and true with 1. The figure shows the four input points of the XOR function: green corresponds to output value true, red corresponds to output value false. The two classes (true and false) are not linearly separable. Therefore, no perceptron can compute the XOR function. 22

23 Neural Networks A neural network is built using perceptrons as building blocks. The inputs to some perceptrons are outputs of other perceptrons. Here is an example neural network computing the XOR function. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 23

24 Neural Networks To simplify the picture, we do not show the bias input anymore. We just show the bias weights w j,0. Besides the bias input, there are two inputs: x 1, x 2. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 24

25 Neural Networks This neural network example consists of six units: Three input units (including the not-shown bias input). Three perceptrons. Yes, inputs count as units. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 25

26 Neural Networks Weights are denoted as w_ji. Weight w_ji belongs to the edge that connects the output of unit i with an input of unit j. Units 0, 1, ..., D are the input units (units 0, 1, 2 in this example). Unit 3 x 1 Unit 5 Output: x 2 Unit 4 26

27 Neural Network Layers Oftentimes, neural networks are organized into layers. The input layer is the initial layer of input units (units 0, 1, 2 in our example). The output layer is at the end (unit 5 in our example). Zero, one or more hidden layers can be between the input and output layers. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 27

28 Neural Network Layers There is only one hidden layer in our example, containing units 3 and 4. Each hidden layer's inputs are outputs from the previous layer. Each hidden layer's outputs are inputs to the next layer. The first hidden layer's inputs come from the input layer. The last hidden layer's outputs are inputs to the output layer. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 28

29 Feedforward Networks Feedforward networks are networks where there are no directed loops. If there are no loops, the output of a neuron cannot (directly or indirectly) influence its input. While there are varieties of neural networks that are not feedforward or layered, our main focus will be layered feedforward networks. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 29

30 Computing the Output Notation: L is the number of layers. Layer 1 is the input layer, layer L is the output layer. Given values for the input units, the output is computed as follows: For (l = 2; l ≤ L; l = l + 1): Compute the outputs of layer l, given the outputs of layer l − 1. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 30

31 Computing the Output To compute the outputs of layer l (where l > 1), we simply need to compute the output of each perceptron belonging to layer l. For each such perceptron, its inputs are coming from outputs of perceptrons at layer l − 1. Remember, we compute layer outputs in increasing order of l. Unit 3 x 1 Unit 5 Output: x 2 Unit 4 31
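
A layer-by-layer forward pass could look like the following Python sketch. The network representation (each non-input layer as a list of weight vectors, bias weight first) and the example weights are illustrative assumptions, not taken from the slides:

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(layers, x):
    # layers: one list of weight vectors per non-input layer (bias weight first).
    # x: input vector without the bias input.
    outputs = list(x)
    for layer in layers:
        extended = [1.0] + outputs          # prepend the bias input
        outputs = [sigmoid(sum(w_i * z_i for w_i, z_i in zip(w, extended)))
                   for w in layer]          # one output per perceptron
    return outputs

# Example: 2 inputs, a hidden layer with 2 units, and 1 output unit.
hidden_layer = [[-1.0, 2.0, 2.0], [3.0, -2.0, -2.0]]
output_layer = [[-3.0, 2.0, 2.0]]
print(forward([hidden_layer, output_layer], [1.0, 0.0]))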

32 What Neural Networks Can Compute An individual perceptron is a linear classifier. The weights of the perceptron define a linear boundary between two classes. Layered feedforward neural networks with one hidden layer can compute any continuous function. Layered feedforward neural networks with two hidden layers can compute any mathematical function. This has been known for decades, and is one reason scientists have been optimistic about the potential of neural networks to model intelligent systems. Another reason is the analogy between neural networks and biological brains, which have been a standard of intelligence we are still trying to achieve. There is only one catch: How do we find the right weights? 32

33 Training a Neural Network In linear regression, for the sum-of-squares error, we could find the best weights using a closed-form formula. In logistic regression, for the cross-entropy error, we could find the best weights using an iterative method. In neural networks, we cannot find the best weights (unless we have an astronomical amount of luck). We only have optimization methods that find local minima of the error function. Still, in recent years such methods have produced spectacular results in real-world applications. 33

34 Notation for Training Set We define w to be the vector of all weights in the neural network. We have a set X of N training examples: X = {x_1, x_2, ..., x_N}. Each x_n is a (D+1)-dimensional column vector. Dimension 0 is the bias input, always set to 1: x_n = (1, x_n1, x_n2, ..., x_nD). We also have a set T of N target outputs: T = {t_1, t_2, ..., t_N}. t_n is the target output for training example x_n. Each t_n is a K-dimensional column vector: t_n = (t_n1, t_n2, ..., t_nK). Note: K typically is not equal to D. 34

35 Perceptron Learning Before we discuss how to train an entire neural network, we start with a single perceptron. Remember: given input x_n, a perceptron computes its output z using this formula: z(x) = h(w^T x). We use sum-of-squares as our error function. E_n(w) is the contribution of training example x_n: E_n(w) = (1/2)(z(x_n) − t_n)^2. The overall error E is defined as: E(w) = Σ_{n=1}^{N} E_n(w). 35

36 Perceptron Learning Suppose that a perceptron is using the step function as its activation function h: h(a) = 0 if a < 0, and h(a) = 1 if a ≥ 0. Then z(x) = h(w^T x) = 0 if w^T x < 0, and 1 if w^T x ≥ 0. Can we apply gradient descent in that case? No, because E(w) is not differentiable. Small changes of w usually lead to no changes in h(w^T x). The only exception is when the change in w causes w^T x to switch sign (from positive to negative, or from negative to positive), in which case h(w^T x) jumps between 0 and 1. 36

37 Perceptron Learning A better option is setting h to the sigmoid function: z(x) = h(w^T x) = 1/(1 + e^{-w^T x}). Then, measured just on a single training object x_n, the error E_n(w) is defined as: E_n(w) = (1/2)(t_n − z(x_n))^2 = (1/2)(t_n − 1/(1 + e^{-w^T x_n}))^2. Note: here we use the sum-of-squares error, and not the cross-entropy error that we used for logistic regression. Also note: if our neural network is a single perceptron, then the target output t_n is one-dimensional. 37

38 Computing the Gradient E_n(w) = (1/2)(t_n − z(x_n))^2 = (1/2)(t_n − 1/(1 + e^{-w^T x_n}))^2. In this form, E_n(w) is differentiable. If we do the calculations, the gradient turns out to be: ∇E_n(w) = −(t_n − z(x_n)) z(x_n) (1 − z(x_n)) x_n. Note that ∇E_n(w) is a (D+1)-dimensional vector: it is a scalar multiplied by the vector x_n. 38

39 Weight Update ∇E_n(w) = −(t_n − z(x_n)) z(x_n) (1 − z(x_n)) x_n. So, we update the weight vector w as follows: w = w − η ∇E_n(w) = w + η (t_n − z(x_n)) z(x_n) (1 − z(x_n)) x_n. As before, η is the learning rate parameter. It is a positive real number that should be chosen carefully, so as not to be too big or too small. In terms of individual weights w_d, the update rule is: w_d = w_d + η (t_n − z(x_n)) z(x_n) (1 − z(x_n)) x_nd. 39

40 Perceptron Learning - Summary Input: Training inputs x_1, ..., x_N, target outputs t_1, ..., t_N. 1. Extend each x_n to a (D+1)-dimensional vector, by adding 1 (the bias input) as the value for dimension 0. 2. Initialize weights w_d to small random numbers. For example, set each w_d between -0.1 and 0.1. 3. For n = 1 to N: 1. Compute z(x_n). 2. For d = 0 to D: w_d = w_d + η (t_n − z(x_n)) z(x_n) (1 − z(x_n)) x_nd. 4. If some stopping criterion has been met, exit. 5. Else, go to step 3. 40
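
The algorithm above, with the sigmoidal activation, can be sketched in Python as follows. The learning rate, the number of passes over the data, and the example task are illustrative choices, and the sketch replaces the stopping criterion of step 4 with a fixed number of passes for brevity:

import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_perceptron(inputs, targets, eta=1.0, passes=2000):
    # inputs: D-dimensional training examples; targets: numbers in [0, 1].
    D = len(inputs[0])
    w = [random.uniform(-0.1, 0.1) for _ in range(D + 1)]   # small random weights, bias included
    for _ in range(passes):
        for x, t in zip(inputs, targets):                   # one training example at a time
            xe = [1.0] + list(x)                            # add the bias input
            z = sigmoid(sum(wi * xi for wi, xi in zip(w, xe)))
            for d in range(D + 1):                          # gradient-descent weight update
                w[d] = w[d] + eta * (t - z) * z * (1 - z) * xe[d]
    return w

# Example: learn the OR function from its four input/output pairs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 1]
w = train_perceptron(X, T)
print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x)))), 2) for x in X])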

41 Stopping Criterion At step 4 of the perceptron learning algorithm, we need to decide whether to stop or not. One thing we can do is: Compute the cumulative squared error E(w) of the perceptron at that point: E(w) = Σ_{n=1}^{N} E_n(w) = Σ_{n=1}^{N} (1/2)(t_n − z(x_n))^2. Compare the current value of E(w) with the value of E(w) computed at the previous iteration. If the difference is too small (e.g., smaller than some small threshold) we stop. 41

42 Using Perceptrons for Multiclass Problems A perceptron outputs a number between 0 and 1. This is sufficient only for binary classification problems. For more than two classes, there are many different options. We will follow a general approach called one-versus-all classification. 42

43 One-Versus-All Perceptrons Suppose we have K classes C_1, ..., C_K, where K > 2. We have training inputs x_1, ..., x_N, and target values t_1, ..., t_N. Each target value t_n is a K-dimensional vector: t_n = (t_n1, t_n2, ..., t_nK). t_nk = 0 if the class of x_n is not C_k. t_nk = 1 if the class of x_n is C_k. For each class C_k, train a perceptron z_k by using t_nk as the target value for x_n. So, perceptron z_k is trained to recognize if an object belongs to class C_k or not. In total, we train K perceptrons, one for each class. 43

44 One-Versus-All Perceptrons To classify a test pattern x: Compute the responses z k (x) for all K perceptrons. Find the perceptron z k such that the value z k (x) is higher than all other responses. Output that the class of x is C k. In summary: we assign x to the class whose perceptron produced the highest output value for x. 44
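
A sketch of the one-versus-all decision rule in Python, assuming the K per-class perceptrons have already been trained (the example weights below are made up for illustration):

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def perceptron_output(w, x):
    # w includes the bias weight; the bias input 1 is prepended to x.
    xe = [1.0] + list(x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xe)))

def classify_one_vs_all(per_class_weights, x):
    # per_class_weights[k] is the trained weight vector of the perceptron for class C_k.
    responses = [perceptron_output(w, x) for w in per_class_weights]
    return max(range(len(responses)), key=lambda k: responses[k])

# Illustrative weights for K = 3 classes on a 2-dimensional input.
weights = [[-1.0, 2.0, 0.0], [-1.0, 0.0, 2.0], [1.0, -2.0, -2.0]]
print(classify_one_vs_all(weights, (1.0, 0.0)))   # index of the class with the highest response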

45 Neural Network Notation U is the total number of units in the neural network. Each unit is denoted as P_j, where 0 ≤ j ≤ U−1. Units P_0, ..., P_D are the input units. Unit P_0 is the bias input, always equal to 1. We denote by w_ji the weight of the edge connecting the output of P_i to an input of P_j. We denote by z_j the output of unit P_j. If 0 ≤ j ≤ D, then z_j = x_j. We denote by vector z the vector of all outputs: z = (z_0, z_1, ..., z_{U−1}). 45

46 Target Value Notation To make notation more convenient, we treat target value t n as a U-dimensional vector. In other words, the dimensionality of t n is equal to the number of units in the network. If P j is the k-th output unit, and x n belongs to class C k, then: t nj = 1. t n will have values 0 in all other dimensions. This way, there is a one-to-one correspondence between dimensions of t n and dimensions of z. We do this only to simplify notation. See formula for defining error, in the next slide. 46
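
A small sketch of this target encoding in Python. The assumption that the output units are the last K units of the network is made here only for illustration:

def one_hot_target(class_index, num_units, output_unit_indices):
    # Build a U-dimensional target vector with 1 at the output unit that
    # corresponds to the example's class and 0 everywhere else.
    t = [0.0] * num_units
    t[output_unit_indices[class_index]] = 1.0
    return t

# Example: a network with U = 8 units whose last 3 units are the output units.
output_units = [5, 6, 7]
print(one_hot_target(1, 8, output_units))   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]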

47 Squared Error for Neural Networks We define Y to be the set of output units: Y = {P_j : P_j belongs to the output layer}. As usual, we denote by E_n(w) the contribution that training input x_n makes to the overall error E(w): E_n(w) = (1/2) Σ_{j: P_j ∈ Y} (t_nj − z_j)^2. Note that only the output from output units contributes to the error. If P_j is not an output unit, then t_nj and z_j get ignored. 47

48 Squared Error for Neural Networks As usual, we denote by E(w) the overall error over all training examples: E(w) = Σ_{n=1}^{N} E_n(w) = (1/2) Σ_{n=1}^{N} Σ_{j: P_j ∈ Y} (t_nj − z_j)^2. This is now a double summation: we sum over all training examples x_n, and for each x_n, we sum over all perceptrons in the output layer. 48
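
Both error definitions translate directly into Python; the example output and target values below are made up for illustration:

def example_error(outputs, targets):
    # E_n(w): half the sum of squared differences over the output units.
    return 0.5 * sum((t - z) ** 2 for t, z in zip(targets, outputs))

def total_error(all_outputs, all_targets):
    # E(w): sum of the per-example errors over the whole training set.
    return sum(example_error(z, t) for z, t in zip(all_outputs, all_targets))

outputs = [[0.9, 0.2], [0.1, 0.7]]   # output-unit values for two training examples
targets = [[1.0, 0.0], [0.0, 1.0]]   # the corresponding target vectors
print(example_error(outputs[0], targets[0]))   # 0.5 * (0.01 + 0.04) = 0.025
print(total_error(outputs, targets))           # 0.025 + 0.05 = 0.075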

49 Training Neural Networks To train a neural network, we follow the same approach of sequential learning that we followed for training single perceptrons. Given a training example x_n and target output t_n: Compute the training error E_n(w). Compute the gradient ∇E_n(w). Based on the gradient, update all weights. The process of computing the gradient and updating neural network weights is called backpropagation. We will derive the solution for the case where the activation function h is the sigmoidal function. 49

50 Computing the Gradient Overall, we want to compute ∇E_n(w). w is a vector containing all weights w_ji. Therefore, it suffices to compute, for each w_ji, the partial derivative ∂E_n/∂w_ji. In order to compute ∂E_n/∂w_ji, we will use this strategy: Decompose E_n into a composition of simpler functions. Compute the derivative of each of those simpler functions. Apply the chain rule to obtain ∂E_n/∂w_ji. 50

51 Decomposing the Error Function Let P_j be a perceptron in the neural network. Define function a_j(x_n, w) = Σ_i w_ji z_i(x_n, w). a_j(x_n, w) is simply the weighted sum of the inputs of P_j, given current input x_n and given the current value of w. Define function z_j(a) = σ(a) = 1/(1 + e^{-a}). The output of P_j is z_j(a_j(x_n, w)). 51

52 Decomposing the Error Function Define y j to be a vector containing all outputs of all perceptrons belonging to the same layer as P j. Define function E nj y j to be the error of the network given the outputs of all perceptrons of the layer that P j belongs to. Intuition for E nj y j : Suppose that you do not know anything about layers before the layer of perceptron P j. Suppose that you know y j, and all layers after the layer of P j. Then, you can still compute the output of the network, and the error E n. 52

53 Visualizing Function E_nj As long as we can see the outputs of the layer that P_j belongs to, and all layers after the layer of P_j, we can compute the output of the network, and the error E_n. [Figure: the previous layers are unknown; the layer of P_j contains units P_q, P_{q+1}, ..., P_j, ..., P_r, followed by the later layers up to the output layer.] 53

54 Decomposing the Error Function We have defined three auxiliary functions: a_j(x_n, w), z_j(a), E_nj(y_j). Suppose that the perceptrons belonging to the same layer as P_j are indexed as P_q, P_{q+1}, ..., P_{r−1}, P_r. Then, E_n is a composition of functions E_nj, z_j, a_j: E_n(x_n, w) = E_nj(y_j) = E_nj(z_q(a_q(x_n, w)), ..., z_r(a_r(x_n, w))). 54

55 Computing the Gradient of E_n E_n(x_n, w) = E_nj(z_q(a_q(x_n, w)), ..., z_r(a_r(x_n, w))). Then, we can compute ∂E_n/∂w_ji by applying the chain rule: ∂E_n/∂w_ji = (∂E_nj/∂z_j) (∂z_j/∂a_j) (∂a_j/∂w_ji). We will compute each of these three terms. 55

56 Computing ∂a_j/∂w_ji ∂E_n/∂w_ji = (∂E_nj/∂z_j) (∂z_j/∂a_j) (∂a_j/∂w_ji). ∂a_j/∂w_ji = ∂/∂w_ji [Σ_u w_ju z_u(x_n, w)] = z_i(x_n, w). Remember, z_i(x_n, w) is just the output of unit P_i. The outputs of all units are straightforward to compute, given x_n and w. So, computing ∂a_j/∂w_ji is entirely straightforward. 56

57 Computing ∂z_j/∂a_j ∂E_n/∂w_ji = (∂E_nj/∂z_j) (∂z_j/∂a_j) (∂a_j/∂w_ji). ∂z_j/∂a_j = ∂σ(a_j)/∂a_j = σ(a_j)(1 − σ(a_j)) = z_j (1 − z_j). One of the reasons we like using the sigmoidal function for activation is that its derivative has such a simple form. 57
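
The identity σ'(a) = σ(a)(1 − σ(a)) is easy to check numerically with a short sketch:

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

a = 0.7
eps = 1e-6
numerical = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)   # finite-difference slope
analytic = sigmoid(a) * (1 - sigmoid(a))                        # z_j * (1 - z_j)
print(numerical, analytic)   # the two values agree to several decimal places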

58 Computing ∂E_nj/∂z_j, Case 1: If P_j Is an Output Unit If P_j is an output unit, then z_j is an output of the entire network. z_j contributes to the error the term (1/2)(t_nj − z_j)^2. Therefore: ∂E_nj/∂z_j = ∂/∂z_j [(1/2)(t_nj − z_j)^2] = z_j − t_nj. 58

59 Updating Weights of Output Units If P_j is an output unit, then we have computed all the terms we need for ∂E_n/∂w_ji: ∂E_n/∂w_ji = (∂E_nj/∂z_j) (∂z_j/∂a_j) (∂a_j/∂w_ji) = (z_j − t_nj) z_j (1 − z_j) z_i. So, if w_ji is the weight of an output unit, we update it as follows: w_ji = w_ji − η (z_j − t_nj) z_j (1 − z_j) z_i. 59

60 Computing ∂E_nj/∂z_j, Case 2: If P_j Is a Hidden Unit Let P_j be a hidden unit. Define L to be the set of all units in the layer after P_j. ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) (∂z_u/∂a_u) (∂a_u/∂z_j). We need to compute these three terms. 60

61 Computing ∂E_nj/∂z_j, Case 2: If P_j Is a Hidden Unit ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) (∂z_u/∂a_u) (∂a_u/∂z_j). ∂a_u/∂z_j = ∂/∂z_j [Σ_i w_ui z_i(x_n, w)] = w_uj. ∂z_u/∂a_u = z_u (1 − z_u). We computed this already, a few slides ago. 61

62 Computing ∂E_nj/∂z_j, Case 2: If P_j Is a Hidden Unit ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) (∂z_u/∂a_u) (∂a_u/∂z_j). In the previous slide, we computed ∂z_u/∂a_u and ∂a_u/∂z_j. So, the formula becomes: ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) z_u (1 − z_u) w_uj. 62

63 Computing ∂E_nj/∂z_j, Case 2: If P_j Is a Hidden Unit ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) z_u (1 − z_u) w_uj. Notice that ∂E_nj/∂z_j is defined using ∂E_nj/∂z_u. This is a recursive definition: to compute the values for a layer, we use the values from the next layer. This is why the whole algorithm is called backpropagation. We propagate computations from the output layer backwards towards the input layer. 63

64 Formula for Hidden Units From the previous slides, we have these formulas: ∂E_n/∂w_ji = (∂E_nj/∂z_j) (∂z_j/∂a_j) (∂a_j/∂w_ji) = (∂E_nj/∂z_j) z_j (1 − z_j) z_i. ∂E_nj/∂z_j = Σ_{u ∈ L} (∂E_nj/∂z_u) z_u (1 − z_u) w_uj. We can combine these formulas, to compute ∂E_n/∂w_ji for any weight of any hidden unit. Just remember to start the computations from the output layer, and move backwards towards the input layer. 64

65 Simplifying Notation The previous formulas are sufficient and will work, but look complicated. We can simplify the formulas considerably, by defining: δ_j = (∂E_nj/∂z_j) (∂z_j/∂a_j). Then, if we combine calculations we already did: If P_j is an output unit, then: δ_j = (z_j − t_nj) z_j (1 − z_j). If P_j is a hidden unit, then: δ_j = [Σ_{u ∈ L} δ_u w_uj] z_j (1 − z_j). 65

66 Final Backpropagation Formula Using the definition of δ_j from the previous slide, we finally get a very simple formula: ∂E_n/∂w_ji = δ_j z_i. Therefore, given a training input x_n, and given a positive learning rate η, each weight w_ji is updated using this formula: w_ji = w_ji − η δ_j z_i. 66

67 Backpropagation for One Object Step 1: Initialize Input Layer We will now see how to apply backpropagation, step by step, in pseudocode style, for a single training object. First, given a training example x_n and its target output t_n, we must initialize the input units: // Array z will store, for every perceptron P_j, its output. double z[] = new double[U] // Update the input layer, set inputs equal to x_n. For j = 0 to D: z[j] = x_nj // x_nj is the j-th dimension of x_n. 67

68 Backpropagation for One Object Step 2: Compute Outputs // We create an array a, which will store, for every perceptron P_j, the weighted sum of the inputs of P_j. double a[] = new double[U] // Update the rest of the layers: For l = 2 to L: // L is the number of layers For each perceptron P_j in layer l: a[j] = Σ_i w_ji z[i] // weighted sum of the inputs of P_j z[j] = h(a[j]) = 1 / (1 + e^{-a[j]}) // output of unit P_j 68

69 Backpropagation for One Object Step 3: Compute New δ Values // We create an array δ, which will store, for every perceptron P_j, the value δ_j. double δ[] = new double[U] For each output unit P_j: δ[j] = (z[j] − t_nj) z[j] (1 − z[j]) For l = L−1 to 2: // MUST be in decreasing order of l For each perceptron P_j in layer l: δ[j] = (Σ_{u: P_u in layer l+1} δ[u] w_uj) z[j] (1 − z[j]) 69

70 Backpropagation for One Object Step 4: Update Weights For l = 2 to L: // Order does not matter here, we can go from 2 to L or from L to 2. For each perceptron P_j in layer l: For each perceptron P_i in the preceding layer l−1: w_ji = w_ji − η δ[j] z[i] IMPORTANT: Do Step 3 before Step 4. Do NOT do steps 3 and 4 as a single loop. All δ values must be computed using the old values of the weights. Then, all weights must be updated using the new δ values. 70
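
Steps 1 through 4 can be collected into a single Python sketch for one training object. The network representation (a list of layers, each a list of weight vectors with the bias weight first) is an illustrative choice rather than the exact indexing scheme of the slides; as the slide requires, the deltas are computed for all layers before any weight is updated:

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_one_example(layers, x, t, eta):
    # layers[l][j] is the weight vector of perceptron j in layer l (bias weight first).
    # x: input vector (without the bias input); t: target values for the output units.

    # Steps 1-2: forward pass, storing every layer's outputs with the bias input prepended.
    zs = [[1.0] + list(x)]
    for layer in layers:
        prev = zs[-1]
        zs.append([1.0] + [sigmoid(sum(w_i * z_i for w_i, z_i in zip(w, prev)))
                           for w in layer])

    # Step 3: compute all delta values, moving backwards from the output layer.
    deltas = [None] * len(layers)
    out = zs[-1][1:]                              # outputs of the output units
    deltas[-1] = [(z - tj) * z * (1 - z) for z, tj in zip(out, t)]
    for l in range(len(layers) - 2, -1, -1):
        z_l = zs[l + 1][1:]                       # outputs of layer l's perceptrons
        deltas[l] = [sum(deltas[l + 1][u] * layers[l + 1][u][j + 1]
                         for u in range(len(layers[l + 1]))) * z * (1 - z)
                     for j, z in enumerate(z_l)]

    # Step 4: update every weight using the stored outputs and the deltas just computed.
    for l, layer in enumerate(layers):
        for j, w in enumerate(layer):
            for i in range(len(w)):
                w[i] -= eta * deltas[l][j] * zs[l][i]

# Example call on a tiny 2-2-1 network (illustrative weights).
net = [[[0.1, 0.2, -0.1], [-0.3, 0.2, 0.4]], [[0.05, 0.3, -0.2]]]
backprop_one_example(net, [1.0, 0.0], [1.0], eta=0.5)
print(net[1])   # the output-layer weights have moved after one update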

71 Backpropagation Summary Inputs: N D-dimensional training objects x_1, ..., x_N. The associated target values t_1, ..., t_N, which are U-dimensional vectors. 1. Extend each x_n to a (D+1)-dimensional vector, by adding the bias input as the value for the zero-th dimension. 2. Initialize weights w_ji to small random numbers. For example, set each w_ji between -0.1 and 0.1. 3. last_error = E(w) 4. For n = 1 to N: Given x_n, update weights w_ji as described in the previous slides. 5. err = E(w) 6. If |err − last_error| < threshold, exit. // threshold can be a small positive number 7. Else: last_error = err, go to step 4. 71

72 Classification with Neural Networks Suppose we have K classes C 1,, C K, where K > 2. Each class C k corresponds to an output perceptron P u. Given a test pattern x to classify: Extend x to a (D+1)-dimensional vector, by adding the bias input as value for dimension 0. Compute outputs for all units of the network, working from the input layer towards the output layer. Find the output unit P u with the highest output z u. Return the class that corresponds to P u. 72
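
A sketch of this classification procedure in Python, assuming one output unit per class in class order (the network weights below are made up for illustration):

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(layers, x):
    # Forward pass as in the earlier sketch: each layer is a list of weight vectors (bias first).
    outputs = list(x)
    for layer in layers:
        extended = [1.0] + outputs
        outputs = [sigmoid(sum(w_i * z_i for w_i, z_i in zip(w, extended)))
                   for w in layer]
    return outputs

def classify(layers, x, class_names):
    # Run the forward pass and return the class of the output unit with the highest output.
    outputs = forward(layers, x)
    best = max(range(len(outputs)), key=lambda u: outputs[u])
    return class_names[best]

# Illustrative 2-2-3 network and three class names.
net = [[[0.1, 1.0, -1.0], [-0.2, -1.0, 1.0]],
       [[0.0, 2.0, -1.0], [0.3, -1.0, 2.0], [-0.5, 1.0, 1.0]]]
print(classify(net, [1.0, 0.0], ["C1", "C2", "C3"]))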

73 Structure of Neural Networks Backpropagation describes how to learn weights. However, it does not describe how to learn the structure: How many layers? How many units at each layer? These are parameters that we have to choose somehow. A good way to choose such parameters is by using a validation set, containing examples and their class labels. The validation set should be separate (disjoint) from the training set. 73

74 Structure of Neural Networks To choose the best structure for a neural network using a validation set, we try many different parameters (number of layers, number of units per layer). For each choice of parameters: We train several neural networks using backpropagation. We measure how well each neural network classifies the validation examples. Why not train just one neural network? 74

75 Structure of Neural Networks To choose the best structure for a neural network using a validation set, we try many different parameters (number of layers, number of units per layer). For each choice of parameters: We train several neural networks using backpropagation. We measure how well each neural network classifies the validation examples. Why not train just one neural network? Each network is randomly initialized, so after backpropagation it can end up being different from the other networks. At the end, we select the neural network that did best on the validation set. 75
