(FeedForward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann


Outline

In the previous lectures we learned about tensors and factorization methods. RESCAL is a bilinear model for SRL that can be formulated as a tensor factorization problem. Furthermore, we learned about optimization techniques that can be applied to learn score-based models.

Today we will learn about a new class of algorithms that can be applied to SRL, namely neural networks. We will see how to apply them to SRL in the following lecture.

Artificial Neurons

Figure source: Wikipedia.

There are many different types of neurons in the nervous system, and they are quite complicated. Neurons are connected to each other by synapses and thus form a network, which complicates things even more.

How can we understand this system? The basic idea is to reduce the neuron to its essentials. There are a number of greatly simplified neuron models; one of the simplest is due to McCulloch and Pitts.

Artificial Neurons

A neuron receives input from a number of other neurons. These inputs come in the form of spikes, i.e., short pulses of electrical current. We average these spikes over time and represent them with a single number: the spike frequency $\nu$.

The spikes arrive at the neuron's membrane and alter the electrical potential at this point. For each neuron we keep track of its membrane potential $u$.

There are excitatory and inhibitory connections between neurons, and they can be of different strength. We model the effect of a connection on the membrane potential as the product of the synaptic weight $w$ with the spike frequency. The weight is positive for excitatory and negative for inhibitory connections. The absolute value of the weight is small for weak connections and large for strong connections.

Artificial Neurons

When the neuron's membrane potential exceeds a threshold, the neuron emits a spike (which can propagate to multiple receivers) and resets its membrane potential. The spike frequency as a function of the incoming power is a nonlinear transfer function (also simply called a nonlinearity). We call such a function a sigmoid or sigmoidal function. The standard formula is

$$\nu = \sigma(u) = \frac{1}{1 + e^{-u}}.$$
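
As a quick numerical illustration of the transfer function, the following sketch evaluates $\sigma(u) = 1/(1 + e^{-u})$ in plain Python. The function name `sigmoid` and the sample inputs are assumptions made for this example; the branch on the sign of `u` is a common trick to avoid overflow in `exp` and is not part of the lecture.

```python
import math


def sigmoid(u: float) -> float:
    """Logistic transfer function sigma(u) = 1 / (1 + exp(-u))."""
    # Branch on the sign so that exp() only ever receives a non-positive argument.
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


# Strongly negative potentials give rates near 0, strongly positive ones near 1.
for u in (-5.0, 0.0, 5.0):
    print(u, sigmoid(u))
```

The output is bounded between 0 and 1 and increases monotonically with the membrane potential, which is what makes it usable as a firing rate.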

Artificial Neurons

Now assume we have a pool of neurons, numbered $1, \dots, n$. Let $u_i$ be the membrane potential and $\nu_i$ the firing rate of neuron number $i$, and let $w_{ji}$ be the synaptic weight of the connection from $i$ to $j$ (which is zero if the neurons are not connected). Then we arrive at the model

$$u_j \leftarrow \sum_{i=1}^{n} w_{ji} \nu_i, \qquad \nu_j \leftarrow \sigma(u_j).$$

This model draws inspiration from biology. However, it is so abstract that in the end it has little in common with its biological counterpart. It should rather be viewed as a computational unit in a mathematical learning machine.
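
The update rule for a pool of neurons can be sketched as follows. The pool size, the weights, and the current firing rates are made-up numbers for illustration, and performing one synchronous update of all neurons is an assumption of this sketch, not something the slides prescribe.

```python
import math


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


# Hypothetical pool of n = 3 neurons; W[j][i] is the weight of the connection i -> j.
W = [
    [0.0, 2.0, -1.0],  # neuron 0: excited by neuron 1, inhibited by neuron 2
    [0.5, 0.0, 0.0],   # neuron 1: weakly excited by neuron 0
    [0.0, 0.0, 0.0],   # neuron 2: not connected to any input (all weights zero)
]
nu = [0.2, 0.9, 0.5]   # current firing rates (illustrative values)

# u_j <- sum_i w_ji * nu_i, followed by nu_j <- sigma(u_j).
u = [sum(W[j][i] * nu[i] for i in range(3)) for j in range(3)]
nu_new = [sigmoid(u_j) for u_j in u]
print(u, nu_new)
```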

Artificial Neurons

The relation $u_j \leftarrow \sum_{i=1}^{n} w_{ji} \nu_i$ is familiar: if the $\nu_i$ are the inputs, then this is a linear function, which can be understood as a Perceptron model.

The sigmoid does not change the way decisions are made:

$$\nu_j \leftarrow \sigma\Big(\sum_{i=1}^{n} w_{ji} \nu_i\Big).$$

Because $\sigma$ is strictly increasing with $\sigma(0) = 1/2$, the threshold changes to 1/2 (instead of zero), but the decision boundary remains the same. In other words, the Perceptron is a model of a single neuron.

Usually many neurons process the input, so we have multiple Perceptrons in parallel.
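
The claim that thresholding the potential at zero and thresholding the firing rate at 1/2 give the same decisions can be checked numerically. This is a minimal sketch with hypothetical weights; the inputs are taken from a small grid so that every potential is an exact multiple of 0.5 and floating-point rounding near the boundary plays no role.

```python
import itertools
import math


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def potential(w, nu):
    """Membrane potential u = sum_i w_i * nu_i of a single neuron."""
    return sum(w_i * nu_i for w_i, nu_i in zip(w, nu))


w = [1.5, -2.0, 0.5]  # hypothetical weights of one neuron

# All 27 input vectors with components in {-1, 0, 1}.
inputs = list(itertools.product([-1.0, 0.0, 1.0], repeat=3))

perceptron = [potential(w, nu) > 0.0 for nu in inputs]       # threshold 0 on u
neuron = [sigmoid(potential(w, nu)) > 0.5 for nu in inputs]  # threshold 1/2 on sigma(u)
print(perceptron == neuron)
```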

Layered Neural Networks

Let $\nu^{(0)} \in \mathbb{R}^m$ denote the vector of firing rates coming from the inputs (sensors, data), and let $\nu^{(1)} \in \mathbb{R}^n$ denote the vector of firing rates of the neurons. Let $W \in \mathbb{R}^{n \times m}$ be the matrix with entries $w_{ji}$. Also, let $\sigma$ denote the componentwise application of the transfer function. The computation can be written in compact form:

$$\nu^{(1)} = \sigma(W \nu^{(0)})$$

Neurons can receive input not only from sensors but also from other neurons. What if we feed the outputs into another layer of neurons?
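
The compact form $\nu^{(1)} = \sigma(W \nu^{(0)})$ can be written out as a small function. This is a sketch in dependency-free Python; the helper name `layer` and the numbers ($m = 3$ inputs, $n = 2$ neurons) are assumptions chosen for illustration.

```python
import math


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def layer(W, nu_in):
    """Return sigma(W nu_in); W has one row per neuron, sigma acts componentwise."""
    return [sigmoid(sum(w_ji * nu_i for w_ji, nu_i in zip(row, nu_in))) for row in W]


# W is an n x m matrix with n = 2 neurons and m = 3 inputs (illustrative values).
W = [
    [1.0, -1.0, 0.0],
    [1.0, 1.0, 1.0],
]
nu0 = [0.5, 0.5, 1.0]

nu1 = layer(W, nu0)
print(nu1)
```

Feeding `nu1` into a second call of `layer` with another weight matrix is exactly the step from a single layer to a layered network.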

Layered Neural Networks

The resulting architecture is called a (layered) feedforward neural network, or multi-layer Perceptron (MLP). This example network has an input layer with 5 nodes, two hidden layers with 8 and 6 neurons, and an output layer with 2 neurons.

The sizes of the input and output layers are determined by the problem (the dimensions of the vectors $x$ and $y$), but the number and sizes of the hidden layers are arbitrary.

Layered Neural Networks

Now let $\nu^{(0)}$ denote the vector of inputs, let $\nu^{(i)}$ denote the vector of firing rates in layer $i$, and let $W^{(i)}$ denote the matrix of connections from layer $i-1$ to layer $i$. Then we have the overall model:

$$\nu^{(n)} = \sigma_{\text{out}}\Big(W^{(n)} \sigma\big(W^{(n-1)} \sigma(\cdots \sigma(W^{(1)} \nu^{(0)}) \cdots)\big)\Big).$$

The model processes a data point $x$ as follows:

1. Set $\nu^{(0)} = x$.
2. Apply the model, i.e., compute $\nu^{(n)}$ from $\nu^{(0)}$.
3. Output $\hat{y} = \nu^{(n)}$ (which is hopefully close to the true label $y$).
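
The three processing steps can be sketched as a forward pass. The layer sizes 5, 8, 6, 2 are those of the example network described earlier; the random weights, the seed, the input vector, and the names `forward` and `sigma_out` are assumptions of this sketch, and no bias terms are used.

```python
import math
import random


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def identity(u: float) -> float:
    return u


def forward(weights, x, sigma_out=identity):
    """Compute nu^(n) from nu^(0) = x by alternating linear maps and transfer functions."""
    nu = list(x)                                   # step 1: nu^(0) = x
    last = len(weights) - 1
    for k, W in enumerate(weights):                # step 2: apply layer after layer
        u = [sum(w * v for w, v in zip(row, nu)) for row in W]
        act = sigma_out if k == last else sigmoid  # sigma_out only in the output layer
        nu = [act(u_j) for u_j in u]
    return nu                                      # step 3: y_hat = nu^(n)


rng = random.Random(0)
sizes = [5, 8, 6, 2]
# weights[k] has shape sizes[k + 1] x sizes[k]; the values are random placeholders.
weights = [
    [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    for m, n in zip(sizes[:-1], sizes[1:])
]
x = [rng.uniform(-1.0, 1.0) for _ in range(sizes[0])]

y_hat = forward(weights, x)
print(y_hat)
```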

Layered Neural Networks

$$\nu^{(n)} = \sigma_{\text{out}}\Big(W^{(n)} \sigma\big(W^{(n-1)} \sigma(\cdots \sigma(W^{(1)} \nu^{(0)}) \cdots)\big)\Big)$$

Hidden layers employ a sigmoid transfer function. The transfer function $\sigma_{\text{out}}$ of the output layer is chosen in a task-specific way:

- Regression problems usually need an unbounded range of values, so a sigmoid is not appropriate. The identity function is used in this case (so-called linear output neurons).
- For classification the range of values does not matter. Either linear or sigmoid output-layer neurons can be used.

The linear function $u = W \nu$ is usually extended to an affine function $u = W \nu + b$ by means of a so-called bias neuron. This neuron has a constant firing rate of one and is input to all other neurons, with connection weights $b_i$. This is effectively the same as embedding affine functions into linear functions, one dimension up.
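
The bias-neuron construction can be verified on a tiny example: appending the constant 1 to $\nu$ and the column $b$ to $W$ turns the affine map into a purely linear one. The matrices and the helper name `matvec` are made up for this sketch.

```python
def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


W = [[1.0, 2.0],
     [0.0, -1.0]]
b = [0.5, 3.0]
nu = [2.0, 1.0]

# Affine form: u = W nu + b.
affine = [u_j + b_j for u_j, b_j in zip(matvec(W, nu), b)]

# Linear form one dimension up: the bias neuron fires with constant rate 1,
# and its connection weights b_j form an extra column of the weight matrix.
W_ext = [row + [b_j] for row, b_j in zip(W, b)]
nu_ext = nu + [1.0]
linear = matvec(W_ext, nu_ext)

print(affine, linear)
```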

Layered Neural Networks

A layered neural network alternates the application of two types of transformations:

1. A linear map: left multiplication with the matrix $W^{(i)}$. This matrix is a parameter of the model, so it can be subject to learning.
2. A nonlinear function: the componentwise transfer function $\sigma$. This function is fixed; it has no parameters that can be adjusted.

The nonlinearities are not adaptive, but they are nevertheless important! Without them the model would collapse into the linear map $W = W^{(n)} \cdots W^{(1)}$, and all computations would be linear.

It turns out that a neural network with sigmoid transfer functions is indeed far more powerful than a linear model. In a sense made precise by the universal approximation property, it can approximate any continuous function.
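
The collapse of stacked linear layers into a single matrix can be checked directly. The following sketch uses two small hand-picked matrices (assumptions for illustration) and compares the two-layer computation without nonlinearity, the single collapsed matrix, and the two-layer computation with a sigmoid in between.

```python
import math


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(A, B):
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


W1 = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.0]]  # layer 1: 2 inputs -> 3 neurons
W2 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]     # layer 2: 3 inputs -> 2 neurons
x = [1.0, 3.0]

stacked = matvec(W2, matvec(W1, x))                              # no nonlinearity
collapsed = matvec(matmul(W2, W1), x)                            # single map W = W2 W1
with_sigmoid = matvec(W2, [sigmoid(u) for u in matvec(W1, x)])   # sigmoid in between

print(stacked, collapsed, with_sigmoid)
```

Only the variant with the sigmoid differs from the collapsed linear map, which is the sense in which the fixed nonlinearities carry the extra expressive power.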

Universal Approximation Property

Theorem. Let $\sigma : \mathbb{R} \to \mathbb{R}$ be a continuous, nonconstant, bounded, and monotonically increasing function. Let $K \subset \mathbb{R}^m$ be compact, and let $C(K)$ denote the space of continuous functions $K \to \mathbb{R}$. Then, given a function $g \in C(K)$ and an accuracy $\varepsilon > 0$, there exist a hidden layer size $N \in \mathbb{N}$ and a set of coefficients $w^{(1)}_i \in \mathbb{R}^m$, $w^{(2)}_i, b_i \in \mathbb{R}$ (for $i \in \{1, \dots, N\}$), such that

$$f : K \to \mathbb{R}; \quad f(x) = \sum_{i=1}^{N} w^{(2)}_i \, \sigma\big((w^{(1)}_i)^T x + b_i\big)$$

is an $\varepsilon$-approximation of $g$, that is,

$$\|f - g\|_\infty := \max_{x \in K} |f(x) - g(x)| < \varepsilon.$$

Corollary. The theorem extends trivially to multiple outputs.

Corollary. Neural networks with a single sigmoidal hidden layer and linear output layer are universal approximators.
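
To make the statement tangible in one dimension, the sketch below builds a network of exactly the form $f(x) = \sum_i w^{(2)}_i \sigma(w^{(1)}_i x + b_i)$ by hand: each hidden unit is a steep sigmoid that adds one step of a staircase following $g$. This construction is only an illustration chosen for this example, not the proof of the theorem and not how networks are trained; the target $g = \sin$ on $[0, \pi]$, the steepness factor 50, and all function names are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    # Same logistic function, written via tanh so that large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def staircase_network(g, a, b, n_steps):
    """Coefficients (w1, bias, w2) of a one-hidden-layer network approximating g on [a, b]."""
    h = (b - a) / n_steps
    k = 50.0 / h  # steepness of each step (arbitrary illustrative choice)
    # Constant unit: w1 = 0 and bias = 0 give sigma(0) = 1/2, so weight 2 g(a) adds g(a).
    w1, bias, w2 = [0.0], [0.0], [2.0 * g(a)]
    for i in range(1, n_steps + 1):
        t_prev, t_cur = a + (i - 1) * h, a + i * h
        mid = 0.5 * (t_prev + t_cur)
        w1.append(k)                    # steep sigmoid ...
        bias.append(-k * mid)           # ... centred at the midpoint of the interval
        w2.append(g(t_cur) - g(t_prev)) # height of this step
    return w1, bias, w2


def network(params, x):
    w1, bias, w2 = params
    return sum(c * sigmoid(w * x + b0) for w, b0, c in zip(w1, bias, w2))


def max_error(g, a, b, n_steps, n_eval=2001):
    """Maximum absolute deviation from g on an evaluation grid over [a, b]."""
    params = staircase_network(g, a, b, n_steps)
    xs = [a + (b - a) * j / (n_eval - 1) for j in range(n_eval)]
    return max(abs(network(params, x) - g(x)) for x in xs)


err_coarse = max_error(math.sin, 0.0, math.pi, 20)
err_fine = max_error(math.sin, 0.0, math.pi, 200)
print(err_coarse, err_fine)
```

With more hidden units the staircase becomes finer and the maximum error shrinks, which mirrors the role of the hidden layer size $N$ in the theorem.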

Universal Approximation Property

This means that for a given target function $g$ there exists a sequence of networks $(f_k)_{k \in \mathbb{N}}$ that converges to the target function (uniformly on $K$, and hence also pointwise). Usually, as the networks come closer and closer to $g$, they will need more and more hidden neurons.

A network with fixed layer sizes can only model a subset of all continuous functions, whose dimensionality is limited by the number of weights. The continuous functions form an infinite-dimensional vector space. Therefore arbitrarily large hidden layer sizes are needed.

The universal approximation property is not as special as it seems. For example, polynomials are universal approximators (Weierstraß theorem).

Other Neural Networks

In this lecture we cover only feedforward neural networks, because this is the most basic and most relevant class for supervised (and unsupervised) learning and the class applied to SRL. A broad range of other network types has been developed.

The linear+sigmoid processing model can be replaced by other functions. For example, this leads to radial basis function models.

Convolutional Neural Networks (CNNs) are inspired by the organization of the animal visual cortex. Each neuron receives input only from a local patch (corresponding to the receptive field in real neurons).

Synapses can form loops. This requires the introduction of time delays. Then we speak of Recurrent Neural Networks (RNNs). These are even more powerful models: they are not simple mappings, but stateful computers.

Autoencoders, (restricted) Boltzmann machines, and self-organizing maps are used for unsupervised learning.

Deep Learning

Many NNs used, e.g., for image processing are really deep, i.e., they consist of 10 layers or more. In this case we speak of deep learning. This has become one of the dominant buzzwords of the field (next to big data).

A central concept of deep learning is that lower layers extract basic features (e.g., edge detectors), while higher layers compose them into complex features (complex cells, invariant object detectors). This is in rough correspondence with our understanding of how the visual cortex processes images.

Recently, deep learning has revolutionized many fields, such as image and language processing and machine translation. It was also part of AlphaGo.

From Models to Learners

The class of functions represented by neural networks is rich enough to represent the solution to any problem, provided that the network is big enough. With a large enough hidden layer the network can approximate the optimal hypothesis arbitrarily well.

However, this is not helpful in practice. It does not tell us how to actually set the network size, let alone the weights.

Until now we have defined neural networks as a class of models. We do not have a learning rule yet. Neural networks are trained based on stochastic gradient descent as described in the following.
(Online) Steepest Descent Training
Let $w$ denote a vector collecting all weights of a neural network. This is a linearized (flattened) version of all of its weight matrices.
Let $f_w$ be the mapping represented by the network for particular weights $w$. Let
$$E(w) = \frac{1}{|S|} \sum_{i \in S} L(f_w(x_i), y_i)$$
denote the error of the network, as a function of the weights.
The sum may run over the whole data set ($|S| = n$, batch mode), over small subsets ($|S| \ll n$, mini-batches), or over only a single example ($|S| = 1$, online mode).
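The three modes differ only in the index set $S$. The following minimal sketch evaluates $E(w)$ for each of them. The squared loss and the single linear neuron standing in for $f_w$ are assumptions made for illustration; the slides do not fix either choice.

```python
import numpy as np

def squared_loss(prediction, target):
    """Loss L for one example (squared error, an assumed choice)."""
    return 0.5 * np.sum((prediction - target) ** 2)

def linear_model(w, x):
    """Hypothetical stand-in for f_w: a single linear neuron."""
    return w @ x

def error(w, X, Y, subset):
    """E(w) = 1/|S| * sum over i in S of L(f_w(x_i), y_i)."""
    total = sum(squared_loss(linear_model(w, X[i]), Y[i]) for i in subset)
    return total / len(subset)

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
Y = X @ np.array([1.0, -2.0, 0.5])   # synthetic targets
w = np.zeros(3)

# Batch mode: S is the whole data set, |S| = n.
batch_error = error(w, X, Y, range(n))
# Mini-batch mode: S is a small random subset, |S| << n.
mini_batch_error = error(w, X, Y, rng.choice(n, size=10, replace=False))
# Online mode: S is a single example, |S| = 1.
online_error = error(w, X, Y, [int(rng.integers(n))])
```

Only the batch value is deterministic for given weights; the mini-batch and online values depend on which examples are drawn.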
(Online) Steepest Descent Training
$$E(w) = \frac{1}{|S|} \sum_{i \in S} L(f_w(x_i), y_i)$$
The batch error is what we have called the empirical risk w.r.t. the loss function $L$. Its minimization is a straightforward learning strategy.
This is usually what we want when training a neural network. So why care about online and mini-batch errors?
The reason is that the online error is cheaper to compute than the batch error by a factor of $n$ (the size of the data set). For the same computational budget, its use therefore allows for many more gradient descent steps.
(Online) Steepest Descent Training
Assume we have computed the gradient $\nabla_w E(w)$ of the error with respect to the weights. Then we can perform a step of gradient descent with learning rate $\eta$ to update the weights:
$$w \leftarrow w - \eta \, \nabla_w E(w).$$
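As a rough illustration of the update rule in online mode, the sketch below repeatedly picks one example and takes a step against its gradient. The linear model, the squared loss, the learning rate `eta = 0.05`, and the noise-free synthetic data are all assumptions chosen so that the gradient has a simple closed form; they are not taken from the lecture.

```python
import numpy as np

def gradient(w, X, Y, subset):
    """Gradient of E(w) = 1/|S| * sum of 0.5 * (w . x_i - y_i)^2 over S."""
    Xs, Ys = X[subset], Y[subset]
    residual = Xs @ w - Ys
    return Xs.T @ residual / len(subset)

rng = np.random.default_rng(0)
n = 200
true_w = np.array([1.0, -2.0, 0.5])   # weights used to generate the data
X = rng.normal(size=(n, 3))
Y = X @ true_w                        # noise-free targets

w = np.zeros(3)
eta = 0.05                            # learning rate (assumed value)
for step in range(2000):
    i = int(rng.integers(n))          # online mode: one example per step
    w = w - eta * gradient(w, X, Y, [i])
```

Passing a larger index list to `gradient` turns the same loop into mini-batch or batch gradient descent.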
Backpropagation
Now we come to the computation of the error gradient $\nabla_w E(w)$.
The error is a simple sum over loss terms of the form $E(w) = L(f_w(x), y)$, so it suffices to consider the gradient of a single such term.
We write this error as
$$E(w) = L\Big(\sigma\big(W^{(n)} \, \sigma\big(W^{(n-1)} \, \sigma(\dots W^{(1)} x \dots)\big)\big), \; y\Big),$$
where $W^{(k)}$ are the weight matrices and $\sigma$ is the componentwise nonlinearity. (Here $n$ denotes the number of layers, not the size of the data set.)
The gradient can be calculated by the chain rule. Backpropagation is an algorithm for doing this fast. It will be introduced in the next lecture.
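To make the chain-rule statement concrete, here is a small sketch for a network with two weight matrices. It is a direct hand derivation for this one case, not the general backpropagation algorithm announced for the next lecture. The logistic nonlinearity, the squared loss, and the layer sizes are assumptions; the result is compared against a finite-difference estimate.

```python
import numpy as np

def sigma(z):
    """Componentwise logistic nonlinearity (an assumed choice of sigma)."""
    return 1.0 / (1.0 + np.exp(-z))

def error(W1, W2, x, y):
    """E(w) = L(sigma(W2 sigma(W1 x)), y) with squared loss L."""
    out = sigma(W2 @ sigma(W1 @ x))
    return 0.5 * np.sum((out - y) ** 2)

def chain_rule_gradients(W1, W2, x, y):
    """Gradients of the error w.r.t. W1 and W2 via the chain rule."""
    h = sigma(W1 @ x)                        # hidden activations
    out = sigma(W2 @ h)                      # network output
    d_out = (out - y) * out * (1.0 - out)    # dE/d(W2 h), using sigma' = sigma (1 - sigma)
    d_h = (W2.T @ d_out) * h * (1.0 - h)     # dE/d(W1 x)
    return np.outer(d_h, x), np.outer(d_out, h)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([0.0, 1.0])
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
g1, g2 = chain_rule_gradients(W1, W2, x, y)

# Central finite differences for one entry of each weight matrix.
eps = 1e-5
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric_g1 = (error(W1_plus, W2, x, y) - error(W1_minus, W2, x, y)) / (2 * eps)

W2_plus, W2_minus = W2.copy(), W2.copy()
W2_plus[0, 0] += eps
W2_minus[0, 0] -= eps
numeric_g2 = (error(W1, W2_plus, x, y) - error(W1, W2_minus, x, y)) / (2 * eps)
```

The finite-difference check needs two forward passes per weight, which is one way to see why a faster gradient algorithm is needed for larger networks.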
Summary
We have introduced a simple neuron model, which is composed of a linear combination of its inputs and a nonlinear transfer/activation function.
We have composed multiple neurons in parallel into layers, and multiple layers in sequence into feed-forward neural networks (multi-layer perceptrons, MLPs).
Neural networks are universal function approximators.
Networks can be trained by online or batch gradient descent.
The error gradient can be computed efficiently with the backpropagation algorithm.
The weights usually end up in a local optimum, not in the global optimum.
Acknowledgments
We thank Tobias Glasmachers for providing us with the material for this class, which was taken from his lecture "Machine Learning: Supervised Methods" at the Ruhr-Universität Bochum.
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2wide by 4high landscape pages
More informationCS:4420 Artificial Intelligence
CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwthaachen.de leibe@vision.rwthaachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationLearning Deep Architectures for AI. Part I  Vijay Chakilam
Learning Deep Architectures for AI  Yoshua Bengio Part I  Vijay Chakilam Chapter 0: Preliminaries Neural Network Models The basic idea behind the neural network approach is to model the response as a
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuousvalued inputs has been used for hundreds of years A univariate linear function (a straight
More informationIntroduction Neural Networks  Architecture Network Training Small Example  ZIP Codes Summary. Neural Networks  I. Henrik I Christensen
Neural Networks  I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 303320280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper  Thursday after Thanksgiving
More informationMultilayer Perceptrons and Backpropagation
Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:
More information