4 3hrs. Stefano Rovetta Introduction to neural networks 20/23-Jul / 109
2 Back to optimization
3 Stochastic optimization. Optimize a cost that is a random variable. Types of randomness:
- Measurement plus noise: $R + \nu$
- Multiple effects mixed together (we might use a mixture model)
- Unknown statistical properties
4 Monte Carlo integration. Expectation of a random variable $X$ (over the whole data space $E$): $E\{X\} = \int_E \xi \, p_x(\xi) \, d\xi$. But only a sample $\{x_1, \dots, x_n\}$ is given (the training set). Empirical distribution: $P_x(\xi) = \frac{1}{n} \sum_{l=1}^{n} \delta(\xi - x_l)$. Approximate (empirical) expectation of $X$: $E\{X\} \approx \int_E \xi \, P_x(\xi) \, d\xi = \frac{1}{n} \sum_{l=1}^{n} x_l$. This is a Monte Carlo integral.
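As a minimal sketch, assuming purely for illustration that $X$ is uniform on [0, 1] (so the true values $E\{X\} = 1/2$ and $E\{X^2\} = 1/3$ are known), the empirical expectation is just the sample mean, and the same recipe estimates $E\{f(X)\}$ for any function $f$. The helper name `mc_expectation` is hypothetical.

```python
import random


def mc_expectation(samples, f=lambda x: x):
    """Monte Carlo estimate of E{f(X)}: the average of f over the sample,
    i.e. the integral of f against the empirical distribution."""
    return sum(f(x) for x in samples) / len(samples)


# Hypothetical data: n draws from Uniform(0, 1), where E{X} = 1/2 and E{X^2} = 1/3.
rng = random.Random(0)
sample = [rng.random() for _ in range(100_000)]

mean_est = mc_expectation(sample)                            # approximates 1/2
second_moment_est = mc_expectation(sample, lambda x: x * x)  # approximates 1/3
print("E{X} ~", round(mean_est, 4), " E{X^2} ~", round(second_moment_est, 4))
```

The estimation error shrinks like $1/\sqrt{n}$, which is why a large sample is used here.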
5 Suppose that $R$ is the classification performance (risk). We want to optimize the true risk, the one computed on all possible (infinitely many) data: $R(w) = \int R(y(x), w) \, p(x) \, dx$. This is a function of $w$ (the weights identify one specific neural network). It is also a function of the data distribution $p(x)$ (the performance is estimated on the data).
6 When training a neural network we don't have $p(x)$, but only the training set $\{x_1, \dots, x_{n_p}\}$. From the training set we have the empirical distribution $P_x(\xi) = \frac{1}{n_p} \sum_{l=1}^{n_p} \delta(\xi - x_l)$, so we can compute a Monte Carlo estimate of the risk, the empirical risk: $\hat R(w, X) = \frac{1}{n_p} \sum_{l=1}^{n_p} R(y(x_l), w)$.
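A sketch of the empirical risk, assuming for illustration a one-dimensional linear model $y(x) = w_0 + w_1 x$ and a squared-error pointwise loss against a target $t$ (both hypothetical choices; the slides leave $R$ generic). The training set is also made up.

```python
def model(x, w):
    """Hypothetical linear model y(x) = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def pointwise_loss(y, t):
    """Squared error between prediction y and target t."""
    return (y - t) ** 2


def empirical_risk(w, xs, ts):
    """Monte Carlo estimate of the risk: average loss over the n_p training patterns."""
    n_p = len(xs)
    return sum(pointwise_loss(model(x, w), t) for x, t in zip(xs, ts)) / n_p


# Tiny hypothetical training set, generated by t = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
print(empirical_risk([1.0, 2.0], xs, ts))  # exact weights: 0.0
print(empirical_risk([0.0, 2.0], xs, ts))  # intercept off by one on every pattern: 1.0
```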
7 Training by epoch: optimize using the whole training set to estimate the cost. This means computing $\hat R$ (and the weight update $\Delta w$) on the basis of a Monte Carlo estimate of the risk. It finds the optimal value of an approximate (empirical) cost function.
8 Stochastic approximation: a special kind of stochastic optimization. $R$ is estimated at each input pattern using that pattern alone. The estimate is extremely unreliable, but the procedure converges in probability! (Robbins and Monro, 1951; Kiefer and Wolfowitz, 1952)
9 Convergence in probability: $\lim_{n \to \infty} \Pr\left(|\hat R_n - R| > \varepsilon\right) = 0$ for every $\varepsilon > 0$, where $\hat R_n$ is the estimate of $R$ on a training set of size $n$.
10 Stochastic approximation. Given:
- A function $R$ that we want to minimize by setting its gradient $\nabla R$ to zero (but $\nabla R$ cannot be computed analytically)
- A sequence $G_1, G_2, \dots, G_l, \dots$ of random samples of $\nabla R$, affected by random noise
- A decreasing sequence $\eta_1, \eta_2, \dots, \eta_l, \dots$ of step-size coefficients
Basic iteration: $w(l+1) = w(l) - \eta_l G_l$
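A minimal sketch of the basic iteration on a problem chosen (as an assumption) so that the answer is known: minimize $R(w) = E\{(w - X)^2\}/2$, whose minimizer is $E\{X\}$. Each noisy gradient sample is $G_l = w(l) - x_l$, computed from one observation only. With $\eta_l = 1/l$ the iteration happens to reproduce the running sample mean exactly, which makes the behaviour easy to check. The data are synthetic.

```python
import random


def robbins_monro(samples, w0=0.0):
    """Stochastic approximation w(l+1) = w(l) - eta_l * G_l with eta_l = 1/l.

    G_l = w(l) - x_l is a noisy sample of the gradient of
    R(w) = E{(w - X)^2} / 2, whose minimizer is E{X}.
    """
    w = w0
    for l, x in enumerate(samples, start=1):
        eta = 1.0 / l   # decreasing steps: sum of eta diverges, sum of eta^2 converges
        g = w - x       # noisy gradient estimate from a single observation
        w = w - eta * g
    return w


rng = random.Random(1)
data = [rng.gauss(3.0, 2.0) for _ in range(20_000)]  # hypothetical observations, E{X} = 3
w_final = robbins_monro(data)
print(w_final, sum(data) / len(data))
```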
11 Stochastic approximation: the intuition. Each sample gives a noisy (stochastic) estimate of the gradient, $\nabla R$ + noise. By averaging over time, the noise cancels out. Random variations also make it possible to escape local minima.
12 Results on convergence of stochastic approximation. If $R$ is twice differentiable and convex, then stochastic approximation converges with a rate of $O(1/l)$. A condition for convergence (not for the optimal rate of convergence): $\sum_l \eta_l = \infty$ and $0 < \sum_l \eta_l^2 = A < \infty$. Usually these hypotheses are not met (complex cost landscape) and we have no guarantees.
13 Training by pattern means computing $\hat R$ (and $\Delta w$) on the basis of an estimate of the risk on a single point: an extreme Monte Carlo estimate, on a training set of one observation only. It finds the approximate optimal value of an approximate cost function.
14 Implementation of training:
- By epoch: estimation loop, then update
- By pattern: estimation + update loop
- By pattern on a training set: pick $l$ at random
Learning rate $\eta$:
- By pattern: keep it low
- By epoch: make it adaptive
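A sketch contrasting the two loops for a hypothetical linear unit $y = w_0 + w_1 x$ trained with squared error on a tiny noiseless dataset. The data, learning rates and iteration counts are illustrative assumptions, and a fixed $\eta$ is used in both loops rather than the adaptive one suggested for epoch training.

```python
import random

# Hypothetical training set, generated by t = -1 + 2x.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
TS = [-1.0 + 2.0 * x for x in XS]


def pattern_gradient(w, x, t):
    """Gradient of (y - t)^2 / 2 for y = w[0] + w[1] * x, on one pattern."""
    e = (w[0] + w[1] * x) - t
    return [e, e * x]


def train_by_epoch(eta=0.5, epochs=200):
    w = [0.0, 0.0]
    for _ in range(epochs):
        # Estimation loop: average the gradient over the whole training set...
        g = [0.0, 0.0]
        for x, t in zip(XS, TS):
            gp = pattern_gradient(w, x, t)
            g = [g[0] + gp[0] / len(XS), g[1] + gp[1] / len(XS)]
        # ...then a single update per epoch.
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    return w


def train_by_pattern(eta=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        l = rng.randrange(len(XS))              # l = random pattern index
        gp = pattern_gradient(w, XS[l], TS[l])  # estimate from one pattern...
        w = [w[0] - eta * gp[0], w[1] - eta * gp[1]]  # ...and update immediately
    return w


print(train_by_epoch())    # close to [-1.0, 2.0]
print(train_by_pattern())  # close to [-1.0, 2.0]
```

Because this toy data set is noiseless, every per-pattern gradient vanishes at the solution, so even the fixed-step by-pattern loop settles exactly; with noisy data it would keep fluctuating unless $\eta$ is decreased as in stochastic approximation.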
15 Multi-layer neural networks
16 Connectionism and Parallel Distributed Processing: David Rumelhart, James McClelland, Geoffrey Hinton
17 What is connectionism? Connectionism is an approach to cognitive science that characterizes learning and memory through the discrete interactions between nodes of neural networks. Concepts and rules are not represented by symbols with a lot of meaning, but by sub-symbolic neural encodings (neuron activations) which have a meaning only if taken collectively as patterns. Neural networks are distributed and massively parallel. They rely on spontaneously generated internal representations.
18 Network topologies. Most general: feedback. Units may be visible or hidden (hidden units marked *).
19 Network topologies. A special type of feedback: lateral connections.
20 Network topologies. Less general: a topology where cycles are forbidden (feedforward). Visible units may be input or output units.
21 Network topologies. Least general: multi-layer.
22 Why multi-layer?
- Linear separability
- Feature discovery
- Hierarchies of abstractions
23 Example: parity. Problem: given any input string of $d$ bits, tell whether the number of bits set (= 1) is even. Parity generalizes XOR, so it is not linearly separable.
24 Example: parity. The solution with one hidden layer of threshold units uses $d$ hidden units.
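One classical construction with $d$ threshold (Heaviside) hidden units can be sketched as follows: hidden unit $j$ ($j = 1, \dots, d$) fires when at least $j$ bits are set, and the output unit combines the hidden activations with alternating weights. The weights below are one hand-picked solution for illustration, not learned ones, and `parity_net` is a hypothetical name.

```python
from itertools import product


def step(r):
    """Heaviside threshold unit."""
    return 1 if r > 0 else 0


def parity_net(bits):
    """Hand-wired 2-layer threshold network with d = len(bits) hidden units.
    Returns 1 if the number of bits set is even, 0 otherwise."""
    d = len(bits)
    net = sum(bits)  # every input-to-hidden weight equals 1
    # Hidden unit j (j = 1..d) has bias -(j - 0.5): it fires iff at least j bits are set.
    hidden = [step(net - (j - 0.5)) for j in range(1, d + 1)]
    # h_1 - h_2 + h_3 - ... equals 1 for an odd count and 0 for an even one, so the
    # output unit uses weights -1, +1, -1, ... and bias +0.5.
    alternating = sum((-1) ** (j + 1) * h for j, h in enumerate(hidden, start=1))
    return step(0.5 - alternating)


for bits in product((0, 1), repeat=3):
    print(bits, parity_net(list(bits)))
```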
25 Universal approximation theorem (G. Cybenko, 1989): a feed-forward network with a single hidden layer containing a finite number of (sigmoidal) neurons can approximate any continuous function on compact subsets of $\mathbb{R}^d$ to any desired accuracy.
26 How do we train a multi-layer neural network?
1. With a suitable algorithm
2. With a sequence of independent trainings
27 As we have seen, learning (e.g., learning to recognize) can be cast as the problem of optimizing a suitable cost function (risk). But most optimization methods rely on the necessary condition for a minimum, $\nabla E = 0$, or on the direction of the gradient $\nabla E$. Requirement: $E$ must be at least differentiable (even better if also convex, but that's not always possible). Even if $E$ is differentiable, for hidden units we cannot compute an error term like $(t - a)^2$ (MSE), because hidden units have no target $t$. Requirement: we need a way to do this.
28 A differentiable activation function. Let's write the discriminant function for a problem with two Gaussian, spherical, equal-variance classes. After a translation of the origin and a rotation of the axes, this becomes a 1-dimensional symmetrical problem in $x$ with only two parameters:
$$p(x|\omega_1) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad p(x|\omega_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]$$
29 From Bayes' theorem:
$$P(\omega_1|x) = \frac{p(x|\omega_1)P(\omega_1)}{p(x|\omega_1)P(\omega_1) + p(x|\omega_2)P(\omega_2)}, \qquad P(\omega_2|x) = \frac{p(x|\omega_2)P(\omega_2)}{p(x|\omega_1)P(\omega_1) + p(x|\omega_2)P(\omega_2)}$$
30 2-class discriminant function, assuming equal priors $P(\omega_1) = P(\omega_2)$ and removing the common factors $1/(\sqrt{2\pi}\,\sigma)$:
$$g(x) = P(\omega_1|x) - P(\omega_2|x) = \frac{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] - \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] + \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}$$
31 Expanding $(x \mp \mu)^2 = x^2 + \mu^2 \mp 2x\mu$:
$$g(x) = \frac{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[\frac{x\mu}{\sigma^2}\right] - \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[-\frac{x\mu}{\sigma^2}\right]}{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[\frac{x\mu}{\sigma^2}\right] + \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right] \exp\left[-\frac{x\mu}{\sigma^2}\right]}$$
The common positive factor $\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]$ cancels out:
$$g(x) = \frac{e^{x\mu/\sigma^2} - e^{-x\mu/\sigma^2}}{e^{x\mu/\sigma^2} + e^{-x\mu/\sigma^2}}$$
32 We replace $x$ with the score $r = x \cdot w$. We can absorb the factor $\mu/\sigma^2$ into the norm of $w$: $w' = \frac{\mu}{\sigma^2} w$. We obtain
$$g(r) = \frac{e^r - e^{-r}}{e^r + e^{-r}}, \qquad r = x \cdot w'$$
$g(r)$ is the hyperbolic tangent activation, $\tanh(r)$. The logistic or sigmoid activation is a shifted and rescaled version of it: $\sigma(r) = \frac{1}{1 + e^{-r}} = \frac{\tanh(r/2) + 1}{2}$.
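A quick numerical sanity check of this derivation, with hypothetical class parameters `mu` ($\mu$) and `s` ($\sigma$): the difference of the two posteriors (equal priors) should coincide with $\tanh(x\mu/\sigma^2)$, and the logistic function should coincide with $(\tanh(r/2) + 1)/2$.

```python
import math


def gauss(x, mean, std):
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


def posterior_difference(x, mu, s):
    """g(x) = P(w1|x) - P(w2|x) for classes N(+mu, s^2) and N(-mu, s^2), equal priors."""
    p1, p2 = gauss(x, mu, s), gauss(x, -mu, s)
    return (p1 - p2) / (p1 + p2)


def logistic(r):
    return 1.0 / (1.0 + math.exp(-r))


mu, s = 1.5, 0.8  # hypothetical class parameters
for x in [-2.0, -0.3, 0.0, 0.7, 2.5]:
    g = posterior_difference(x, mu, s)
    print(f"x={x:+.1f}  g(x)={g:+.6f}  tanh(x*mu/s^2)={math.tanh(x * mu / s ** 2):+.6f}")
```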
33 [Figure: sigmoid and tanh activations, $a$ as a function of $r$]
34 [Figure: Heaviside and sign activations, $a$ as a function of $r$]
35 The sigmoid is the solution of the logistic equation $y' = y(1 - y)$. Therefore, by definition, $\frac{d\sigma(r)}{dr} = \sigma(r)\left(1 - \sigma(r)\right)$. Also, $\frac{d\tanh(r)}{dr} = 1 - \tanh^2(r)$.
36 The error back-propagation algorithm. Discovered by Amari, Werbos, Parker, Rumelhart, Hinton, and Williams between 1974 and 1986; the name already appears in Rosenblatt's book Principles of Neurodynamics (1962). A clever application of the chain rule of differential calculus. We can perform gradient descent in a distributed way, without explicitly differentiating the whole network function. The responsibility for errors is back-propagated from the outputs into the network, and distributed among the hidden layers.
37 The chain rule. Where is the chain?
$$\frac{df(g(x))}{dx} = \left.\frac{df(y)}{dy}\right|_{y=g(x)} \frac{dg(x)}{dx}, \qquad \frac{df(g(h(x)))}{dx} = \frac{df(g)}{dg} \frac{dg(h)}{dh} \frac{dh(x)}{dx}$$
which, for instance, can be used to prove that, with $r = x \cdot w$,
$$\frac{\partial \sigma(r)}{\partial w_i} = \frac{d\sigma(r)}{dr} \frac{\partial r}{\partial w_i} = \sigma'(r)\, x_i = \sigma(r)\left(1 - \sigma(r)\right) x_i \qquad (1)$$
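A sketch that checks the two derivative identities and Eq. (1) with central finite differences. The input `x`, the weights `w` and the step `h` are arbitrary illustrative values.

```python
import math


def sigma(r):
    return 1.0 / (1.0 + math.exp(-r))


def central_diff(f, z, h=1e-6):
    return (f(z + h) - f(z - h)) / (2 * h)


# Derivative identities for the logistic and tanh activations.
for r in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    assert abs(central_diff(sigma, r) - sigma(r) * (1 - sigma(r))) < 1e-8
    assert abs(central_diff(math.tanh, r) - (1 - math.tanh(r) ** 2)) < 1e-8

# Eq. (1): d sigma(x . w) / d w_i = sigma(r) (1 - sigma(r)) x_i, with r = x . w.
x = [0.5, -1.0, 2.0]
w = [0.3, 0.8, -0.4]


def unit_output(weights):
    return sigma(sum(xi * wi for xi, wi in zip(x, weights)))


r = sum(xi * wi for xi, wi in zip(x, w))
analytic = [sigma(r) * (1 - sigma(r)) * xi for xi in x]
numeric = []
for i in range(len(w)):
    # Vary only w_i, keeping the other weights fixed.
    f_i = lambda wi, i=i: unit_output(w[:i] + [wi] + w[i + 1:])
    numeric.append(central_diff(f_i, w[i]))
print(analytic)
print(numeric)
```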
39 Notation:
- $np$: number of patterns in the training set
- $ni$: number of input units
- $nh$: number of hidden units
- $no$: number of output units
- $nw$: total number of weights, $nw = (ni + 1)\,nh + (nh + 1)\,no$ (the $+1$ terms are the bias weights, with $x_0 = sh_0 = 1$)
- $i$: index for input components
- $j$: index for hidden units
- $k$: index for output units
- $x_i$: $i$-th component of the input pattern
- $r_j$: net stimulus of the $j$-th hidden unit
- $r_k$: net stimulus of the $k$-th output unit
- $sh_j$: activation value of the $j$-th hidden unit
- $so_k$: $k$-th component of the output
- $tg_k$: $k$-th component of the target
- $whi_{ji}$: weight to the $j$-th hidden unit from the $i$-th input unit [$(ni + 1) \times nh$ weights]
- $woh_{kj}$: weight to the $k$-th output unit from the $j$-th hidden unit [$(nh + 1) \times no$ weights]
40 Loss function: $\lambda(so_k, tg_k) = (tg_k - so_k)^2$.
1. In general there may be several output units;
2. the overall cost function is not quadratic (a paraboloid), because the network is non-linear: it is a non-convex cost function.
41 Expected cost:
$$E = \int \frac{1}{2} \frac{1}{no} \sum_{k=1}^{no} \left(so_k(x) - tg_k(x)\right)^2 p(x)\, dx \qquad (2)$$
$E$ is known only through its estimate on the training set (here by epoch):
$$\hat E = \frac{1}{np} \sum_{l=1}^{np} \frac{1}{2} \frac{1}{no} \sum_{k=1}^{no} \left(so_k(x_l) - tg_k(x_l)\right)^2 \qquad (3)$$
42 Summation and differentiation are both linear, and therefore they can be exchanged freely. We only consider one pattern:
$$\hat E = \frac{1}{2} \frac{1}{no} \sum_{k=1}^{no} \left(so_k - tg_k\right)^2 \qquad (4)$$
- For training online (= by pattern), we apply each $\Delta w$ immediately, as we did with the perceptron and Adaline.
- For training by epoch, we sum several $\Delta w$ and apply them only at the end of each pass (a training epoch).
- For training by batch, we sum several $\Delta w$ and apply them after a fraction of a complete pass.
43 The operation of the multilayer perceptron is divided into two steps: activation forward-propagation and error back-propagation.
44 Forward propagation
45 Forward propagation:
$$r_j = \sum_{i=0}^{ni} whi_{ji}\, x_i, \qquad sh_j = \sigma(r_j) \qquad (5)$$
$$r_k = \sum_{j=0}^{nh} woh_{kj}\, sh_j, \qquad so_k = \sigma(r_k) \qquad (6)$$
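A minimal forward pass implementing Eqs. (5) and (6) with logistic units, assuming (as the sums starting from index 0 suggest) that $x_0 = 1$ and $sh_0 = 1$ act as bias inputs. The list layout and the weight values are hypothetical.

```python
import math


def sigma(r):
    return 1.0 / (1.0 + math.exp(-r))


def forward(x, whi, woh):
    """Forward propagation for a one-hidden-layer network.

    x   : input pattern [x_1, ..., x_ni]
    whi : nh rows of ni + 1 weights, whi[j][0] being the bias weight
    woh : no rows of nh + 1 weights, woh[k][0] being the bias weight
    Returns (sh, so): hidden activations (with sh_0 = 1 prepended) and outputs.
    """
    xb = [1.0] + list(x)                                              # x_0 = 1
    r_hidden = [sum(w * xi for w, xi in zip(row, xb)) for row in whi]  # Eq. (5)
    sh = [1.0] + [sigma(r) for r in r_hidden]                          # sh_0 = 1
    r_out = [sum(w * s for w, s in zip(row, sh)) for row in woh]       # Eq. (6)
    so = [sigma(r) for r in r_out]
    return sh, so


# Hypothetical 2-3-1 network.
whi = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5], [0.0, -0.6, 0.1]]
woh = [[0.2, 0.7, -0.5, 0.3]]
sh, so = forward([1.0, 0.5], whi, woh)
print(so)
```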
46 Error back-propagation
47 Error back-propagation and update. We start from the computation of the partial derivatives, i.e., the gradient of the error. Here $w$ is generically any of the weights of the network. We need all the components of the gradient $\nabla \hat E$, that is, $\partial \hat E / \partial w$ for all possible $w$.
48 For any weight $w$:
$$\frac{\partial \hat E}{\partial w} = \frac{1}{2} \frac{1}{no} \sum_{k=1}^{no} \frac{\partial \left(so_k - tg_k\right)^2}{\partial w} = \frac{1}{no} \sum_{k=1}^{no} \left(so_k - tg_k\right) \frac{\partial so_k}{\partial w} \qquad (7)$$
Depending on whether $w$ is a $woh$ or a $whi$ weight, we will have different expansions of this expression.
49 Hidden-to-output weights $woh_{kj}$:
$$\frac{\partial \hat E}{\partial woh_{kj}} = \frac{1}{no} \sum_{k'=1}^{no} \left(so_{k'} - tg_{k'}\right) \frac{\partial so_{k'}}{\partial r_{k'}} \frac{\partial r_{k'}}{\partial woh_{kj}} \qquad (8)$$
We can drop all terms not depending on $woh_{kj}$, i.e., those with $k' \neq k$:
$$\frac{\partial \hat E}{\partial woh_{kj}} = \frac{1}{no} \left(so_k - tg_k\right) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial woh_{kj}} \qquad (9)$$
We plug in quantities known from the forward pass ($\partial so_k / \partial r_k = \sigma'(r_k)$ and $\partial r_k / \partial woh_{kj} = sh_j$):
$$\frac{\partial \hat E}{\partial woh_{kj}} = \frac{1}{no} \left(so_k - tg_k\right) \sigma'(r_k)\, sh_j \qquad (10)$$
50 If we define
$$\delta_k = \left(so_k - tg_k\right) \sigma'(r_k) \qquad (11)$$
we have a generalization of the delta term that we have seen in the delta rule by Widrow and Hoff. Generalized delta rule for the hidden-to-output weights (a gradient-descent step, with the constant $1/no$ absorbed into the learning rate $\eta$):
$$\Delta woh_{kj} = -\eta\, \delta_k\, sh_j \qquad (12)$$
51 Problem with the input-to-hidden weights: not all terms are readily available. We use the chain rule again to find another formulation for $\partial \hat E / \partial whi_{ji}$.
52 For the input-to-hidden weights:
$$\frac{\partial \hat E}{\partial whi_{ji}} = \frac{1}{2} \frac{1}{no} \sum_{k=1}^{no} \frac{\partial \left(so_k - tg_k\right)^2}{\partial whi_{ji}} \qquad (13)$$
$$= \frac{1}{no} \sum_{k=1}^{no} \left(so_k - tg_k\right) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial sh_j} \frac{\partial sh_j}{\partial whi_{ji}} \qquad (14)$$
53 Now the quantities appearing in the last equation are available, either from the forward pass or from theory:
$$\left(so_k - tg_k\right) \frac{\partial so_k}{\partial r_k} = \delta_k, \qquad \frac{\partial r_k}{\partial sh_j} = woh_{kj}, \qquad \frac{\partial sh_j}{\partial whi_{ji}} = \frac{\partial sh_j}{\partial r_j} \frac{\partial r_j}{\partial whi_{ji}} = \sigma'(r_j)\, x_i$$
54 Substituting:
$$\frac{\partial \hat E}{\partial whi_{ji}} = \frac{1}{no} \sum_{k=1}^{no} \left(so_k - tg_k\right) \frac{\partial so_k}{\partial r_k} \frac{\partial r_k}{\partial sh_j} \frac{\partial sh_j}{\partial whi_{ji}} \qquad (15)$$
$$= \frac{1}{no} \sum_{k=1}^{no} \left[\delta_k\, woh_{kj}\right] \left[\sigma'(r_j)\, x_i\right] \qquad (16)$$
Note that the summation here does not disappear: hidden unit $j$ feeds every output unit. Also note that $\delta_k$ already contains the factor $\sigma'(r_k)$.
55 We can further manipulate the expression by first isolating the terms which do not depend on the summation index $k$:
$$\frac{\partial \hat E}{\partial whi_{ji}} = \frac{1}{no} \left[\sigma'(r_j) \sum_{k=1}^{no} \delta_k\, woh_{kj}\right] x_i \qquad (17)$$
and then identifying the generalized delta for the input-to-hidden weights:
$$\delta_j = \sigma'(r_j) \sum_{k=1}^{no} \delta_k\, woh_{kj} \qquad (18)$$
56 Generalized delta rule for the input-to-hidden weights (again with $1/no$ absorbed into $\eta$):
$$\Delta whi_{ji} = -\eta\, \delta_j\, x_i \qquad (19)$$
amazingly similar in form to the rule for the hidden-to-output weights.
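Putting Eqs. (4) to (19) together, here is a sketch of back-propagation for a single pattern with logistic units, plus a finite-difference check that the deltas really give the gradient of $\hat E$ in Eq. (4). The network size, weights, pattern and $\eta$ are hypothetical, and bias index 0 is assumed as in the notation slide.

```python
import math


def sigma(r):
    return 1.0 / (1.0 + math.exp(-r))


def forward(x, whi, woh):
    """Eqs. (5)-(6); index 0 of xb and sh is the bias input (value 1)."""
    xb = [1.0] + list(x)
    sh = [1.0] + [sigma(sum(w * v for w, v in zip(row, xb))) for row in whi]
    so = [sigma(sum(w * v for w, v in zip(row, sh))) for row in woh]
    return xb, sh, so


def cost(x, tg, whi, woh):
    """Single-pattern cost, Eq. (4): (1/2)(1/no) sum_k (so_k - tg_k)^2."""
    _, _, so = forward(x, whi, woh)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(so, tg)) / len(so)


def gradients(x, tg, whi, woh):
    """Back-propagation: returns (dE/dwhi, dE/dwoh), shaped like the weights."""
    xb, sh, so = forward(x, whi, woh)
    no, nh = len(so), len(whi)
    # Output deltas, Eq. (11); for the logistic, sigma'(r_k) = so_k (1 - so_k).
    d_out = [(so[k] - tg[k]) * so[k] * (1 - so[k]) for k in range(no)]
    # Hidden deltas, Eq. (18); hidden unit j is stored at index j + 1 of sh.
    d_hid = [sh[j + 1] * (1 - sh[j + 1]) * sum(d_out[k] * woh[k][j + 1] for k in range(no))
             for j in range(nh)]
    # Eq. (10) and its hidden-layer analogue: (1/no) * delta * input of that weight.
    g_woh = [[d_out[k] * sh[j] / no for j in range(len(sh))] for k in range(no)]
    g_whi = [[d_hid[j] * xb[i] / no for i in range(len(xb))] for j in range(nh)]
    return g_whi, g_woh


# Hypothetical 2-3-2 network and one training pattern.
whi = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5], [0.0, -0.6, 0.1]]
woh = [[0.2, 0.7, -0.5, 0.3], [-0.1, 0.3, 0.8, -0.4]]
x, tg = [1.0, 0.5], [1.0, 0.0]
g_whi, g_woh = gradients(x, tg, whi, woh)

# Central finite-difference check of every gradient component.
h, max_err = 1e-6, 0.0
for W, G in ((whi, g_whi), (woh, g_woh)):
    for a in range(len(W)):
        for b in range(len(W[a])):
            old = W[a][b]
            W[a][b] = old + h
            e_plus = cost(x, tg, whi, woh)
            W[a][b] = old - h
            e_minus = cost(x, tg, whi, woh)
            W[a][b] = old
            max_err = max(max_err, abs((e_plus - e_minus) / (2 * h) - G[a][b]))
print("max |analytic - numeric| =", max_err)

# One gradient-descent step, Eqs. (12) and (19) (here 1/no stays inside the gradient).
eta = 0.5
e_before = cost(x, tg, whi, woh)
whi = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(whi, g_whi)]
woh = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(woh, g_woh)]
e_after = cost(x, tg, whi, woh)
print(e_before, "->", e_after)
```

Keeping $1/no$ inside the gradient, rather than folding it into $\eta$, makes the finite-difference comparison with Eq. (4) direct.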
57 Important property of multi-layer networks: the layered network is the simplest possible connectivity that has the universal approximation property. The network should be large enough or deep enough.
58 Generalization and overfitting. The number of weights needs to be high, so we must take care to control overfitting.
59 Overfitting is the situation where $\hat R$ is low but $|\hat R - R|$ is high. Symptom: while training we are happy, but then the tests fail! No generalization, due to too much specialization (learning the training set, not the classification rule).
60 Are multi-layer perceptrons not a good model for the brain? There is some evidence that the brain uses sparse (localized) rather than dense (distributed) representations. Probably it uses both.
61 Deep neural networks
62 David Hubel and Torsten Wiesel
63 Hubel and Wiesel placed electrodes in animals' brains (visual cortex). They discovered the columnar organization of neurons.
64 Each layer in a cortical column extracts features from the input it receives from the previous layer. These features are more and more abstract:
- Edges
- Simple shapes
- Composite shapes
- Eyes, mouths, noses...
- Grandmother (the grandmother cell hypothesis)
65 Learning features in neural networks: internal representations in the hidden layers. A hierarchy requires many layers (deep networks).
67 Learning: limits of multi-layer networks. Error back-propagation does not work well with very deep structures. Vanishing gradient phenomenon: at each layer, the back-propagated components of the gradient are multiplied by weights and by derivatives of the activation function ($\sigma' \le 1/4$ for the sigmoid), so they typically become exponentially smaller with depth. To avoid the problem: use shallow networks (theoretically sufficient).
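A small numerical illustration, assuming a hypothetical chain of single sigmoid units with all weights equal to 1 and input 0: by the chain rule, the derivative of the last activation with respect to the first input is the product of the local factors $\sigma'(r)\,w$, and since $\sigma'(r) \le 1/4$ it shrinks at least as fast as $(1/4)^L$ for $L$ layers.

```python
import math


def sigma(r):
    return 1.0 / (1.0 + math.exp(-r))


def backprop_factor(depth, weight=1.0, x=0.0):
    """Product of local derivatives d a_L / d a_0 along a chain of `depth` sigmoid units."""
    a, factor = x, 1.0
    for _ in range(depth):
        a = sigma(weight * a)
        factor *= a * (1 - a) * weight  # chain rule: sigma'(r) * dr/da, with sigma' = a (1 - a)
    return factor


for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))
```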
68 Example of a shallow architecture: support vector machines
69 Representational advantage of depth. In the '80s and early '90s some works proved that some logical functions that can be implemented with a depth of $k$ layers require exponentially more units if reduced to $k - 1$ layers. In the 2010s: dependent inputs (variables) need very deep networks.
70 How can we avoid training the whole network all at once?
71 Multi-level hierarchies of networks: cascaded networks of unsupervised layers, trained one after the other, plus a final classification layer. The whole structure is finally trained with error back-propagation.
72 The idea is not new: the Neocognitron (K. Fukushima, 1987).
73 Unsupervised learning principles
74 Information bottleneck
75 Information bottleneck
76 Techniques using the "information bottleneck" principle:
- Using statistics and entropy: coding theory; stochastic complexity and minimum description length
- Using errors: autoencoders; PCA; rate-distortion theory
77 Autoencoders. An autoencoder is a special case of a multi-layer perceptron characterized by two aspects:
1. Structure: number of units in the input layer = number of units in the output layer > number of hidden units
2. Learned task: an autoencoder is trained to approximate the identity function (= replicate its input at the output)
An autoencoder is not a classifier.
78 Autoencoders
79 Autoencoders. What is interesting is not the output value (which is an approximation to the input) but the pattern present on the hidden layer. Since we don't use any target (the target coincides with the input), the autoencoder task is unsupervised; it is sometimes termed "self-supervised".
80 Learned features from a set of images
81 Recognizing handwritten digits
82 Features for recognizing 0 from 8
83 Features for recognizing 1 from 8
84 An example of an autoencoder for learning features from symbolic data. Task: diagnose Lyme disease from patient records. Problem: many features (observed signs and symptoms) are binary and very sparse.
85 An example of an autoencoder for learning features from symbolic data: learning the features
86 An example of an autoencoder for learning features from symbolic data: using the learned features
87 Principal component analysis is an instance of factor analysis: discover the few unobservable factors that give rise to observable (measurable) variables.
88 Example of a factor analysis problem: discover the abilities underlying performance in school tests.
Observed variables:
- Marks in algebra test
- Marks in geometry test
- Marks in literature test
- Marks in foreign language test
- Marks in music test
- Marks in essay
Hidden factors:
- Linguistic ability
- Spatial ability
- Symbolic processing ability
89 Principal Component Analysis (PCA) is a linear solution to the factor analysis problem. Linear: factors are linear combinations of the input variables, $v = \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_d x_d$.
90 PCA works on the covariance matrix of the data. Covariance between input $x_i$ and input $x_j$: $\sigma_{i,j} = \sigma_{j,i} = E\{(x_i - \bar x_i)(x_j - \bar x_j)\}$, where $E\{\}$ is the expectation (or the mean over the training set) and $\bar x_i$ is the mean of the $i$-th input.
$$\Sigma = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,d} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d,1} & \sigma_{d,2} & \cdots & \sigma_{d,d} \end{bmatrix}$$
91 Note: if $X$ is the training set as an $n \times d$ matrix (one pattern per row) and all inputs have zero mean, i.e., $X \leftarrow X - \bar X$, then $\Sigma = \frac{1}{n} X^T X$. In Matlab:
    X = X - repmat(mean(X), size(X,1), 1);
    Sigma = X' * X / size(X,1);
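The same two steps in plain Python, as a sketch on a hypothetical 4 × 2 data matrix (one pattern per row): center each column, then form $\Sigma = X^T X / n$ entry by entry, i.e. with the mean over the training set used as $E\{\}$.

```python
def center(X):
    """Subtract the column means (what the repmat line does in Matlab)."""
    n, d = len(X), len(X[0])
    means = [sum(row[c] for row in X) / n for c in range(d)]
    return [[row[c] - means[c] for c in range(d)] for row in X]


def covariance(X):
    """Sigma = X^T X / n for the centered data matrix (n patterns, d inputs)."""
    Xc = center(X)
    n, d = len(Xc), len(Xc[0])
    return [[sum(Xc[l][a] * Xc[l][b] for l in range(n)) / n for b in range(d)]
            for a in range(d)]


X = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0]]  # hypothetical data
S = covariance(X)
print(S)  # [[5.0, 3.5], [3.5, 3.5]]
```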
92 Principal components. The "factors" in PCA are called principal components and are given by the eigenvectors of $\Sigma$: $v_1, \dots, v_d$. If we project pattern $x = [x_1, x_2, \dots, x_d]$ onto the component $v_i = [v_{i1}, v_{i2}, \dots, v_{id}]$ we obtain the value of the $i$-th factor (or component, or feature) for pattern $x$: $a_i = x \cdot v_i = \sum_{m=1}^{d} x_m v_{im}$. OK, components; but why "principal"?
93 Property:
1. Eigenvectors of $\Sigma$ can be ordered by the corresponding eigenvalues, from largest to smallest.
2. Eigenvectors are thus ordered by variance (or energy, or level of activity) from largest to smallest: the eigenvalue of a unit-norm eigenvector $v_i$ is the variance of the projections $a_i$.
3. Projection of the training set $X$ onto the first $r$ (principal) components gives the best rank-$r$ approximation to $X$ itself, when measured by mean square error.
PCA is a form of lossy compression. The principal components are features useful to represent the data in a synthetic way (information bottleneck).
94 It has been proved that an autoencoder with linear activations learns the principal components (more precisely, the subspace they span). This is because its objective is the mean squared reconstruction error of a lower-rank representation, the same as in PCA.
95 Oja's neuron: a single-unit model with linear (identity) activation, $a = x \cdot w$. Learning rule: $w \leftarrow w + \eta\, a\, (x - a\, w)$.
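A sketch of Oja's rule on synthetic zero-mean 2-D data whose principal direction $u = (1, 1)/\sqrt{2}$ is known by construction. The data generator, $\eta$ and the number of passes are illustrative assumptions. After training, $w$ should have (approximately) unit norm and be aligned, up to sign, with $u$.

```python
import math
import random

rng = random.Random(0)
# Hypothetical zero-mean data: std 3 along u = (1, 1)/sqrt(2), std 0.5 along v = (1, -1)/sqrt(2).
u = (1 / math.sqrt(2), 1 / math.sqrt(2))
v = (1 / math.sqrt(2), -1 / math.sqrt(2))
data = []
for _ in range(5000):
    s, t = rng.gauss(0, 3.0), rng.gauss(0, 0.5)
    data.append((s * u[0] + t * v[0], s * u[1] + t * v[1]))


def oja(data, eta=0.001, passes=5, seed=1):
    """Oja's rule: a = x . w;  w <- w + eta * a * (x - a * w)."""
    r = random.Random(seed)
    w = [r.uniform(-1, 1), r.uniform(-1, 1)]
    for _ in range(passes):
        for x in data:
            a = x[0] * w[0] + x[1] * w[1]
            w = [w[0] + eta * a * (x[0] - a * w[0]),
                 w[1] + eta * a * (x[1] - a * w[1])]
    return w


w = oja(data)
norm = math.hypot(w[0], w[1])
alignment = abs(w[0] * u[0] + w[1] * u[1]) / norm  # |cos| of the angle between w and u
print(norm, alignment)
```

Each update uses a single pattern, which is what makes the method online; the price is the small, fixed $\eta$ and a stochastic result.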
96 It can be proven that, for small $\eta$, Oja's learning rule is a first-order Taylor approximation (in $\eta$) of Hebbian learning followed by normalization of $w$, which behaves as a stochastic version of the power iteration method for finding the principal eigenvector. At convergence, $w$ is the (unit-norm) principal component of $\Sigma$, for zero-mean data.
97 Oja's neuron is a neural principal component analyzer. Advantages over using explicit eigensolvers (e.g., the LAPACK eigensolvers, or Matlab's eig function):
1. Distributed
2. Online (big data!)
Disadvantages:
1. Stochastic (convergence in probability)
2. Slower, because it requires a small $\eta$
98 Restricted Boltzmann Machines: a generative model, invented by G. Hinton. Work started in the Eighties (Boltzmann machines) and continued in the following decades.
99 Boltzmann machines:
- binary-valued units
- bi-directional connections
- symmetric weights (equal in the two directions)
- general topology (feedback possible)
100 The restricted version has the limitation that its topology must be a bipartite graph (connections only between the visible and the hidden layer). This makes it more tractable.
101 Energy. Let $v = [v_i]$ and $h = [h_j]$ be the visible and hidden unit activation values, respectively, $w_{i,j}$ the weight between $v_i$ and $h_j$, and $a_i$ and $b_j$ the biases of the visible and hidden units, respectively. Then we can define an "energy"
$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i\, w_{i,j}\, h_j$$
102 Probability of states. The probability of any possible network state is $P(v, h) = \frac{1}{Z} e^{-E(v,h)}$, with $Z = \sum_{v,h} e^{-E(v,h)}$ the partition function (normalizer).
103 Probability of states. Since intra-layer connections are not present, the probability of activation of one unit does not depend on that of the other units in the same layer, only on the other layer:
$$P(v_i = 1 \mid h) = \frac{1}{1 + e^{-(a_i + \sum_j w_{i,j} h_j)}}, \qquad P(h_j = 1 \mid v) = \frac{1}{1 + e^{-(b_j + \sum_i w_{i,j} v_i)}}$$
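A brute-force sanity check on a tiny RBM with hypothetical random parameters (3 visible, 2 hidden units): enumerate all states, compute $P(v, h) = e^{-E(v,h)}/Z$ exactly, and compare the exact conditionals with the two sigmoid formulas above.

```python
import math
import random
from itertools import product

rng = random.Random(0)
nv, nh = 3, 2
W = [[rng.gauss(0, 1) for _ in range(nh)] for _ in range(nv)]  # w_{i,j}
a = [rng.gauss(0, 1) for _ in range(nv)]                       # visible biases
b = [rng.gauss(0, 1) for _ in range(nh)]                       # hidden biases


def energy(v, h):
    return (-sum(a[i] * v[i] for i in range(nv))
            - sum(b[j] * h[j] for j in range(nh))
            - sum(v[i] * W[i][j] * h[j] for i in range(nv) for j in range(nh)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


V = list(product((0, 1), repeat=nv))
H = list(product((0, 1), repeat=nh))
Z = sum(math.exp(-energy(v, h)) for v in V for h in H)  # partition function
P = {(v, h): math.exp(-energy(v, h)) / Z for v in V for h in H}

max_err = 0.0
for v in V:  # P(h_j = 1 | v)
    p_v = sum(P[(v, h)] for h in H)
    for j in range(nh):
        exact = sum(P[(v, h)] for h in H if h[j] == 1) / p_v
        formula = sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(nv)))
        max_err = max(max_err, abs(exact - formula))
for h in H:  # P(v_i = 1 | h)
    p_h = sum(P[(v, h)] for v in V)
    for i in range(nv):
        exact = sum(P[(v, h)] for v in V if v[i] == 1) / p_h
        formula = sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(nh)))
        max_err = max(max_err, abs(exact - formula))
print("max error:", max_err)
```

Exhaustive enumeration is only feasible for toy sizes, since the number of states grows as $2^{nv + nh}$; this is why $Z$ is never computed in practice.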
104 Training an RBM. The algorithm is called contrastive divergence. It uses random sampling from the probabilities computed as above:
- Apply one input $v$
- Compute the probability $P(h \mid v)$ and sample from it to generate a hidden configuration $h$
- Compute a positive update step $\Delta w^+ = v h^T$ (outer product)
- Generate one possible input $v'$ from the hidden configuration (sampling from $P(v \mid h)$)
- Compute the probability $P(h \mid v')$, giving $h'$
- Compute a negative update step $\Delta w^- = v' h'^T$
- Apply the update: $w \leftarrow w + \eta\, (\Delta w^+ - \Delta w^-)$
This does not optimize any explicit objective function!!
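A sketch of contrastive divergence with one sampling step (CD-1), following the steps above. Assumptions not in the slides: the biases are updated with the analogous rule, the two binary training patterns, the sizes, $\eta$ and the number of epochs are hypothetical, and progress is monitored with a mean-field reconstruction error (not an objective that CD optimizes).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_h_given_v(v, W, b):
    return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v)))) for j in range(len(b))]


def p_v_given_h(h, W, a):
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(a))]


def cd1_update(v, W, a, b, eta, rng):
    ph = p_h_given_v(v, W, b)
    h = [1.0 if rng.random() < p else 0.0 for p in ph]      # sample hidden configuration
    pv = p_v_given_h(h, W, a)
    v_neg = [1.0 if rng.random() < p else 0.0 for p in pv]  # generate one possible input v'
    h_neg = p_h_given_v(v_neg, W, b)                        # h' = P(h | v')
    for i in range(len(a)):
        for j in range(len(b)):
            # w <- w + eta (dw+ - dw-), with dw+ = v h^T and dw- = v' h'^T
            W[i][j] += eta * (v[i] * h[j] - v_neg[i] * h_neg[j])
    for i in range(len(a)):
        a[i] += eta * (v[i] - v_neg[i])
    for j in range(len(b)):
        b[j] += eta * (h[j] - h_neg[j])


def reconstruction_error(data, W, a, b):
    """Mean squared error of the mean-field reconstruction P(v | P(h | v))."""
    err = 0.0
    for v in data:
        r = p_v_given_h(p_h_given_v(v, W, b), W, a)
        err += sum((x - y) ** 2 for x, y in zip(v, r)) / len(v)
    return err / len(data)


rng = random.Random(0)
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # hypothetical binary patterns
nv, nh = 4, 2
W = [[rng.gauss(0, 0.1) for _ in range(nh)] for _ in range(nv)]
a, b = [0.0] * nv, [0.0] * nh
err_before = reconstruction_error(data, W, a, b)
for epoch in range(2000):
    for v in data:
        cd1_update(v, W, a, b, eta=0.1, rng=rng)
err_after = reconstruction_error(data, W, a, b)
print(err_before, "->", err_after)
```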
105 Training RBMs of large size is not simple. There are tricks to make the task easier, for example weight sharing and convolutional neural networks. These help with data having correlated inputs, as in images, video, speech, and general time series.
106 Deep Belief Networks. A DBN is a sequence (stack) of RBMs. Each RBM can be trained independently of the following ones (a greedy strategy). The last layer can be a classifier.
107 Deep networks can be built out of RBMs, but also out of autoencoders. Autoencoders are more sensitive to random noise.
108 Neural networks: why bother? Deep learning has achieved success in very complex tasks and has won many competitions. Example: extracting words from audio and transforming them into automatic subtitles (cf. YouTube).
109 THE END
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationNeural Networks biological neuron artificial neuron 1
Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationNeural Networks and Deep Learning.
Neural Networks and Deep Learning www.cs.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts perceptrons the perceptron training rule linear separability hidden
More informationFeedforward Neural Nets and Backpropagation
Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationLecture 4: Perceptrons and Multilayer Perceptrons
Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More informationDeep Feedforward Networks. Seung-Hoon Na Chonbuk National University
Deep Feedforward Networks Seung-Hoon Na Chonbuk National University Neural Network: Types Feedforward neural networks (FNN) = Deep feedforward networks = multilayer perceptrons (MLP) No feedback connections
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationReading Group on Deep Learning Session 4 Unsupervised Neural Networks
Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationNeural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28
1 / 28 Neural Networks Mark van Rossum School of Informatics, University of Edinburgh January 15, 2018 2 / 28 Goals: Understand how (recurrent) networks behave Find a way to teach networks to do a certain
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationThe XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic
The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The
More informationKnowledge Extraction from DBNs for Images
Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental
More informationArtificial Neural Networks Examination, March 2004
Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum
More informationMultilayer Perceptrons and Backpropagation
Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationGreedy Layer-Wise Training of Deep Networks
Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationUnit III. A Survey of Neural Network Model
Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More informationMaster Recherche IAC TC2: Apprentissage Statistique & Optimisation
Master Recherche IAC TC2: Apprentissage Statistique & Optimisation Alexandre Allauzen Anne Auger Michèle Sebag LIMSI LRI Oct. 4th, 2012 This course Bio-inspired algorithms Classical Neural Nets History
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron
More informationSerious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions
BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationCourse Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch
Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationArtificial Neural Networks
Artificial Neural Networks Threshold units Gradient descent Multilayer networks Backpropagation Hidden layer representations Example: Face Recognition Advanced topics 1 Connectionist Models Consider humans:
More informationA summary of Deep Learning without Poor Local Minima
A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationChapter 3 Supervised learning:
Chapter 3 Supervised learning: Multilayer Networks I Backpropagation Learning Architecture: Feedforward network of at least one layer of non-linear hidden nodes, e.g., # of layers L 2 (not counting the
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationDeep Belief Networks are compact universal approximators
1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In
More informationNeural Networks. William Cohen [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ]
Neural Networks William Cohen 10-601 [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ] WHAT ARE NEURAL NETWORKS? William s notation Logis;c regression + 1
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationAn efficient way to learn deep generative models
An efficient way to learn deep generative models Geoffrey Hinton Canadian Institute for Advanced Research & Department of Computer Science University of Toronto Joint work with: Ruslan Salakhutdinov, Yee-Whye
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines Boltzmann Machine(BM) A Boltzmann machine extends a stochastic Hopfield network to include hidden units. It has binary (0 or 1) visible vector unit x and hidden (latent) vector
More informationCSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!
CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!! November 18, 2015 THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can t even say see ya
More information10. Artificial Neural Networks
Foundations of Machine Learning CentraleSupélec Fall 217 1. Artificial Neural Networks Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning
More informationArtificial Intelligence
Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement
More informationDeep Feedforward Networks. Lecture slides for Chapter 6 of Deep Learning Ian Goodfellow Last updated
Deep Feedforward Networks Lecture slides for Chapter 6 of Deep Learning www.deeplearningbook.org Ian Goodfellow Last updated 2016-10-04 Roadmap Example: Learning XOR Gradient-Based Learning Hidden Units
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationArtificial Neural Network
Artificial Neural Network Eung Je Woo Department of Biomedical Engineering Impedance Imaging Research Center (IIRC) Kyung Hee University Korea ejwoo@khu.ac.kr Neuron and Neuron Model McCulloch and Pitts
More informationDeep Neural Networks
Universidad Autónoma de Madrid Escuela Politécnica Superior - Departamento de Ingeniería Informática Facultad de Ciencias - Departamento de Matemáticas Deep Neural Networks Master s thesis presented to
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationSections 18.6 and 18.7 Analysis of Artificial Neural Networks
Sections 18.6 and 18.7 Analysis of Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline Univariate regression
More informationArtificial Neural Networks. Q550: Models in Cognitive Science Lecture 5
Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5 "Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand
More information