Introduction to neural networks, part 4 (3 hrs). Stefano Rovetta, 20/23-Jul.

Back to optimization

Stochastic optimization
Optimize a cost that is a random variable. Types of randomness:
- measurement plus noise: R + ν
- multiple effects mixed together (we might use a mixture model)
- unknown statistical properties

Monte Carlo integration
Expectation of a random variable X:
$E\{X\} = \int_E \xi\, p_x(\xi)\, d\xi$ (over the whole data space E)
... but only a sample $\{x_1, \dots, x_n\}$ is given (the training set).
Empirical distribution: $P_x(\xi) = \frac{1}{n} \sum_{l=1}^{n} \delta(\xi - x_l)$
Approximate (empirical) expectation of X:
$\hat{E}\{X\} = \int_E \xi\, P_x(\xi)\, d\xi = \frac{1}{n} \sum_{l=1}^{n} x_l$
This is a Monte Carlo integral.

Suppose that R is classification performance (risk). We want to optimize the true risk, the one computed on all possible, infinite data:
$R(w) = \int R(y(x), w)\, p(x)\, dx$
This is a function of w (the weights identify one specific neural net). It is also a function of the data distribution p(x) (the performance is estimated on the data).

When training a neural network we don't have p(x), but only the training set $\{x_1, \dots, x_n\}$. From the training set we have the empirical distribution
$P_x(\xi) = \frac{1}{n} \sum_{l=1}^{n} \delta(\xi - x_l)$
so we can compute a Monte Carlo estimate of the risk
$\hat{R}(w, X) = \frac{1}{np} \sum_{l=1}^{np} R(y(x_l), w)$
This is the empirical risk.
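
A small numpy sketch of the empirical risk as a Monte Carlo estimate of the true risk; the linear scorer y(x) = sign(x · w), the 0-1 loss, and the toy data are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: n points in 2-D with +/-1 labels (assumed for illustration).
n = 200
X = rng.normal(size=(n, 2))
t = np.sign(X[:, 0] + 0.3 * rng.normal(size=n))

def zero_one_loss(y, target):
    """0-1 classification loss R(y(x), w) for one pattern."""
    return float(y != target)

def empirical_risk(w, X, t):
    """Monte Carlo estimate of the true risk: average loss over the sample."""
    losses = [zero_one_loss(np.sign(x @ w), ti) for x, ti in zip(X, t)]
    return np.mean(losses)

w = np.array([1.0, 0.0])
print("empirical risk:", empirical_risk(w, X, t))
```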

Training by epoch
Optimize using the whole training set to estimate the cost. It means computing $\hat{R}$ (and the $\Delta w$) on the basis of a Monte Carlo estimate of the risk. This finds the optimal value of an approximate (empirical) cost function.

Stochastic approximation
A special kind of stochastic optimization: R is estimated at each input pattern, using that pattern alone. Extremely unreliable estimation, but it converges in probability! (Robbins and Monro, 1951; Kiefer and Wolfowitz, 1952)

Convergence in probability:
$\lim_{n \to \infty} \Pr\left( |\hat{R}_n - R| \geq \varepsilon \right) = 0$
where $\hat{R}_n$ is the estimate of R on a training set of size n.

Stochastic approximation
Given:
- a function R whose gradient $\nabla R$ we want to set to zero, or minimize (but which we cannot compute analytically);
- a sequence $G_1, G_2, \dots, G_l, \dots$ of random samples of $\nabla R$, affected by random noise;
- a decreasing sequence $\eta_1, \eta_2, \dots, \eta_l, \dots$ of step-size coefficients.
Basic iteration: $w(l+1) = w(l) - \eta_l G_l$
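
A minimal numpy sketch of this basic iteration on a toy quadratic risk; the risk, its noisy gradient samples, and the 1/l step-size schedule are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy risk R(w) = 0.5 * ||w - w_star||^2; its gradient is (w - w_star).
w_star = np.array([2.0, -1.0])

def noisy_gradient(w):
    """A random sample G_l of the gradient of R, corrupted by noise."""
    return (w - w_star) + rng.normal(scale=0.5, size=w.shape)

w = np.zeros(2)
for l in range(1, 5001):
    eta_l = 1.0 / l                      # decreasing step-size sequence
    w = w - eta_l * noisy_gradient(w)    # basic iteration w(l+1) = w(l) - eta_l G_l

print(w)   # close to w_star despite the noisy gradient estimates
```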

Stochastic approximation: the intuition
Each sample gives a noisy (stochastic) estimate of the gradient: $\nabla R$ + noise. By averaging over time, the noise cancels out. Random variations also make it possible to escape local minima.

Results on convergence of stochastic approximation
If R is twice differentiable and convex, then stochastic approximation converges with a rate of $O(1/l)$.
A condition of convergence (not of optimal rate of convergence): $0 < \sum_l \eta_l^2 = A < \infty$
Usually the hypotheses are not met (complex cost landscape) and we don't have guarantees.

Training by pattern
It means computing $\hat{R}$ (and the $\Delta w$) on the basis of an estimate of the risk on a single point: an extreme Monte Carlo estimate, on a training set of one observation only. This finds the approximate optimal value of an approximate cost function.

Implementation of training
- By epoch: estimation loop, then update.
- By pattern: estimation + update loop.
- By pattern on a training set: pick the pattern index l at random.
Learning rate η: by pattern, keep it low; by epoch, make it adaptive.
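
A skeleton of the two training loops in numpy; the toy least-squares loss, its gradient, and the step sizes are assumptions used only to make the sketch runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_on_pattern(w, x, t):
    """Gradient of the per-pattern loss; a least-squares toy case (assumed)."""
    return (w @ x - t) * x

def train_by_epoch(w, X, T, eta=0.1, epochs=100):
    # Estimation loop over the whole training set, then a single update.
    for _ in range(epochs):
        g = np.mean([grad_on_pattern(w, x, t) for x, t in zip(X, T)], axis=0)
        w = w - eta * g
    return w

def train_by_pattern(w, X, T, eta=0.01, steps=5000):
    # Estimation + update at each (randomly chosen) pattern; keep eta low.
    for _ in range(steps):
        l = rng.integers(len(X))
        w = w - eta * grad_on_pattern(w, X[l], T[l])
    return w

# Tiny usage example on noiseless linear data (assumed).
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
T = X @ w_true
print(train_by_epoch(np.zeros(3), X, T))
print(train_by_pattern(np.zeros(3), X, T))
```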

Multi-layer neural networks

Connectionism and Parallel Distributed Processing
David Rumelhart, James McClelland, Geoffrey Hinton

What is connectionism?
Connectionism is an approach to cognitive science that characterizes learning and memory through the discrete interactions between nodes of neural networks. The representation of concepts and rules is not concentrated in symbols with a lot of meaning, but in sub-symbolic neural encodings (neuron activations) which have a meaning only if taken collectively, as patterns. Neural networks are distributed and massively parallel. They rely on spontaneously generated internal representations.

Network topologies
Most general: feedback. Units may be visible or hidden.

Network topologies
A special type of feedback is lateral connections.

Network topologies
Less general: a topology where cycles are forbidden: feedforward. Visible units may be input or output.

Network topologies
Least general: multi-layer.

Why multi-layer?
- Linear separability
- Feature discovery
- Hierarchies of abstractions

Example: parity
Problem: given any input string of d bits, tell whether the number of bits set (= 1) is even. It generalizes XOR: it is not linearly separable.

Example: parity
The solution requires d hidden units.
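
A sketch of the classical hand-built construction realizing d-bit parity (the XOR generalization, i.e. detecting an odd number of set bits; the "even" answer is just its complement) with exactly d hidden threshold units. The specific weights and the exhaustive check for d = 4 are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from itertools import product

def heaviside(z):
    return (z >= 0).astype(float)

def parity_net(x, d):
    """Two-layer threshold network computing the parity (odd number of 1s) of d bits."""
    # Hidden unit j fires when at least j input bits are set: weights all 1, bias -(j - 0.5).
    h = heaviside(np.sum(x) - (np.arange(1, d + 1) - 0.5))
    # Output weights alternate +1/-1, so the pre-activation is 1 for odd counts, 0 for even.
    v = np.array([(-1.0) ** j for j in range(d)])   # +1, -1, +1, ...
    return heaviside(h @ v - 0.5)

d = 4
for bits in product([0, 1], repeat=d):
    x = np.array(bits, dtype=float)
    assert parity_net(x, d) == (sum(bits) % 2)
print("parity of", d, "bits computed with", d, "hidden units")
```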

Universal approximation theorem (G. Cybenko, 1989)
A feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of $\mathbb{R}^d$.

How do we train a multi-layer neural network?
1. With a suitable algorithm
2. With a sequence of independent trainings

As we have seen, learning (e.g., learning to recognize) can be cast as the problem of optimizing a suitable cost function (risk). But most optimization methods rely on the necessary minimum condition $\nabla E = 0$, or on the direction of the gradient $\nabla E$.
Requirement: E must be at least differentiable (even better if also convex, but that's not always possible).
Even if E is differentiable, for hidden units we cannot compute an error term like $(t - a)^2$ (mse).
Requirement: we need a way to do this.

A differentiable activation function
Let's write the discriminant function for a problem with two Gaussian, spherical, equal-variance classes. After a translation of the origin and a rotation of the axes, we get a 1-dimensional, symmetrical problem in x with only two parameters:
$p(x \mid \omega_1) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad p(x \mid \omega_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]$

By Bayes' theorem:
$P(\omega_1 \mid x) = \frac{p(x \mid \omega_1)\, P(\omega_1)}{p(x \mid \omega_1)\, P(\omega_1) + p(x \mid \omega_2)\, P(\omega_2)}, \qquad P(\omega_2 \mid x) = \frac{p(x \mid \omega_2)\, P(\omega_2)}{p(x \mid \omega_1)\, P(\omega_1) + p(x \mid \omega_2)\, P(\omega_2)}$

2-class discriminant function (removing the common factors $1/(\sqrt{2\pi}\,\sigma)$ and assuming equal priors $P(\omega_1) = P(\omega_2)$):
$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x) = \frac{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] - \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] + \exp\left[-\frac{(x+\mu)^2}{2\sigma^2}\right]}$

Expanding the squares, $\exp\left[-\frac{(x \mp \mu)^2}{2\sigma^2}\right] = \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]\exp\left[\pm\frac{x\mu}{\sigma^2}\right]$, so
$g(x) = \frac{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]\exp\left[\frac{x\mu}{\sigma^2}\right] - \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]\exp\left[-\frac{x\mu}{\sigma^2}\right]}{\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]\exp\left[\frac{x\mu}{\sigma^2}\right] + \exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]\exp\left[-\frac{x\mu}{\sigma^2}\right]}$
The common positive factor $\exp\left[-\frac{x^2+\mu^2}{2\sigma^2}\right]$ cancels out:
$g(x) = \frac{e^{x\mu/\sigma^2} - e^{-x\mu/\sigma^2}}{e^{x\mu/\sigma^2} + e^{-x\mu/\sigma^2}}$

We replace x with the score $r = x \cdot w$, absorbing the factor $\mu/\sigma^2$ into the norm of w: $w' = \frac{\mu}{\sigma^2}\, w$. We obtain
$g(r) = \frac{e^{r} - e^{-r}}{e^{r} + e^{-r}} = \tanh(r), \qquad r = x \cdot w'$
the hyperbolic tangent activation. The logistic or sigmoid activation is
$\sigma(r) = \frac{1}{1 + e^{-r}} = \frac{\tanh(r/2) + 1}{2}$
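
A quick numpy check of the relation just stated between the logistic and the hyperbolic tangent (the grid of test points is an arbitrary choice):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

r = np.linspace(-5, 5, 11)
# The logistic is a rescaled, shifted hyperbolic tangent: sigma(r) = (tanh(r/2) + 1) / 2.
assert np.allclose(sigmoid(r), (np.tanh(r / 2) + 1) / 2)
print("sigmoid(r) == (tanh(r/2) + 1) / 2 on all test points")
```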

[Plot: sigmoid and tanh activation functions, activation a versus stimulus r]

[Plot: Heaviside step and sign activation functions, activation a versus stimulus r]

The sigmoid is the solution of the logistic equation $y' = y(1-y)$. Therefore, by definition,
$\frac{\partial \sigma(r)}{\partial r} = \sigma(r)\,\bigl(1 - \sigma(r)\bigr)$
Also,
$\frac{\partial \tanh(r)}{\partial r} = 1 - \tanh^2(r)$
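
These two derivative identities are what back-propagation relies on; a small numpy sketch verifying them against central finite differences (step size and test points are arbitrary assumptions):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

r = np.linspace(-4, 4, 9)
eps = 1e-6

# sigma'(r) = sigma(r) * (1 - sigma(r)), checked against a central finite difference.
num = (sigmoid(r + eps) - sigmoid(r - eps)) / (2 * eps)
assert np.allclose(num, sigmoid(r) * (1 - sigmoid(r)))

# tanh'(r) = 1 - tanh(r)^2.
num = (np.tanh(r + eps) - np.tanh(r - eps)) / (2 * eps)
assert np.allclose(num, 1 - np.tanh(r) ** 2)
print("both derivative identities verified numerically")
```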

The error back-propagation algorithm
Discovered by Amari / Werbos / Parker / Rumelhart, Hinton and Williams, from 1974 to 1986. The name appears in Rosenblatt's book Principles of Neurodynamics in 1962. A clever application of the chain rule of differential calculus: we can perform gradient descent in a distributed way and without actually computing derivatives. The responsibility for errors is back-propagated from the outputs back inside the network, and distributed among the hidden layers.

The chain rule
$\frac{d f(g(x))}{dx} = \frac{d f(y)}{dy}\bigg|_{y=g(x)} \frac{d g(x)}{dx}$
Where is the chain?
$\frac{d f(g(h(x)))}{dx} = \frac{d f(g)}{dg}\,\frac{d g(h)}{dh}\,\frac{d h(x)}{dx}$
which, for instance, can be used to prove that
$\frac{\partial \sigma(r)}{\partial w_i} = \frac{d \sigma(r)}{dr}\,\frac{\partial r}{\partial w_i} = \sigma'(r)\, x_i = \sigma(r)\,\bigl(1-\sigma(r)\bigr)\, x_i \quad (1)$

Notation:
np: number of patterns in the training set
ni: number of input units
nh: number of hidden units
no: number of output units
nw: total number of weights, nw = (ni + 1) nh + (nh + 1) no
i: index for input components
j: index for hidden units
k: index for output units
x_i: i-th component of the input pattern
r_j: net stimulus of the j-th hidden unit
r_k: net stimulus of the k-th output unit
sh_j: j-th hidden unit activation value
so_k: k-th component of the output
tg_k: k-th component of the target
whi_ji: weight to the j-th hidden unit from the i-th input unit [(ni + 1) × nh weights]
woh_kj: weight to the k-th output unit from the j-th hidden unit [(nh + 1) × no weights]

Loss function: $\lambda(so_k, tg_k) = (tg_k - so_k)^2$
1. In general there may be several output units;
2. the overall cost function is not quadratic (a paraboloid), because the network is non-linear.
Non-convex cost function.

Expected cost:
$E = \int \frac{1}{2}\,\frac{1}{no} \sum_{k=1}^{no} \bigl(so_k(x) - tg_k(x)\bigr)^2\, p(x)\, dx \quad (2)$
E is known only through its estimate on the training set (here, by epoch):
$\hat{E} = \frac{1}{np} \sum_{l=1}^{np} \frac{1}{2}\,\frac{1}{no} \sum_{k=1}^{no} \bigl(so_k(x_l) - tg_k(x_l)\bigr)^2 \quad (3)$

Summation and differentiation are both linear and can therefore be exchanged freely. We only consider one pattern:
$\hat{E} = \frac{1}{2}\,\frac{1}{no} \sum_{k=1}^{no} \bigl(so_k - tg_k\bigr)^2 \quad (4)$
For training online (= by pattern), we apply each $\Delta w$ immediately, as we did with the perceptron and Adaline. For training by epoch, we sum several $\Delta w$ and apply them only at the end of each pass (a training epoch). For training by batch, we sum several $\Delta w$ and apply them after some percentage of a complete pass.

The operation of the multilayer perceptron is divided in two steps: activation forward-propagation and error back-propagation.

Forward propagation

Forward propagation
$r_j = \sum_{i=0}^{ni} whi_{ji}\, x_i, \qquad sh_j = \sigma(r_j) \qquad \forall j \quad (5)$
$r_k = \sum_{j=0}^{nh} woh_{kj}\, sh_j, \qquad so_k = \sigma(r_k) \qquad \forall k \quad (6)$
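
A minimal numpy sketch of equations (5)-(6) in the slide notation, with index 0 reserved for the bias terms; tanh is used here as the sigmoidal activation, and the layer sizes in the usage example are assumptions.

```python
import numpy as np

def sigma(r):
    return np.tanh(r)   # sigmoidal activation (tanh chosen here)

def forward(x, whi, woh):
    """Forward pass in the slide notation.

    whi: (nh, ni+1) input-to-hidden weights (column 0 is the bias, x_0 = 1)
    woh: (no, nh+1) hidden-to-output weights (column 0 is the bias, sh_0 = 1)
    """
    x1 = np.concatenate(([1.0], x))        # prepend the bias input x_0 = 1
    rj = whi @ x1                          # net stimuli of hidden units
    sh = sigma(rj)                         # hidden activations sh_j
    sh1 = np.concatenate(([1.0], sh))      # prepend the bias unit sh_0 = 1
    rk = woh @ sh1                         # net stimuli of output units
    so = sigma(rk)                         # outputs so_k
    return rj, sh, rk, so

# Tiny usage example with ni = 3, nh = 4, no = 2 (shapes assumed for illustration).
rng = np.random.default_rng(0)
whi = rng.normal(scale=0.5, size=(4, 3 + 1))
woh = rng.normal(scale=0.5, size=(2, 4 + 1))
print(forward(rng.normal(size=3), whi, woh)[-1])
```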

Error back-propagation

Error back-propagation and update
We start from the computation of the partial derivatives, i.e., the gradient of the error. Here w is generically any of the weights of the network. We need all the components of the gradient $\nabla \hat{E}$, that is, $\partial \hat{E}/\partial w$ for all possible w.

$\frac{\partial \hat{E}}{\partial w} = \frac{\partial}{\partial w}\,\frac{1}{2}\,\frac{1}{no} \sum_{k=1}^{no} (so_k - tg_k)^2 = \frac{1}{no} \sum_{k=1}^{no} (so_k - tg_k)\,\frac{\partial so_k}{\partial w} \quad (7)$
Depending on whether w is a woh or a whi, we will have different expansions of the above expression.

Hidden-to-output weights woh_kj:
$\frac{\partial \hat{E}}{\partial woh_{kj}} = \frac{1}{no} \sum_{k'=1}^{no} (so_{k'} - tg_{k'})\,\frac{\partial so_{k'}}{\partial r_{k'}}\,\frac{\partial r_{k'}}{\partial woh_{kj}} \quad (8)$
We can drop all terms not depending on k, those with $k' \neq k$:
$\frac{\partial \hat{E}}{\partial woh_{kj}} = \frac{1}{no}\,(so_k - tg_k)\,\frac{\partial so_k}{\partial r_k}\, sh_j \quad (9)$
We plug in quantities known from the forward pass:
$\frac{\partial \hat{E}}{\partial woh_{kj}} = \frac{1}{no}\,(so_k - tg_k)\,\sigma'(r_k)\, sh_j \quad (10)$

If we define
$\delta_k = (so_k - tg_k)\,\sigma'(r_k) \quad (11)$
we have a generalization of the delta term which we have seen in the delta rule by Widrow and Hoff. Generalized delta rule for the hidden-to-output weights:
$\Delta woh_{kj} = -\eta\,\delta_k\, sh_j \quad (12)$

Problem with the input-to-hidden weights: not all terms are readily available. We use the chain rule again to find another formulation for $\partial \hat{E}/\partial whi_{ji}$.

$\frac{\partial \hat{E}}{\partial whi_{ji}} = \frac{\partial}{\partial whi_{ji}}\,\frac{1}{2}\,\frac{1}{no} \sum_{k=1}^{no} (so_k - tg_k)^2 \quad (13)$
$= \frac{1}{no} \sum_{k=1}^{no} (so_k - tg_k)\,\frac{\partial so_k}{\partial r_k}\,\frac{\partial r_k}{\partial sh_j}\,\frac{\partial sh_j}{\partial whi_{ji}} \quad (14)$

Now the quantities appearing in the last equation are available, again from either the forward pass or theory:
$(so_k - tg_k)\,\frac{\partial so_k}{\partial r_k} = \delta_k, \qquad \frac{\partial r_k}{\partial sh_j} = woh_{kj}, \qquad \frac{\partial sh_j}{\partial whi_{ji}} = \frac{\partial sh_j}{\partial r_j}\,\frac{\partial r_j}{\partial whi_{ji}} = \sigma'(r_j)\, x_i$

$\frac{\partial \hat{E}}{\partial whi_{ji}} = \frac{1}{no} \sum_{k=1}^{no} (so_k - tg_k)\,\frac{\partial so_k}{\partial r_k}\,\frac{\partial r_k}{\partial sh_j}\,\frac{\partial sh_j}{\partial whi_{ji}} \quad (15)$
$= \frac{1}{no} \sum_{k=1}^{no} \bigl[\delta_k\, woh_{kj}\bigr]\,\bigl[\sigma'(r_j)\, x_i\bigr] \quad (16)$
Note that the summation here does not disappear.

We can further manipulate the expression, by first isolating the terms which do not contribute to the summation:
$\frac{\partial \hat{E}}{\partial whi_{ji}} = \left[\frac{1}{no} \sum_{k=1}^{no} \delta_k\, woh_{kj}\right]\sigma'(r_j)\, x_i \quad (17)$
and then identifying the generalized delta for the input-to-hidden weights:
$\delta_j = \sigma'(r_j)\,\frac{1}{no} \sum_{k=1}^{no} \delta_k\, woh_{kj} \quad (18)$

Generalized delta rule for the input-to-hidden weights:
$\Delta whi_{ji} = -\eta\,\delta_j\, x_i \quad (19)$
amazingly similar in form to that for the hidden-to-output weights.
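
A compact numpy sketch of one by-pattern update implementing equations (5)-(19): forward pass, output deltas, hidden deltas, and the two generalized delta rules. The tanh activation, the XOR usage example, the network sizes, and the learning rate are assumptions; since the cost is non-convex, convergence on the example is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(r):            # tanh activation
    return np.tanh(r)

def dsigma(r):           # sigma'(r) = 1 - tanh(r)^2
    return 1.0 - np.tanh(r) ** 2

def backprop_step(x, tg, whi, woh, eta=0.2):
    """One by-pattern update with the generalized delta rules (a sketch)."""
    no = woh.shape[0]
    # Forward pass (eqs. 5-6), with index 0 as the bias.
    x1 = np.concatenate(([1.0], x))
    rj = whi @ x1;  sh = sigma(rj)
    sh1 = np.concatenate(([1.0], sh))
    rk = woh @ sh1; so = sigma(rk)
    # Output deltas (eq. 11): delta_k = (so_k - tg_k) * sigma'(r_k).
    delta_k = (so - tg) * dsigma(rk)
    # Hidden deltas (eq. 18): delta_j = sigma'(r_j) * (1/no) * sum_k delta_k * woh_kj.
    delta_j = dsigma(rj) * (delta_k @ woh[:, 1:]) / no
    # Exact gradient steps; eq. (12) absorbs the 1/no factor into eta, here it is explicit.
    woh = woh - eta * np.outer(delta_k, sh1) / no
    whi = whi - eta * np.outer(delta_j, x1)
    return whi, woh, np.mean((so - tg) ** 2) / 2

# Usage example: the XOR task (architecture and targets are assumptions).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[-1.], [1.], [1.], [-1.]])          # targets in the tanh range
whi = rng.normal(scale=1.0, size=(4, 3))          # nh=4 hidden units, ni=2 inputs (+bias)
woh = rng.normal(scale=1.0, size=(1, 5))
for step in range(30000):
    l = rng.integers(4)
    whi, woh, err = backprop_step(X[l], T[l], whi, woh)

def net_out(x):
    sh1 = np.concatenate(([1.0], sigma(whi @ np.concatenate(([1.0], x)))))
    return sigma(woh @ sh1)[0]

print([round(net_out(x), 2) for x in X])   # should approach the targets (not guaranteed)
```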

Important property of multi-layer networks
The layered network is the simplest possible connectivity that has the universal approximation property. It should be large enough or deep enough.

Generalization and overfitting
The number of weights needs to be high. We must take care of controlling overfitting.

Overfitting
It is the situation where $\hat{R}$ is low but $|\hat{R} - R|$ is high.
Symptom: while training we are happy, but then the tests fail! No generalization, because of too much specialization (learning the training set, not the classification rule).

Multi-layer perceptrons: not a good model for the brain? There is some evidence that the brain uses sparse (localized) rather than dense (distributed) representations. Probably both.

Deep neural networks

David Hubel and Torsten Wiesel

Hubel and Wiesel placed electrodes in animals' brains (visual cortex). They discovered the columnar organization of neurons.

Each layer in a cortical column extracts features from the input it receives from the previous layer. These features are more and more abstract:
- edges
- simple shapes
- composite shapes
- eyes, mouths, noses...
- grandmother (the Grandmother Cell hypothesis)

Learning features in neural networks
Internal representation in hidden layers. A hierarchy requires many layers (deep networks).

Learning: limits of multi-layer networks
Error back-propagation does not work well with very deep structures. Vanishing gradient phenomenon: at each layer, the back-propagated components of the gradient become exponentially smaller. To avoid the problem: use shallow networks (theoretically sufficient).
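
A toy numpy illustration of the vanishing gradient: the norm of a back-propagated gradient is multiplied at each layer by a weight matrix and by σ'(r) ≤ 1/4, so it shrinks roughly exponentially with depth. The layer width, depth, and random weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsigmoid(r):
    s = 1.0 / (1.0 + np.exp(-r))
    return s * (1 - s)

# Back-propagate a gradient through an increasing number of sigmoid layers
# with random weights and net stimuli (a toy illustration only).
width, depth = 50, 30
g = np.ones(width)
for layer in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
    r = rng.normal(size=width)              # pretend net stimuli at this layer
    g = (W.T @ g) * dsigmoid(r)             # chain rule: multiply by W^T and sigma'(r)
    if layer % 5 == 4:
        print(f"after {layer + 1} layers: |gradient| = {np.linalg.norm(g):.3e}")
```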

Example of a shallow architecture: support vector machines.

Representational advantage of depth
In the '80s and early '90s, some works proved that some logical functions, which can be implemented with a depth of k layers, require exponentially more units if reduced to k − 1 layers. In the 2010s: dependent inputs (variables) need very deep networks.

How can we avoid training the whole network all at once?

Multi-level hierarchies of networks
Cascaded networks of unsupervised layers, trained one after the other, plus a final classification layer. The whole structure is finally trained with error back-propagation.

The idea is not new: the Neocognitron (K. Fukushima, 1987).

Unsupervised learning principles

Information Bottleneck

Techniques using the "information bottleneck" principle
Using statistics and entropy:
- coding theory
- stochastic complexity and minimum description length
Using errors:
- autoencoders
- PCA
- rate-distortion theory

Autoencoders
An autoencoder is a special case of a multi-layer perceptron, characterized by two aspects:
1. Structure: number of units in the input layer = number of units in the output layer > number of hidden units.
2. Learned task: an autoencoder is trained to approximate the identity function (= replicate its input at the output).
An autoencoder is not a classifier.
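
A minimal numpy autoencoder sketch with d inputs = d outputs and h < d hidden units, trained by gradient descent on the reconstruction error; the toy data, the tanh encoder with a linear decoder, and the learning rate are assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in 5-D that really live near a 2-D subspace (an assumption).
n, d, h = 100, 5, 2
Z_true = rng.normal(size=(n, 2))
X = Z_true @ rng.normal(size=(2, d)) + 0.05 * rng.normal(size=(n, d))

# Autoencoder: d inputs = d outputs, h < d hidden units; the task is to reproduce the input.
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)     # encoder
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)     # decoder

eta = 0.01
for epoch in range(2000):
    H = np.tanh(X @ W1 + b1)          # hidden code (the interesting part)
    Xhat = H @ W2 + b2                # reconstruction of the input
    E = Xhat - X
    dW2 = H.T @ E / n; db2 = E.mean(axis=0)
    dH = (E @ W2.T) * (1 - H ** 2)
    dW1 = X.T @ dH / n; db1 = dH.mean(axis=0)
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print("reconstruction MSE:", float(np.mean((Xhat - X) ** 2)))
```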

Autoencoders
What is interesting is not the output value (it is just an approximation of the input) but the pattern present on the hidden layer. Since we don't use any target (the target coincides with the input), the autoencoder task is unsupervised; it is sometimes termed "self-supervised".

Learned features from a set of images

Recognizing handwritten digits

Features for recognizing 0 from 8

Features for recognizing 1 from 8

An example of an autoencoder for learning features from symbolic data
Task: diagnose Lyme disease from patient records.
Problem: many features (observed signs and symptoms) are binary and very sparse.

An example of an autoencoder for learning features from symbolic data: learning the features

An example of an autoencoder for learning features from symbolic data: using the learned features

Principal component analysis
It is an instance of factor analysis: discover the few unobservable factors that give rise to observable (measurable) variables.

Example of a factor analysis problem: discover the abilities underlying performance in school tests.
Observed variables: marks in the algebra test, geometry test, literature test, foreign language test, music test, and essay.
Hidden factors: linguistic ability, spatial ability, symbolic processing ability.

Principal Component Analysis, or PCA, is a linear solution to the factor analysis problem. Linear: factors are linear combinations of patterns:
$v = \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_d x_d$

PCA works on the covariance matrix of the data. Covariance between input $x_i$ and input $x_j$:
$\sigma_{i,j} = \sigma_{j,i} = E\{(x_i - \bar{x}_i)(x_j - \bar{x}_j)\}$
where E{} is the expectation (or the mean over the training set) and $\bar{x}_i$ is the mean of the i-th input.
$\Sigma = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \dots & \sigma_{1,d} \\ \sigma_{2,1} & \sigma_{2,2} & \dots & \sigma_{2,d} \\ \vdots & & \ddots & \vdots \\ \sigma_{d,1} & \sigma_{d,2} & \dots & \sigma_{d,d} \end{pmatrix}$

Note: if X is the training set as a matrix and all inputs have zero mean, i.e., $X \leftarrow X - \bar{X}$, then $\Sigma = X^T X$ (up to a normalizing factor).
In Matlab: X = X - repmat(mean(X), size(X,1), 1)

Principal components
The "factors" in PCA are called principal components and are given by the eigenvectors of Σ: $v_1, \dots, v_d$.
If we project pattern $x = [x_1, x_2, \dots, x_d]$ onto the component $v_i = [v_{i,1}, v_{i,2}, \dots, v_{i,d}]$ we obtain the value of the i-th factor, or component, or feature, for pattern x:
$a_i = x \cdot v_i = \sum_{j=1}^{d} x_j\, v_{i,j}$
OK, components; but why "principal"?

Properties
1. The eigenvectors of Σ can be ordered by the corresponding eigenvalues, from largest to smallest.
2. The eigenvectors are thus ordered by variance (or energy, or level of activity), from largest to smallest.
3. Projection of the training set X onto the first r (principal) components gives the best rank-r approximation to X itself, when measured by mean square error.
PCA is a form of lossy compression. The principal components are features useful to represent the data in a synthetic way (information bottleneck).
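
A short numpy sketch of PCA on the covariance matrix: eigen-decomposition, ordering by eigenvalue, projection onto the first r components, and the rank-r reconstruction; the toy data and the choice r = 2 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matrix X, one pattern per row (assumed for illustration).
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)                  # zero-mean the inputs
Sigma = Xc.T @ Xc / len(Xc)              # covariance matrix of the data
evals, evecs = np.linalg.eigh(Sigma)     # eigenvalues/eigenvectors (ascending order)
order = np.argsort(evals)[::-1]          # reorder from largest to smallest eigenvalue
evals, V = evals[order], evecs[:, order]

r = 2
A = Xc @ V[:, :r]                        # project onto the first r principal components
X_approx = A @ V[:, :r].T                # best rank-r approximation (in mean square error)
print("kept variance fraction:", evals[:r].sum() / evals.sum())
print("reconstruction MSE:", np.mean((Xc - X_approx) ** 2))
```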

It has been proved that an autoencoder with linear activations learns the principal components. This is because the objective is the mean squared reconstruction error of a lower-rank representation, the same as in PCA.

Oja's neuron
A single-unit model with linear (identity) activation: $a = x \cdot w$
Learning rule: $w \leftarrow w + \eta\, a\,(x - a\, w)$

It can be proven that, for small η, Oja's learning rule is a first-order Taylor approximation of the Rayleigh quotient iteration method of finding the principal eigenvector. At convergence, w is the principal component of Σ.
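
A numpy sketch of Oja's rule on zero-mean toy data, compared at the end against the principal eigenvector returned by an explicit eigensolver; the data, the number of epochs, and the (small) learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean toy data with a dominant direction (an assumption for illustration).
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.5])
X -= X.mean(axis=0)

w = rng.normal(size=3)
eta = 0.001                              # small eta, as the convergence result requires
for epoch in range(20):
    for x in X[rng.permutation(len(X))]:
        a = x @ w                        # linear unit: a = x . w
        w += eta * a * (x - a * w)       # Oja's learning rule

Sigma = X.T @ X / len(X)
evals, evecs = np.linalg.eigh(Sigma)
v1 = evecs[:, -1]                        # principal eigenvector from an explicit eigensolver
print("|cosine| between w and v1:", abs(w @ v1) / np.linalg.norm(w))
```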

Oja's neuron is a neural principal component analyzer.
Advantages over using explicit eigensolvers (e.g., the LAPACK eigensolvers, or Matlab's eig function):
1. distributed
2. online (big data!)
Disadvantages:
1. stochastic (convergence in probability)
2. slower, because of the requirement of a small η

Restricted Boltzmann Machines
A generative model, invented by G. Hinton. Started in the Eighties (Boltzmann machines), then developed in the following decades.

Boltzmann machines:
- binary-valued units
- bi-directional connections
- symmetric weights (equal in the two directions)
- general topology (feedback possible)

The restricted version has the limitation that its topology must be a bipartite graph. This makes it more tractable.

Energy
Let $v = [v_i]$ and $h = [h_j]$ be the visible and hidden unit activation values, respectively, $w_{i,j}$ the weight between $v_i$ and $h_j$, and $a_i$ and $b_j$ the biases of visible and hidden units, respectively. Then we can define an "energy"
$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i\, w_{i,j}\, h_j$

Probability of states
The probability of any possible network state is
$P(v, h) = \frac{1}{Z}\, e^{-E(v,h)}$
with Z the partition function (normalizer).

Probability of states
Since intra-layer connections are not present, the probability of activation of one unit does not depend on that of the other units in the same layer, only on those in the other layer:
$P(v_i = 1 \mid h) = \sigma\!\left(a_i + \sum_j w_{i,j}\, h_j\right), \qquad P(h_j = 1 \mid v) = \sigma\!\left(b_j + \sum_i w_{i,j}\, v_i\right)$
where σ is the logistic function.

Training an RBM
The algorithm is called contrastive divergence. It uses random sampling from the probabilities (computed as above):
- Apply one input v.
- Compute the probability P(h | v); sample from it to generate a hidden configuration h.
- Compute a positive update step $\Delta w^{+} = v\, h^T$ (outer product).
- Generate one possible input v' from the hidden configuration.
- Compute the probability P(h | v').
- Compute a negative update step $\Delta w^{-} = v'\, h'^T$.
- Apply the update: $w \leftarrow w + \eta\, (\Delta w^{+} - \Delta w^{-})$.
This does not optimize any explicit objective function!
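
A minimal numpy sketch of one CD-1 update following the steps above, using the logistic conditionals P(h | v) and P(v | h); the layer sizes, learning rate, bias-update form, and toy binary data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# RBM with nv visible and nh hidden binary units (sizes and data are assumptions).
nv, nh = 6, 3
W = 0.01 * rng.normal(size=(nv, nh))
a = np.zeros(nv)                     # visible biases
b = np.zeros(nh)                     # hidden biases

def cd1_step(v, W, a, b, eta=0.1):
    """One contrastive-divergence (CD-1) update for a single input pattern."""
    p_h = sigmoid(b + v @ W)                        # P(h_j = 1 | v)
    h = (rng.random(nh) < p_h).astype(float)        # sample a hidden configuration
    pos = np.outer(v, p_h)                          # positive statistics v h^T
    p_v = sigmoid(a + W @ h)                        # P(v_i = 1 | h)
    v_neg = (rng.random(nv) < p_v).astype(float)    # generate a "reconstructed" input
    p_h_neg = sigmoid(b + v_neg @ W)
    neg = np.outer(v_neg, p_h_neg)                  # negative statistics v' h'^T
    W += eta * (pos - neg)
    a += eta * (v - v_neg)
    b += eta * (p_h - p_h_neg)
    return W, a, b

# Toy binary training patterns.
data = (rng.random((50, nv)) < 0.3).astype(float)
for epoch in range(100):
    for v in data:
        W, a, b = cd1_step(v, W, a, b)
print("trained weight matrix:\n", np.round(W, 2))
```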

Training RBMs of large size is not simple. There are tricks to make the task easier, for example weight sharing and convolutional neural networks. These help with data having correlated inputs, as in images, video, speech, and general time series.

Deep Belief Networks
A DBN is a sequence of RBMs. Each RBM can be trained independently of the following ones (greedy strategy). The last layer can be a classifier.

Deep networks can be built out of RBMs, but also out of autoencoders. Autoencoders are more sensitive to random noise.

Neural networks: why bother?
Deep learning has achieved success in very complex tasks and won many competitions. Example: extracting words from audio and transforming them into automatic subtitles (cf. YouTube).

THE END
