Neural networks and optimization


1 Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85

2 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional neural networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 2 / 85

3 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional neural networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 3 / 85

4 Goal : classification and regression Medical imaging : cancer or not? Classification Autonomous driving : optimal wheel position Regression Kinect : where are the limbs? Regression OCR : what are the characters? Classification Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 4 / 85

5 Goal : classification and regression Medical imaging : cancer or not? Classification Autonomous driving : optimal wheel position Regression Kinect : where are the limbs? Regression OCR : what are the characters? Classification It also works for structured outputs Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 4 / 85

6 Object recognition Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 5 / 85

7 Object recognition Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 6 / 85

8 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional neural networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 7 / 85

9 Linear classifier Dataset : (X^(i), Y^(i)) pairs, i = 1, ..., N. X^(i) ∈ R^n, Y^(i) ∈ {-1, 1}. Goal : Find w and b such that sign(wᵀX^(i) + b) = Y^(i). Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 8 / 85

10 Linear classifier Dataset : (X^(i), Y^(i)) pairs, i = 1, ..., N. X^(i) ∈ R^n, Y^(i) ∈ {-1, 1}. Goal : Find w and b such that sign(wᵀX^(i) + b) = Y^(i). Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 8 / 85

11 Perceptron algorithm (Rosenblatt, 57) w_0 = 0, b_0 = 0. Ŷ^(i) = sign(wᵀX^(i) + b). w_{t+1} ← w_t + Σ_i (Y^(i) - Ŷ^(i)) X^(i), b_{t+1} ← b_t + Σ_i (Y^(i) - Ŷ^(i)). Linearly separable demo Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 9 / 85
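
The update above translates almost directly into code. A minimal NumPy sketch (the function name, epoch cap, and stopping test are illustrative, not from the slides):

    import numpy as np

    def perceptron(X, Y, n_epochs=100):
        """Batch perceptron. X has shape (N, n), Y has shape (N,) with labels in {-1, +1}."""
        N, n = X.shape
        w, b = np.zeros(n), 0.0
        for _ in range(n_epochs):
            Y_hat = np.sign(X @ w + b)        # current predictions
            err = Y - Y_hat                   # zero on correctly classified points
            w = w + X.T @ err                 # w_{t+1} = w_t + sum_i (Y^(i) - Yhat^(i)) X^(i)
            b = b + err.sum()                 # b_{t+1} = b_t + sum_i (Y^(i) - Yhat^(i))
            if not err.any():                 # stops only if the data are linearly separable
                break
        return w, b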

12 Some data are not separable The Perceptron algorithm is NOT convergent for non-linearly separable data. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 10 / 85

13 Non-convergence of the perceptron algorithm Non-linearly separable perceptron We need an algorithm which works on both separable and non-separable data. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 11 / 85

14 Classification error A good classifier minimizes l(w) = Σ_i l_i = Σ_i l(Ŷ^(i), Y^(i)) = Σ_i 1_{sign(wᵀX^(i) + b) ≠ Y^(i)} Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 12 / 85

15 Cost function Classification error is not smooth. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 13 / 85

16 Cost function Classification error is not smooth. Sigmoid is smooth but not convex. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 13 / 85

17 Convex cost functions Classification error is not smooth. Sigmoid is smooth but not convex. Logistic loss is a convex upper bound. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 14 / 85

18 Convex cost functions Classification error is not smooth. Sigmoid is smooth but not convex. Logistic loss is a convex upper bound. Hinge loss (SVMs) is very much like logistic. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 14 / 85
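
As a concrete illustration (not from the slides), the four quantities can be written as functions of the margin m = Y (wᵀX + b); the scaling that makes the logistic loss a tight upper bound on the 0-1 loss is omitted:

    import numpy as np

    # Losses as functions of the margin m = Y * (w'X + b).
    zero_one = lambda m: (m <= 0).astype(float)     # classification error: not smooth
    sigmoid  = lambda m: 1.0 / (1.0 + np.exp(m))    # smooth but not convex
    logistic = lambda m: np.log1p(np.exp(-m))       # convex surrogate (upper bound up to scaling)
    hinge    = lambda m: np.maximum(0.0, 1.0 - m)   # SVM loss, very much like logistic

    m = np.linspace(-3.0, 3.0, 7)
    for name, loss in [("0-1", zero_one), ("sigmoid", sigmoid),
                       ("logistic", logistic), ("hinge", hinge)]:
        print(name, np.round(loss(m), 2))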

19 Interlude Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 15 / 85

20 Convex function Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 16 / 85

21 Non-convex function Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 17 / 85

22 Not-convex-but-still-OK function Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 18 / 85

23 End of interlude Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 19 / 85

24 Solving separable AND non-separable problems Non-linearly separable logistic Linearly separable logistic Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 20 / 85

25 Non-linear classification Non-linearly separable polynomial Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 21 / 85

26 Non-linear classification Features : X_1, X_2 → linear classifier. Features : X_1, X_2, X_1 X_2, X_1², ... → non-linear classifier. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 22 / 85

27 Choosing the features To make it work, I created lots of extra features : X_1^i X_2^j for i, j ∈ [0, 4] (p = 18) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 23 / 85

28 Choosing the features To make it work, I created lots of extra features : X_1^i X_2^j for i, j ∈ [0, 4] (p = 18) Would it work with fewer features? Test with X_1^i X_2^j for i, j ∈ [1, 3] Non-linearly separable polynomial degree 2 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 23 / 85

29 A graphical view of the classifiers f(X) = w_1 X_1 + w_2 X_2 + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 24 / 85

30 A graphical view of the classifiers f(X) = w_1 X_1 + w_2 X_2 + w_3 X_1² + w_4 X_2² + w_5 X_1 X_2 + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 24 / 85

31 Non-linear features A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 25 / 85

32 Non-linear features A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Example : H_j = X_1^{p_j} X_2^{q_j} Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 25 / 85

33 Non-linear features A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Example : H_j = X_1^{p_j} X_2^{q_j} SVM : H_j = K(X, X^(j)) with K some kernel function Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 25 / 85

34 Non-linear features A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Example : H_j = X_1^{p_j} X_2^{q_j} SVM : H_j = K(X, X^(j)) with K some kernel function Do they have to be predefined? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 25 / 85

35 Non-linear features A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Example : H_j = X_1^{p_j} X_2^{q_j} SVM : H_j = K(X, X^(j)) with K some kernel function Do they have to be predefined? A neural network will learn the H_j's Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 25 / 85

36 A 3-step algorithm for a neural network 1 Pick an example x Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 26 / 85

37 A 3-step algorithm for a neural network 1 Pick an example x 2 Transform it into x̂ = Vx with some matrix V Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 26 / 85

38 A 3-step algorithm for a neural network 1 Pick an example x 2 Transform it into x̂ = Vx with some matrix V 3 Compute wᵀx̂ + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 26 / 85

39 A 3-step algorithm for a neural network 1 Pick an example x 2 Transform it into x̂ = Vx with some matrix V 3 Compute wᵀx̂ + b = wᵀVx + b = ŵᵀx + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 27 / 85

40 A 3-step algorithm for a neural network 1 Pick an example x 2 Transform it into x̂ = Vx with some matrix V 3 Apply a nonlinear function g to all the elements of x̂ 4 Compute wᵀx̂ + b = wᵀg(Vx) + b ≠ ŵᵀx + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 28 / 85
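
A minimal NumPy sketch of these steps for a single example (the dimensions and the choice g = tanh are illustrative assumptions):

    import numpy as np

    def forward(x, V, w, b, g=np.tanh):
        """One-hidden-layer network: f(x) = w' g(V x) + b."""
        x_hat = g(V @ x)       # steps 2-3: linear transform, then elementwise nonlinearity
        return w @ x_hat + b   # step 4: linear read-out on the transformed example

    rng = np.random.default_rng(0)
    n, k = 5, 3                        # input dimension, number of hidden units
    x = rng.normal(size=n)             # step 1: pick an example
    V = rng.normal(size=(k, n))        # one row v_j per hidden unit
    w, b = rng.normal(size=k), 0.1
    print(forward(x, V, w, b))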

41 Neural networks Usually, we use H_j = g(v_jᵀX) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 29 / 85

42 Neural networks Usually, we use H_j = g(v_jᵀX) H_j : Hidden unit v_j : Input weight g : Transfer function Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 29 / 85

43 Transfer function f(X) = Σ_j w_j H_j(X) + b = Σ_j w_j g(v_jᵀX) + b g is the transfer function. g used to be the sigmoid or the tanh. If g is the sigmoid, each hidden unit is a soft classifier. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 30 / 85

44 Transfer function f(X) = Σ_j w_j H_j(X) + b = Σ_j w_j g(v_jᵀX) + b g is the transfer function. g is now often the positive part. This is called a rectified linear unit. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 31 / 85
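
The two transfer functions mentioned here, written out explicitly (a small sketch, not from the slides):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))   # soft classifier: output in (0, 1)

    def relu(a):
        return np.maximum(0.0, a)         # positive part: rectified linear unit

    a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(a))
    print(relu(a))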

45 Neural networks f(X) = Σ_j w_j H_j(X) + b = Σ_j w_j g(v_jᵀX) + b Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 32 / 85

46 Example on the non-separable problem Non-linearly separable neural net 3 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 33 / 85

47 Training a neural network Dataset : (X^(i), Y^(i)) pairs, i = 1, ..., N. Goal : Find w and b such that sign(wᵀX^(i) + b) = Y^(i) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 34 / 85

48 Training a neural network Dataset : (X^(i), Y^(i)) pairs, i = 1, ..., N. Goal : Find w and b to minimize Σ_i log(1 + exp(-Y^(i) (wᵀX^(i) + b))) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 34 / 85

49 Training a neural network Dataset : (X^(i), Y^(i)) pairs, i = 1, ..., N. Goal : Find v_1, ..., v_k, w and b to minimize Σ_i log(1 + exp(-Y^(i) Σ_j w_j g(v_jᵀX^(i)))) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 34 / 85

50 Cost function s = cost function (logistic loss, hinge loss, ...) l(v, w, b, X^(i), Y^(i)) = s(Ŷ^(i), Y^(i)) = s(Σ_j w_j H_j(X^(i)), Y^(i)) = s(Σ_j w_j g(v_jᵀX^(i)), Y^(i)) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 35 / 85

51 Cost function s = cost function (logistic loss, hinge loss, ...) l(v, w, b, X^(i), Y^(i)) = s(Ŷ^(i), Y^(i)) = s(Σ_j w_j H_j(X^(i)), Y^(i)) = s(Σ_j w_j g(v_jᵀX^(i)), Y^(i)) We can minimize it using gradient descent. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 35 / 85
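
As a rough sketch of what minimizing this cost by gradient descent can look like for a one-hidden-layer network with the logistic loss and g = tanh (all names, the step size, and the initialization scale are illustrative assumptions, not the slides' code):

    import numpy as np

    def loss_and_grads(V, w, b, X, Y):
        """Mean logistic loss of a one-hidden-layer net and its gradients (full batch)."""
        A = X @ V.T                                   # pre-activations, shape (N, k)
        H = np.tanh(A)                                # hidden units g(v_j' x)
        F = H @ w + b                                 # network outputs, shape (N,)
        margins = Y * F
        loss = np.log1p(np.exp(-margins)).mean()
        dF = -Y / (1.0 + np.exp(margins)) / len(Y)    # d loss / d F
        dw = H.T @ dF
        db = dF.sum()
        dA = np.outer(dF, w) * (1.0 - H ** 2)         # back through the output layer and tanh
        dV = dA.T @ X
        return loss, dV, dw, db

    def train(X, Y, k=10, alpha=0.1, n_steps=500, seed=0):
        rng = np.random.default_rng(seed)
        V = rng.normal(scale=0.1, size=(k, X.shape[1]))
        w = rng.normal(scale=0.1, size=k)
        b = 0.0
        for _ in range(n_steps):
            loss, dV, dw, db = loss_and_grads(V, w, b, X, Y)
            V, w, b = V - alpha * dV, w - alpha * dw, b - alpha * db
        return V, w, b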

52 Finding a good transformation of x H_j(x) = g(v_jᵀx) Is it a good H_j? What can we say about it? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 36 / 85

53 Finding a good transformation of x H_j(x) = g(v_jᵀx) Is it a good H_j? What can we say about it? It is theoretically enough (universal approximation) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 36 / 85

54 Finding a good transformation of x H_j(x) = g(v_jᵀx) Is it a good H_j? What can we say about it? It is theoretically enough (universal approximation) That does not mean you should use it Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 36 / 85

55 Going deeper H_j(x) = g(v_jᵀx), i.e., x̂_j = g(v_jᵀx) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 37 / 85

56 Going deeper H_j(x) = g(v_jᵀx), i.e., x̂_j = g(v_jᵀx) We can transform x̂ as well. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 37 / 85
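
A sketch of what transforming x̂ as well means in code: the first layer's output simply becomes the next layer's input (names are illustrative):

    import numpy as np

    def forward_deep(x, V1, V2, w, b, g=np.tanh):
        """Two hidden layers: the transformed example is itself re-transformed."""
        x_hat1 = g(V1 @ x)       # first hidden layer
        x_hat2 = g(V2 @ x_hat1)  # second hidden layer, applied to the transformed example
        return w @ x_hat2 + b    # linear read-out on the last layer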

57 Going from 2 to 3 layers Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 38 / 85

58 Going from 2 to 3 layers We can have as many layers as we like Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 38 / 85

59 Going from 2 to 3 layers We can have as many layers as we like But it makes the optimization problem harder Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 38 / 85

60 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional neural networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 39 / 85

61 The need for fast learning Neural networks may need many examples (several millions or more). We need to be able to learn from them quickly. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 40 / 85

62 Batch methods L(θ) = (1/N) Σ_i l(θ, X^(i), Y^(i)) θ_{t+1} ← θ_t - (α_t/N) Σ_i ∂l(θ, X^(i), Y^(i))/∂θ To compute one update of the parameters, we need to go through all the data. This can be very expensive. What if we have an infinite amount of data? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 41 / 85

63 Potential solutions 1 Discard data. Seems stupid Yet many people do it Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 42 / 85

64 Potential solutions 1 Discard data. Seems stupid Yet many people do it 2 Use approximate methods. Update = average of the updates for all datapoints. Are these updates really different? If not, how can we learn faster? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 42 / 85

65 Stochastic gradient descent L(θ) = (1/N) Σ_i l(θ, X^(i), Y^(i)) θ_{t+1} ← θ_t - (α_t/N) Σ_i ∂l(θ, X^(i), Y^(i))/∂θ Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 43 / 85

66 Stochastic gradient descent L(θ) = (1/N) Σ_i l(θ, X^(i), Y^(i)) θ_{t+1} ← θ_t - α_t ∂l(θ, X^(i_t), Y^(i_t))/∂θ Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 43 / 85

67 Stochastic gradient descent L(θ) = (1/N) Σ_i l(θ, X^(i), Y^(i)) θ_{t+1} ← θ_t - α_t ∂l(θ, X^(i_t), Y^(i_t))/∂θ What do we lose when updating the parameters to satisfy just one example? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 43 / 85
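
A sketch of the stochastic update, contrasted with the batch one: each step touches a single example i_t picked at random (the logistic-regression gradient used in the example is an illustrative choice, not the slides' model):

    import numpy as np

    def sgd(theta, grad_i, X, Y, alpha=0.01, n_steps=10000, seed=0):
        """Stochastic gradient descent: one example per parameter update.

        grad_i(theta, x, y) returns the gradient of l(theta, x, y) w.r.t. theta.
        """
        rng = np.random.default_rng(seed)
        for _ in range(n_steps):
            i = rng.integers(len(Y))                           # pick one example i_t at random
            theta = theta - alpha * grad_i(theta, X[i], Y[i])  # update using that example only
        return theta

    # Example per-example gradient: logistic regression, theta = w (bias folded into X).
    grad_logreg = lambda w, x, y: -y * x / (1.0 + np.exp(y * (x @ w)))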

68 Disagreement µ²/σ² during optimization (log scale) As optimization progresses, disagreement increases Early on, one can pick one example at a time What about later? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 44 / 85

69 Stochastic vs. deterministic methods Goal = best of both worlds : linear rate with O(1) iteration cost [Plot : log(excess cost) vs. time for stochastic and deterministic (batch) methods] Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 45 / 85

70 Stochastic vs. deterministic methods Goal = best of both worlds : linear rate with O(1) iteration cost [Plot : log(excess cost) vs. time for stochastic and deterministic (batch) methods] For non-convex problems, stochastic works best. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 45 / 85

71 Understanding the loss function of deep networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 46 / 85

72 Understanding the loss function of deep networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 46 / 85

73 Understanding the loss function of deep networks Recent studies say most local minima are close to the optimum. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 46 / 85

74 Regularizing deep networks Deep networks have many parameters How do we avoid overfitting? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 47 / 85

75 Regularizing deep networks Deep networks have many parameters How do we avoid overfitting? Regularizing the parameters Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 47 / 85

76 Regularizing deep networks Deep networks have many parameters How do we avoid overfitting? Regularizing the parameters Limiting the number of units Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 47 / 85

77 Regularizing deep networks Deep networks have many parameters How do we avoid overfitting? Regularizing the parameters Limiting the number of units Dropout Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 47 / 85

78 Dropout Overfitting happens when each unit gets too specialized Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 48 / 85

79 Dropout Overfitting happens when each unit gets too specialized The core idea is to prevent each unit from relying too much on the others Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 48 / 85

80 Dropout Overfitting happens when each unit gets too specialized The core idea is to prevent each unit from relying too much on the others How do we do this? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 48 / 85

81 Dropout - an illustration Figure taken from Dropout : A Simple Way to Prevent Neural Networks from Overfitting Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 49 / 85

82 Dropout - a conclusion By removing units at random, we force the others to be less specialized At test time, we use all the units Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 50 / 85

83 Dropout - a conclusion By removing units at random, we force the others to be less specialized At test time, we use all the units This is sometimes presented as an ensemble method. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 50 / 85
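
One common way to implement this is "inverted" dropout, a sketch under the usual assumptions rather than the exact recipe of the paper: drop units at random during training, rescale so expectations match, and use all the units at test time:

    import numpy as np

    def dropout(h, p_drop=0.5, train=True, rng=None):
        """Apply dropout to a layer of hidden activations h."""
        if not train:
            return h                              # test time: use all the units
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) >= p_drop      # drop each unit independently with prob. p_drop
        return h * mask / (1.0 - p_drop)          # rescale so the expected activation is unchanged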

84 Neural networks - Summary A linear classifier in a feature space can model non-linear boundaries. Finding a good feature space is essential. One can design the feature map by hand. One can learn the feature map, using fewer features than if it were done by hand. Learning the feature map is potentially HARD (non-convexity). Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 51 / 85

85 Recipe for training a neural network Use rectified linear units Use dropout Use stochastic gradient descent Use 2000 units on each layer Try different numbers of layers Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 52 / 85

86 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional neural networks Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 53 / 85

87 v_j's for images f(X) = Σ_j w_j H_j(X) + b = Σ_j w_j g(v_jᵀX) + b If X is an image, v_j is an image too. v_j acts as a filter (presence or absence of a pattern). What does v_j look like? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 54 / 85

88 v_j's for images - Examples Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 55 / 85

89 v_j's for images - Examples Filters are mostly local Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 55 / 85

90 Basic idea of convolutional neural networks Filters are mostly local. Instead of using image-wide filters, use small ones over patches. Repeat for every patch to get a response image. Subsample the response image to get local invariance. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 56 / 85
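
A sketch of "repeat the small filter for every patch" on a single-channel image (a plain double loop for clarity; real implementations use optimized convolution routines):

    import numpy as np

    def filter_image(image, filt):
        """Apply a small filter to every patch of the image (valid cross-correlation)."""
        H, W = image.shape
        h, w = filt.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + h, j:j + w]
                out[i, j] = np.sum(patch * filt)   # response of the filter at position (i, j)
        return out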

91 Filtering - Filter 1 Original image Filter Output image Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 57 / 85

92 Filtering - Filter 2 Original image Filter Output image Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 58 / 85

93 Filtering - Filter 3 Original image Filter Output image Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 59 / 85

94 Pooling - Filter 1 Original image Output image Subsampled image How to do 2x subsampling-pooling : Output image = O, subsampled image = S. S_ij = max_{k in window around (2i, 2j)} O_k. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 60 / 85

95 Pooling - Filter 2 Original image Output image Subsampled image How to do 2x subsampling-pooling : Output image = O, subsampled image = S. S_ij = max_{k in window around (2i, 2j)} O_k. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 61 / 85

96 Pooling - Filter 3 Original image Output image Subsampled image How to do 2x subsampling-pooling : Output image = O, subsampled image = S. S_ij = max_{k in window around (2i, 2j)} O_k. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 62 / 85
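
The subsampling-pooling formula above, written out for 2x2 windows (a sketch; the window shape and stride are assumptions):

    import numpy as np

    def max_pool_2x(O):
        """2x subsampling-pooling: S[i, j] = max of O over the 2x2 window around (2i, 2j)."""
        H, W = O.shape
        S = np.zeros((H // 2, W // 2))
        for i in range(S.shape[0]):
            for j in range(S.shape[1]):
                S[i, j] = O[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()
        return S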

97 A convolutional layer Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 63 / 85

98 Transforming the data with a layer Original datapoint New datapoint Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 64 / 85

99 A convolutional network Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 65 / 85

100 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 66 / 85

101 Face detection Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 67 / 85

102 Face detection Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 68 / 85

103 NORB dataset 50 toys belonging to 5 categories : animal, human figure, airplane, truck, car 10 instances per category 5 instances used for training, 5 instances for testing Raw dataset : 972 stereo pairs of each toy, 48,600 image pairs total. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 69 / 85

104 NORB dataset - 2 For each instance : 18 azimuths (0 to 350 degrees, every 20 degrees) 9 elevations (30 to 70 degrees from horizontal, every 5 degrees) 6 illuminations (on/off combinations of 4 lights) 2 cameras (stereo), 7.5 cm apart, 40 cm from the object Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 70 / 85

105 NORB dataset - 3 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 71 / 85

106 Textured and cluttered versions Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 72 / 85

107 90,857 free parameters, 3,901,162 connections. The entire network is trained end-to-end (all the layers are trained simultaneously). Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 73 / 85

108 Normalized-Uniform set
Method : Error
Linear Classifier on raw stereo images : 30.2%
K-Nearest-Neighbors on raw stereo images : 18.4%
K-Nearest-Neighbors on PCA : %
Pairwise SVM on 96x96 stereo images : 11.6%
Pairwise SVM on 95 Principal Components : 13.3%
Convolutional Net on 96x96 stereo images : 5.8%
Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 74 / 85

109 Jittered-Cluttered Dataset 291,600 stereo pairs for training, 58,320 for testing Objects are jittered : position, scale, in-plane rotation, contrast, brightness, backgrounds, distractor objects, ... Input dimension : 98x98x2 (approx 18,000) Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 75 / 85

110 Jittered-Cluttered Dataset - Results
Method : Error
SVM with Gaussian kernel : 43.3%
Convolutional Net with binocular input : 7.8%
Convolutional Net + SVM on top : 5.9%
Convolutional Net with monocular input : 20.8%
Smaller mono net (DEMO) : 26.0%
Dataset available from yann
Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 76 / 85

111 NORB recognition - 1 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 77 / 85

112 NORB recognition - 2 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 78 / 85

113 ConvNet summary With complex problems, it is hard to design features by hand. Neural networks circumvent this problem. They can be hard to train (again...). Convolutional neural networks use knowledge about locality in images. They are much easier to train than standard networks. And they are FAST (again...). Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 79 / 85

114 What has not been covered In some cases, we have lots of data, but without the labels. Unsupervised learning. There are techniques to use these data to get better performance. E.g. : Task-Driven Dictionary Learning, Mairal et al. Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 80 / 85

115 Additional techniques currently used Local response normalization Data augmentation Batch gradient normalization Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 81 / 85

116 Adversarial examples ConvNets achieve stellar performance in object recognition Do they really understand what's going on in an image? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 82 / 85

117 Adversarial examples ConvNets achieve stellar performance in object recognition Do they really understand what's going on in an image? Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 82 / 85

118 What I did not cover Algorithms for pretraining (RBMs, autoencoders) Generative models (DBMs) Recurrent neural networks for machine translation Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 83 / 85

119 Conclusions Neural networks are complex non-linear models Optimizing them is hard It is as much about theory as about engineering When properly tuned, they can perform wonders Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 84 / 85

120 Bibliography
Deep Learning book, bengioy/dlbook/
Dropout : A Simple Way to Prevent Neural Networks from Overfitting by Srivastava et al.
Intriguing properties of neural networks by Szegedy et al.
ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever and Hinton
Gradient-based learning applied to document recognition by LeCun et al.
Qualitatively characterizing neural network optimization problems by Goodfellow and Vinyals
Sequence to Sequence Learning with Neural Networks by Sutskever, Vinyals and Le
Caffe, Hugo Larochelle's MOOC, https: //
Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 85 / 85
