Neural networks and optimization


Nicolas Le Roux (INRIA), 8 November 2011

Outline: 1. Introduction. 2. Linear classifier. 3. Convolutional neural networks. 4. Stochastic gradient descent.

Foreword: I'm here for you, I already know that stuff. It's better to look silly than to stay so. Ask questions if you don't understand!

Goal: classification and regression. Medical imaging: cancer or not? Classification. Autonomous driving: optimal wheel position? Regression. Kinect: where are the limbs? Regression. OCR: what are the characters? Classification. Regression and classification are similar problems.

Goal: real-time object recognition.

Linear classifier. Dataset: $(X^{(i)}, Y^{(i)})$ pairs, $i = 1, \ldots, N$, with $X^{(i)} \in \mathbb{R}^n$ and $Y^{(i)} \in \{-1, 1\}$. Goal: find $w$ and $b$ such that $\mathrm{sign}(w^\top X^{(i)} + b) = Y^{(i)}$.

Perceptron algorithm (Rosenblatt, 57). Start from $w^0 = 0$, $b^0 = 0$ and predict $\hat Y^{(i)} = \mathrm{sign}(w^\top X^{(i)} + b)$. Then update
$w^{t+1} \leftarrow w^t + \sum_i \big(Y^{(i)} - \hat Y^{(i)}\big) X^{(i)}$, $\qquad b^{t+1} \leftarrow b^t + \sum_i \big(Y^{(i)} - \hat Y^{(i)}\big)$.
Movie linearly_separable_perceptron.avi
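A minimal NumPy sketch of this batch update rule (the function name, epoch loop and stopping test are my additions, not from the slides):

```python
import numpy as np

def perceptron_train(X, Y, n_epochs=100):
    """Batch perceptron. X: (N, n) inputs, Y: (N,) labels in {-1, +1}."""
    N, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_epochs):
        Y_hat = np.sign(X @ w + b)       # current predictions
        w = w + (Y - Y_hat) @ X          # w <- w + sum_i (Y_i - Y_hat_i) X_i
        b = b + np.sum(Y - Y_hat)        # b <- b + sum_i (Y_i - Y_hat_i)
        if np.array_equal(Y_hat, Y):     # stops only if the data are separable
            break
    return w, b
```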

Some data are not separable. The perceptron algorithm is NOT convergent for non-linearly separable data.

Non-convergence of the perceptron algorithm. Movie non_linearly_separable_perceptron.avi. We need an algorithm which works both on separable and non-separable data.

Cost functions. The classification error is not smooth. The sigmoid is smooth but not convex. Convexity guarantees the same solution every time; in practice, it is not always crucial. The logistic loss is a convex upper bound on the classification error. The hinge loss (SVMs) is very much like the logistic loss.
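For reference, the usual written forms of these losses, in terms of the margin $z = Y\,(w^\top X + b)$ (the margin notation is mine; it matches the logistic objective that appears later in these slides):

$\ell_{\text{logistic}}(z) = \log\big(1 + e^{-z}\big), \qquad \ell_{\text{hinge}}(z) = \max(0,\, 1 - z).$

Both are convex in $z$, and hence convex in $(w, b)$.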

Solving separable AND non-separable problems. Movie non_linearly_separable_logistic.avi. Movie linearly_separable_logistic.avi.

Non-linear classification. Movie non_linearly_separable_poly_kernel.avi.

Non-linear classification. Features $X_1, X_2$: linear classifier. Features $X_1, X_2, X_1 X_2, X_1^2, \ldots$: non-linear classifier.

Choosing the features. To make it work, I created lots of extra features: $(X_1, X_2, X_1 X_2, X_1^2 X_2, X_1 X_2^2)^{(1,2,3,\ldots,10)}$. Would it work with fewer features? Test with $(X_1, X_2, X_1 X_2, X_1^2 X_2, X_1 X_2^2)^{(1,2)}$. Movie non_linearly_separable_poly_2.avi.
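A short sketch of this kind of hand-built feature expansion (names and the exact set of monomials follow the notation above; this is illustrative, not the code used for the demos):

```python
import numpy as np

def poly_features(X, powers=(1, 2)):
    """Expand 2-D inputs X (shape (N, 2)) into the monomials
    (X1, X2, X1*X2, X1^2*X2, X1*X2^2), each raised to the given powers."""
    X1, X2 = X[:, 0], X[:, 1]
    base = np.stack([X1, X2, X1 * X2, X1**2 * X2, X1 * X2**2], axis=1)
    return np.concatenate([base**p for p in powers], axis=1)

# A linear classifier trained on poly_features(X) draws a non-linear
# boundary in the original (X1, X2) space.
```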

A graphical view of the classifiers: $f(X) = w_1 X_1 + w_2 X_2 + b$.

A graphical view of the classifiers: $f(X) = w_1 X_1 + w_2 X_2 + w_3 X_1^2 + w_4 X_2^2 + w_5 X_1 X_2 + b$.

Non-linear features. A linear classifier on a non-linear transformation is non-linear. A non-linear classifier relies on non-linear features. Which ones do we choose? Example: $H_j = X_1^{p_j} X_2^{q_j}$. SVM: $H_j = K(X, X^{(j)})$ with $K$ some kernel function. Do they have to be predefined?

Neural networks. A neural network will learn the $H_j$'s. Usually, we use $H_j = g(v_j^\top X)$. $H_j$: hidden unit. $v_j$: input weight. $g$: transfer function.

Transfer function. $f(X) = \sum_j w_j H_j(X) + b = \sum_j w_j\, g(v_j^\top X) + b$. $g$ is the transfer function; usually, $g$ is the sigmoid or the tanh.
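One way to write this one-hidden-layer model in NumPy (a sketch, with the sigmoid as transfer function $g$; shapes and names are assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, V, w, b):
    """f(X) = sum_j w_j g(v_j . X) + b.
    X: (N, n) inputs, V: (k, n) input weights (one row v_j per hidden unit),
    w: (k,) output weights, b: scalar bias."""
    H = sigmoid(X @ V.T)    # hidden units H_j = g(v_j . X), shape (N, k)
    return H @ w + b        # network outputs, shape (N,)
```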

Example on the non-separable problem. Movie non_linearly_separable_mlp_3.avi.

Training a neural network. Dataset: $(X^{(i)}, Y^{(i)})$ pairs, $i = 1, \ldots, N$. Goal: find $w$ and $b$ such that $\mathrm{sign}(w^\top X^{(i)} + b) = Y^{(i)}$.

Training a neural network. Dataset: $(X^{(i)}, Y^{(i)})$ pairs, $i = 1, \ldots, N$. Goal: find $w$ and $b$ to minimize $\sum_i \log\big(1 + \exp\big(-Y^{(i)} (w^\top X^{(i)} + b)\big)\big)$.

Training a neural network. Dataset: $(X^{(i)}, Y^{(i)})$ pairs, $i = 1, \ldots, N$. Goal: find $v_1, \ldots, v_k$, $w$ and $b$ to minimize $\sum_i \log\Big(1 + \exp\Big(-Y^{(i)} \sum_j w_j\, g(v_j^\top X^{(i)})\Big)\Big)$.

Neural network - 8 hidden units. Movie non_linearly_separable_mlp_8.avi.

Neural network - 5 hidden units. Movie non_linearly_separable_mlp_5.avi.

Neural network - 3 hidden units. Movie non_linearly_separable_mlp_3.avi.

Neural network - 2 hidden units. Movie non_linearly_separable_mlp_2.avi.

Non-linear classification (figures).

Cost function. Let $s$ be the cost function (logistic loss, hinge loss, ...). Then
$\ell(v, w, b, X^{(i)}, Y^{(i)}) = s\big(\hat Y^{(i)}, Y^{(i)}\big) = s\Big(\sum_j w_j H_j(X^{(i)}),\ Y^{(i)}\Big) = s\Big(\sum_j w_j\, g(v_j^\top X^{(i)}),\ Y^{(i)}\Big).$

Backpropagation - Output weights. With $s$ the cost function and $\hat Y^{(i)} = \sum_j w_j H_j(X^{(i)})$,
$\frac{\partial \ell}{\partial w_j} = \frac{\partial \ell}{\partial \hat Y^{(i)}} \cdot \frac{\partial \hat Y^{(i)}}{\partial w_j} = \frac{\partial \ell}{\partial \hat Y^{(i)}} \cdot H_j(X^{(i)}).$

Backpropagation - Input weights. With $\hat Y^{(i)} = \sum_j w_j H_j(X^{(i)})$ and $H_j(X^{(i)}) = g(v_j^\top X^{(i)})$,
$\frac{\partial \ell}{\partial v_j} = \frac{\partial \ell}{\partial H_j(X^{(i)})} \cdot \frac{\partial H_j(X^{(i)})}{\partial v_j} = \frac{\partial \ell}{\partial \hat Y^{(i)}} \cdot \frac{\partial \hat Y^{(i)}}{\partial H_j(X^{(i)})} \cdot \frac{\partial H_j(X^{(i)})}{\partial v_j} = \frac{\partial \ell}{\partial \hat Y^{(i)}} \cdot w_j \cdot X^{(i)}\, g'(v_j^\top X^{(i)}).$
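A sketch of these chain-rule computations for a single example, using the logistic loss and a sigmoid transfer function (both choices, and all names, are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_and_gradients(x, y, V, w, b):
    """Backpropagation for one example (x, y), with y in {-1, +1}.
    V: (k, n) input weights, w: (k,) output weights, b: scalar bias."""
    H = sigmoid(V @ x)                    # hidden units H_j = g(v_j . x)
    y_hat = w @ H + b                     # network output (before the sign)
    loss = np.log1p(np.exp(-y * y_hat))   # logistic loss
    dl_dyhat = -y * sigmoid(-y * y_hat)   # dl / dY_hat
    dl_dw = dl_dyhat * H                  # dl/dw_j = dl/dY_hat * H_j
    dl_db = dl_dyhat
    dl_dH = dl_dyhat * w                  # dl/dH_j = dl/dY_hat * w_j
    dl_dV = (dl_dH * H * (1.0 - H))[:, None] * x[None, :]   # through g'(v_j . x)
    return loss, dl_dV, dl_dw, dl_db
```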

Training neural networks - Summary. For each datapoint, compute the gradient of the cost with respect to the weights. This is done using the backpropagation of the gradient. The cost is convex with respect to the output weights (linear classifier), but NOT convex with respect to the input weights: POTENTIAL PROBLEMS!

Neural networks - Summary. A linear classifier in a feature space can model non-linear boundaries. Finding a good feature space is essential. One can design the feature map by hand. One can learn the feature map, using fewer features than if it is done by hand. Learning the feature map is potentially HARD (non-convexity).

Neural networks - Not summary. Take a linear combination of the outputs of soft classifiers: this is a non-linear classifier. One can take a linear combination of these: this becomes a neural network with two hidden layers.

Advantages of neural networks. They can learn anything. They are extremely fast at test time (computing the answer for a new datapoint) because they use fewer features. There is complete control over the power of the network (by controlling the sizes of the hidden layers).

Problems of neural networks. They are highly non-convex, hence many local minima. They can learn anything, but have more parameters, hence need tons of examples to be good.

Take-home messages. Neural networks are potentially extremely efficient. But it is HARD to train them! If you wish to use them, be smart (or ask someone who knows)! If you have a huge dataset, they CAN be awesome!

$v_j$'s for images. $f(X) = \sum_j w_j H_j(X) + b = \sum_j w_j\, g(v_j^\top X) + b$. If $X$ is an image, $v_j$ is an image too. $v_j$ acts as a filter (presence or absence of a pattern). What does $v_j$ look like?

$v_j$'s for images - Examples (figures). Filters are mostly local.

Basic idea of convolutional neural networks. Filters are mostly local. Instead of using image-wide filters, use small ones over patches. Repeat for every patch to get a response image. Subsample the response image to get local invariance.
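A naive sketch of the "small filter applied to every patch" idea (plain loops over valid patches; real implementations use optimized convolution routines):

```python
import numpy as np

def filter_image(image, filt):
    """Apply one small filter to every patch of the image ('valid' correlation).
    image: (H, W), filt: (h, w). Returns a response image of shape (H-h+1, W-w+1)."""
    H, W = image.shape
    h, w = filt.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * filt)
    return out
```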

Filtering - Filters 1, 2 and 3 (figures: original image, filter, output image).

Pooling - Filters 1, 2 and 3 (figures: original image, output image, subsampled image). How to do 2x subsampling-pooling: with output image $O$ and subsampled image $S$, $S_{ij} = \max_{k \,\in\, \text{window around } (2i, 2j)} O_k$.
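A sketch of the 2x subsampling-pooling step, reading the window around $(2i, 2j)$ as a non-overlapping 2x2 block (one possible reading; the slides do not fix the exact window):

```python
import numpy as np

def max_pool_2x(O):
    """S[i, j] = max of the response image O over the 2x2 block at (2i, 2j).
    Assumes O has even height and width."""
    H, W = O.shape
    return O.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```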

A convolutional layer (figure).

Transforming the data with a layer (figure: original datapoint, new datapoint).

A convolutional network (figures).

Face detection (figures).

NORB dataset. 50 toys belonging to 5 categories: animal, human figure, airplane, truck, car. 10 instances per category; 5 instances used for training, 5 instances for testing. Raw dataset: 972 stereo pairs of each toy, 48,600 image pairs in total.

NORB dataset - 2. For each instance: 18 azimuths (0 to 350 degrees, every 20 degrees); 9 elevations (30 to 70 degrees from horizontal, every 5 degrees); 6 illuminations (on/off combinations of 4 lights); 2 cameras (stereo), 7.5 cm apart, 40 cm from the object.

NORB dataset - 3 (figures).

Textured and cluttered versions (figures).

The network has 90,857 free parameters and 3,901,162 connections. The entire network is trained end-to-end (all the layers are trained simultaneously).

Normalized-Uniform set.
Method                                         Error
Linear classifier on raw stereo images         30.2%
K-nearest-neighbors on raw stereo images       18.4%
K-nearest-neighbors on PCA
Pairwise SVM on 96x96 stereo images            11.6%
Pairwise SVM on 95 principal components        13.3%
Convolutional net on 96x96 stereo images        5.8%

Jittered-Cluttered dataset. 291,600 stereo pairs for training, 58,320 for testing. Objects are jittered: position, scale, in-plane rotation, contrast, brightness, backgrounds, distractor objects, ... Input dimension: 98x98x2 (approx. 18,000).

Jittered-Cluttered dataset - Results.
Method                                         Error
SVM with Gaussian kernel                       43.3%
Convolutional net with binocular input          7.8%
Convolutional net + SVM on top                  5.9%
Convolutional net with monocular input         20.8%
Smaller mono net (DEMO)                        26.0%
Dataset available from yann

NORB recognition - examples 1 to 6 (figures).

Summary. With complex problems, it is hard to design features by hand. Neural networks circumvent this problem. They can be hard to train (again...). Convolutional neural networks use knowledge about locality in images. They are much easier to train than standard networks. And they are FAST (again...).

What has not been covered. In some cases, we have lots of data, but without the labels: unsupervised learning. There are techniques to use these data to get better performance. E.g.: Task-Driven Dictionary Learning, Mairal et al.

The need for fast learning. Neural networks may need many examples (several millions or more). We need to be able to use them quickly.

Batch methods. $L(\theta) = \frac{1}{N} \sum_i \ell(\theta, X^{(i)}, Y^{(i)})$, with updates $\theta^{t+1} \leftarrow \theta^t - \frac{\alpha_t}{N} \sum_i \frac{\partial \ell(\theta, X^{(i)}, Y^{(i)})}{\partial \theta}$. To compute one update of the parameters, we need to go through all the data. This can be very expensive. What if we have an infinite amount of data?
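A sketch of one batch update, to make the cost explicit: every single parameter update touches all N examples (grad_l is an assumed helper returning the per-example gradient):

```python
def batch_gradient_step(theta, X, Y, grad_l, alpha):
    """One batch update: theta <- theta - (alpha / N) * sum_i dl(theta, X_i, Y_i)/dtheta."""
    N = len(X)
    total = sum(grad_l(theta, X[i], Y[i]) for i in range(N))  # full pass over the data
    return theta - (alpha / N) * total
```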

Potential solutions. 1. Discard data: seems stupid, yet many people do it. 2. Use approximate methods: the update is the average of the updates for all datapoints. Are these updates really different? If not, how can we learn faster?

Stochastic gradient descent. $L(\theta) = \frac{1}{N} \sum_i \ell(\theta, X^{(i)}, Y^{(i)})$, but the update uses a single example $i_t$: $\theta^{t+1} \leftarrow \theta^t - \alpha_t \frac{\partial \ell(\theta, X^{(i_t)}, Y^{(i_t)})}{\partial \theta}$. What do we lose when updating the parameters to satisfy just one example?
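The stochastic counterpart, as a sketch (one randomly drawn example per update; a decaying step size would play the role of $\alpha_t$):

```python
import numpy as np

def sgd(theta, X, Y, grad_l, alpha, n_steps, seed=0):
    """Stochastic gradient descent: each update uses a single example i_t."""
    rng = np.random.default_rng(seed)
    N = len(X)
    for _ in range(n_steps):
        i = rng.integers(N)                               # draw one example i_t
        theta = theta - alpha * grad_l(theta, X[i], Y[i])
    return theta
```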

Disagreement (plot of $\mu^2/\sigma^2$ during optimization, log scale). As optimization progresses, disagreement increases. Early on, one can pick one example at a time. What about later?

Training vs test. Standard learning paradigm: we want to solve a task on new datapoints; we have a training set; we hope that the performance on the training set is informative of the performance on new datapoints. Can we know when we start overfitting?

Overfitting. When all gradients disagree, the stochastic error stalls. When all gradients disagree, training and test error part. IT DOES NOT MATTER IF ONE DOES NOT REACH THE MINIMUM OF THE TRAINING ERROR!

Decomposition of the error.
$E(\tilde f_n) - E(f^*) = \underbrace{E(f^*_{\mathcal{F}}) - E(f^*)}_{\text{approximation error}} + \underbrace{E(f_n) - E(f^*_{\mathcal{F}})}_{\text{estimation error}} + \underbrace{E(\tilde f_n) - E(f_n)}_{\text{optimization error}}$
Questions: 1. Do we optimize the training error to decrease $E(\tilde f_n) - E(f_n)$? 2. Do we increase $n$ to decrease $E(f_n) - E(f^*_{\mathcal{F}})$?

Four slides from Léon Bottou (figures).

Summary. Stochastic methods update the parameters much more often than batch ones. They are terrible at finding the minimum of the training error. It may not matter, as the training error is not the test error. In practice: you will ALMOST ALWAYS have enough data; you will ALMOST ALWAYS lack time; you must ALMOST ALWAYS use stochastic methods. How to use accelerated techniques remains to be seen.
