Introduction to CNN and PyTorch

1 Introduction to CNN and PyTorch Kripasindhu Sarkar Kaiserslautern University, DFKI Deutsches Forschungszentrum für Künstliche Intelligenz Some of the contents are taken from Stanford CS231n Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

2 Contents Machine learning fundamentals Convolutional Neural Networks Models, parameters, scores Loss function Back-propagation Convolution filters Convolution operator CNN architecture Popular CNN architectures General training principles: transfer learning/model initializations/optimization strategies Introduction to PyTorch General overview Examples from Official documentation Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

3 Machine learning - Classification Given an image predict the label Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

4 Machine learning - Classification Classifier (magic): input image -> label "cat" Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

5 Machine learning - Classification Train: Classifier (Learn W) - learn a model f() with parameters W. Predict: Classifier (Use W) - input image -> label "cat" Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

6 Machine learning - Classification Model/Score function - F(X, W) Takes input: data sample - X and parameters - W W - internal parameters or weights Maps input data X to class scores More score for a class - more likely it belongs to that class N classes - N different scores Predicted class - class with max score? Loss function L (F, data samples) Data samples - { (Xi, yi) } Measures how good F is Higher loss - bad model Lower loss - good model Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

7 Parametric Approach Image f(x,w) Array of 32x32x3 numbers (3072 numbers total) 10 numbers giving class scores W parameters or weights Lecture 2-7 April 5, 2018

8 Parametric Approach: Linear Classifier Image f(x,w) = Wx f(x,w) Array of 32x32x3 numbers (3072 numbers total) 10 numbers giving class scores W parameters or weights Lecture 2-8 April 5, 2018

9 Parametric Approach: Linear Classifier Image f(x,W) = Wx + b, with x: 3072x1, W: 10x3072, b: 10x1, f(x,W): 10x1 Array of 32x32x3 numbers (3072 numbers total) 10 numbers giving class scores W parameters or weights Lecture 2-9 April 5, 2018

10 Example with an image with 4 pixels, and 3 classes (cat/dog/ship) Stretch pixels into a column vector Input image Lecture 2-10 April 5, 2018

11 Example with an image with 4 pixels, and 3 classes (cat/dog/ship) Stretch pixels into a column vector; (cat score, dog score, ship score) = W x + b Lecture 2-11 April 5, 2018
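To make the score function concrete, here is a minimal numeric sketch of f(x, W) = Wx + b for a 4-pixel image and 3 classes; the particular pixel, weight, and bias values are purely illustrative.

import numpy as np

# 4-pixel image stretched into a column vector
x = np.array([56.0, 231.0, 24.0, 2.0])
# one row of weights per class (cat, dog, ship), plus one bias per class
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])

scores = W.dot(x) + b      # one score per class
print(scores)              # the predicted class is the index of the max score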

12 Machine learning - Classification Model/Score function - F(X, W) Takes input: data sample - X and parameters - W W - internal parameters or weights Maps input data X to class scores More score for a class - more likely it belongs to that class N classes - N different scores Predicted class - class with max score? Loss function L (F, data samples) Data samples - { (Xi, yi) } Measures how good F is Higher loss - bad model Lower loss - good model Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

13 Suppose: 3 training examples, 3 classes. With some W the scores are as shown (cat / car / frog). A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}, where x_i is the image and y_i is the (integer) label, the loss over the dataset is a sum of loss over examples: L = (1/N) Σ_i L_i(f(x_i, W), y_i) Lecture 3-13 April 10, 2018

14 Suppose: 3 training examples, 3 classes. With some W the scores are as shown (cat / car / frog). Multiclass SVM loss: Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1) Lecture 3-14 April 10, 2018

15 Suppose: 3 training examples, 3 classes. With some W the scores are as shown (cat / car / frog). Multiclass SVM loss (the hinge loss): Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1) Lecture 3-15 April 10, 2018
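A small sketch of the multiclass SVM (hinge) loss defined above, for a single training example; the scores below are illustrative, and y marks the correct class.

import numpy as np

def svm_loss(scores, y, delta=1.0):
    # hinge loss: sum over the wrong classes of max(0, s_j - s_y + delta)
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0          # the correct class does not contribute
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])   # illustrative cat / car / frog scores
print(svm_loss(scores, y=0))          # about 2.9 for these numbers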

16 E.g. Suppose that we found a W such that L = 0. Is this W unique? No! 2W also has L = 0! How do we choose between W and 2W? Lecture 3-16 April 10, 2018

17 Regularization Full loss = data loss + regularization: L = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W) Data loss: Model predictions should match training data. Regularization: Prevent the model from doing too well on training data. λ = regularization strength (hyperparameter) Simple examples - L2 regularization: R(W) = Σ_k Σ_l W_{k,l}^2 - L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}| - Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}^2 + |W_{k,l}|) More complex: Dropout, Batch normalization, Stochastic depth, fractional pooling, etc Lecture 3-17 April 10, 2018

18 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax function: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j} Probabilities must be >= 0 (take exp of the unnormalized scores) and must sum to 1 (normalize) cat / car / frog scores -> exp -> unnormalized probabilities -> normalize -> probabilities Lecture 3-18 April 10, 2018

19 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities The raw scores are unnormalized log-probabilities / logits Softmax function: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j} Probabilities must be >= 0 (exp) and must sum to 1 (normalize) Cross entropy loss: L_i = -log P(Y = y_i | X = x_i), e.g. L_i = -log(0.13) = 0.89 in the slide example Lecture 3-19 April 10, 2018
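The softmax and cross-entropy computation described above, as a short sketch; the scores are illustrative and the logarithm here is the natural log.

import numpy as np

def cross_entropy_loss(scores, y):
    shifted = scores - np.max(scores)                    # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))    # softmax: >= 0 and sums to 1
    return -np.log(probs[y])                             # cross-entropy loss for the true class y

scores = np.array([3.2, 5.1, -1.7])    # unnormalized log-probabilities / logits
print(cross_entropy_loss(scores, y=0))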

20 Machine learning - Classification Model/Score function - F(X, W) Takes input: data sample - X and parameters - W W - internal parameters or weights Maps input data X to class scores More score for a class - more likely it belongs to that class N classes - N different scores Predicted class - class with max score? Loss function L (F, data samples) Data samples - { (Xi, yi) } Measures how good F is Higher loss - bad model Lower loss - good model Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

21 Machine learning - Classification Model/Score function - F(X, W) Takes input: data sample - X and parameters - W W - internal parameters or weights Maps input data X to class scores More score for a class - more likely it belongs to that class N classes - N different scores Predicted class - class with max score? Loss function L (F, data samples) Data samples - { (Xi, yi) } Measures how good F is Higher loss - bad model Lower loss - good model Determine the parameters W that give the lowest loss! Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

22 Optimization Find W that minimizes the loss function L(X, W). Random search? Follow the negative gradient. Gradient of f(x): ∇f(x). -∇f(x): direction of the steepest descent - the direction along which the function decreases the most Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

23 Optimization Minimize the loss function L(W) wrt. W Find the gradient of L: ∇L(W) Update W as W = W - h ∇L(W), where h = step size/learning rate Do this iteratively Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018
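A minimal sketch of this iterative update on a toy quadratic loss whose gradient is known in closed form; the loss and its gradient here are assumptions purely for illustration.

import numpy as np

# toy loss L(W) = ||W - W_star||^2, with gradient 2 * (W - W_star)
W_star = np.array([1.0, -2.0, 3.0])
W = np.zeros(3)
h = 0.1                          # step size / learning rate

for step in range(100):
    grad = 2.0 * (W - W_star)    # gradient of L at the current W
    W = W - h * grad             # move in the negative gradient direction

print(W)                         # approaches W_star as the loss is minimized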

24 Optimization Find W that minimizes the loss function. (Figure: loss surface over the weights W_1, W_2; from the original W, step in the negative gradient direction) Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

25 Optimization - Backpropagation A systematic method of computing the gradient of a function f(x): write the compound expression of f(x) as a computational graph and recursively apply the chain rule. Often called the backward phase of the network. Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

26-36 Backpropagation: a simple example f(x, y, z) = (x + y) z, e.g. x = -2, y = 5, z = -4 Want: the gradients df/dx, df/dy, df/dz. Working backward through the computational graph with q = x + y: df/dz = q = 3, df/dq = z = -4, and by the chain rule df/dx = df/dq * dq/dx = -4 and df/dy = df/dq * dq/dy = -4. Chain rule at each node: [downstream gradient] = [upstream gradient] x [local gradient] Lecture 4-26 to 4-36 April 13, 2017
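The same example can be checked with PyTorch autograd, which performs exactly this chain-rule bookkeeping; a small sketch using the example values above.

import torch

x = torch.tensor(-2.0, requires_grad=True)
y = torch.tensor(5.0, requires_grad=True)
z = torch.tensor(-4.0, requires_grad=True)

q = x + y            # intermediate node, q = 3
f = q * z            # f = -12

f.backward()         # backpropagation through the computational graph
print(x.grad, y.grad, z.grad)   # df/dx = z = -4, df/dy = z = -4, df/dz = q = 3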

37 At each node f: the upstream gradient coming from the loss is multiplied by the local gradient of f to produce the gradients passed back to its inputs Lecture 4-37 April 12, 2018

38 Machine learning - Summary Model/Score function - F(X, W) Loss function L (F, data samples) Maps input data X to class scores More score for a class - more likely it belongs to that class Measures how good F is Good parameters are found by - minimizing loss function L(W) wrt the variables W The function is minimized by iteratively updating the weights in the negative gradient direction. Gradients are computed using backpropagation. Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

39 Machine learning - Examples Models - F(X, W): Linear; Fully Connected (FC), multilayered FC; CNNs Loss functions: SVM; Cross Entropy; Euclidean/L2 (mostly for regression) Optimization strategy: Gradient descent; ADAM, RMSPROP Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

40 Machine learning - Examples Models - F(X, W): Linear; Fully Connected (FC), multilayered FC; CNNs Loss functions: SVM; Cross Entropy; Euclidean/L2 (mostly for regression) Optimization strategy: Gradient descent; ADAM, RMSPROP Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

41 Neural networks: without the brain stuff (Before) Linear score function: f = W x (Now) 2-layer Neural Network: f = W2 max(0, W1 x) x (3072) -> W1 -> h (100) -> W2 -> s (10) Lecture 4-41 April 12, 2018
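A sketch of this 2-layer score function in plain numpy (random weights, purely illustrative):

import numpy as np

x = np.random.randn(3072)               # input image stretched to 3072 numbers
W1 = 0.01 * np.random.randn(100, 3072)  # first layer weights
W2 = 0.01 * np.random.randn(10, 100)    # second layer weights

h = np.maximum(0, W1.dot(x))   # hidden layer with the max(0, .) non-linearity, 100 values
s = W2.dot(h)                  # class scores, 10 values
print(s.shape)                 # (10,)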

42 (Figure: a biological neuron - impulses carried toward the cell body by the dendrites, and away from the cell body along the axon to the presynaptic terminal. This image by Felipe Perucho is licensed under CC-BY 3.0.) sigmoid activation function Lecture 4-42 April 12, 2018

43 Convolutional Neural Networks Lecture 5-43 April 17, 2018

44 Fully Connected Layer 32x32x3 image -> stretch to 3072 x 1 input (3072 x 1) and weights (10 x 3072) give a 1 x 10 activation Lecture 5-44 April 17, 2018

45 Fully Connected Layer 32x32x3 image -> stretch to 3072 x 1 input (3072 x 1) and weights (10 x 3072) give a 1 x 10 activation; each number is the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product) Lecture 5-45 April 17, 2018
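The same stretch-and-dot-product view of a fully connected layer, as a short sketch with random values:

import numpy as np

image = np.random.randn(32, 32, 3)   # 32x32x3 image
x = image.reshape(-1)                # stretch to a 3072 x 1 vector
W = np.random.randn(10, 3072)        # 10 x 3072 weights

activation = W.dot(x)                # each output is a 3072-dimensional dot product
print(activation.shape)              # (10,) - one number per output neuron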

46 Convolution Layer 32x32x3 image -> preserve spatial structure 32 height 32 width 3 depth Lecture 5-46 April 17, 2018

47 Convolution Layer 32x32x3 image, 5x5x3 filter. Convolve the filter with the image i.e. slide over the image spatially, computing dot products Lecture 5-47 April 17, 2018

48 Convolution Layer Filters always extend the full depth of the input volume. 32x32x3 image, 5x5x3 filter. Convolve the filter with the image i.e. slide over the image spatially, computing dot products Lecture 5-48 April 17, 2018

49 Convolution Layer 32x32x3 image, 5x5x3 filter. 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias) Lecture 5-49 April 17, 2018

50 Convolution Layer 32x32x3 image, 5x5x3 filter: convolve (slide) over all spatial locations to get an activation map Lecture 5-50 April 17, 2018

51 Convolution Layer Consider a second, green filter: 32x32x3 image, 5x5x3 filter, convolve (slide) over all spatial locations to get a second 28x28 activation map Lecture 5-51 April 17, 2018

52 Convolution Layer For example, if we had 6 5x5 filters, we'll get 6 separate activation maps. We stack these up to get a new image of size 28x28x6! Lecture 5-52 April 17, 2018

53 Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions: CONV, ReLU (e.g. 6 5x5x3 filters) Lecture 5-53 April 17, 2018

54 Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions: CONV, ReLU (e.g. 6 5x5x3 filters) -> CONV, ReLU (e.g. 10 5x5x6 filters) -> CONV, ReLU -> ... Lecture 5-54 April 17, 2018

55 Preview [Zeiler and Fergus 2013] Lecture 5-55 April 17, 2018

56-60 A closer look at spatial dimensions: 7x7 input (spatially), assume 3x3 filter, slid across the input one position at a time => 5x5 output Lecture 5-56 to 5-60 April 17, 2018

61 Common settings: K = number of filters (powers of 2, e.g. 32, 64, 128, 512) - F = 3, S = 1, P = 1 - F = 5, S = 1, P = 2 - F = 5, S = 2, P = ? (whatever fits) - F = 1, S = 1, P = 0 (F = filter size, S = stride, P = zero padding) Lecture 5-61 April 17, 2018

62 Corresponding PyTorch method: class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True) Lecture 5-62 April 17, 2018
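To tie the common settings to this layer, a brief sketch; the output spatial size follows (N - F + 2P)/S + 1, so F=3, S=1, P=1 preserves the input size.

import torch
import torch.nn as nn

# K = 32 filters, F = 3, S = 1, P = 1 on a 32x32 input with 3 channels
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
out = conv(x)
print(out.shape)                # torch.Size([1, 32, 32, 32]); (32 - 3 + 2*1)/1 + 1 = 32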

63 Remember back to: E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24...). Shrinking too fast is not good, doesn't work well. (CONV, ReLU e.g. 6 5x5x3 filters -> CONV, ReLU e.g. 10 5x5x6 filters -> CONV, ReLU) Lecture 5-63 April 17, 2018

64 Pooling layer - makes the representations smaller and more manageable - operates over each activation map independently: Lecture 5-64 April 17, 2018

65 MAX POOLING Single depth slice: max pool with 2x2 filters and stride 2 Lecture 5-65 April 17, 2018
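A quick sketch of 2x2 max pooling with stride 2 in PyTorch; the 4x4 input values are illustrative.

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).view(1, 1, 4, 4)   # (batch, channel, H, W)

out = F.max_pool2d(x, kernel_size=2, stride=2)   # max over each 2x2 window
print(out.view(2, 2))                            # tensor([[6., 8.], [3., 4.]])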

66 Summary - ConvNets stack CONV,POOL,FC layers - Trend towards smaller filters and deeper architectures - Trend towards getting rid of POOL/FC layers (just CONV) - Typical architectures look like [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX where N is usually up to ~5, M is large, 0 <= K <= 2. - but recent advances such as ResNet/GoogLeNet challenge this paradigm Lecture 5-66 April 17, 2018

67 CNN Architectures Case Studies: AlexNet, VGG, GoogLeNet, ResNet. Also: NiN (Network in Network), Wide ResNet, ResNeXT, Stochastic Depth, Squeeze-and-Excitation Network, DenseNet, FractalNet, SqueezeNet, NASNet Lecture 9-6 May 1, 2018

68 Review: LeNet-5 [LeCun et al., 1998] Conv filters were 5x5, applied at stride 1 Subsampling (Pooling) layers were 2x2 applied at stride 2 i.e. architecture is [CONV-POOL-CONV-POOL-FC-FC] Lecture 9-6 May 1, 2018

69 Case Study: AlexNet [Krizhevsky et al. 2012] Architecture: CONV1 MAX POOL1 NORM1 CONV2 MAX POOL2 NORM2 CONV3 CONV4 CONV5 Max POOL3 FC6 FC7 FC8 Lecture 9-6 May 1, 2018

70 Case Study: AlexNet [Krizhevsky et al. 2012] Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Details/Retrospectives: first use of ReLU; used Norm layers (not common anymore); heavy data augmentation; dropout 0.5; batch size 128; SGD Momentum 0.9; learning rate 1e-2, reduced by 10 manually when val accuracy plateaus; L2 weight decay 5e-4 Lecture 9-70 May 1, 2018

71 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners (Figure: number of layers of the winning model by year - shallow, 8 layers, 8 layers, 19 layers, 22 layers, 152 layers - Deeper Networks) Lecture 9-71 May 1, 2018

72 Case Study: VGGNet [Simonyan and Zisserman, 2014] Small filters, Deeper networks 8 layers (AlexNet) -> 16-19 layers (VGG16Net) Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2 11.7% top 5 error in ILSVRC'13 (ZFNet) -> 7.3% top 5 error in ILSVRC'14 AlexNet, VGG16, VGG19 Lecture 9-72 May 1, 2018

73 Case Study: GoogLeNet [Szegedy et al., 2014] Deeper networks, with computational efficiency - 22 layers Efficient Inception module No FC layers Only 5 million parameters! 12x less than AlexNet - ILSVRC 14 classification winner (6.7% top 5 error) Inception module Lecture 9-73 May 1, 2018

74 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners - Revolution of Depth (Figure: number of layers of the winning model by year - shallow, 8 layers, 8 layers, 19 layers, 22 layers, 152 layers) Lecture 9-74 May 1, 2018

75 Case Study: ResNet [He et al., 2015] Very deep networks using residual connections - 152-layer model for ImageNet - ILSVRC'15 classification winner (3.57% top 5 error) - Swept all classification and detection competitions in ILSVRC'15 and COCO'15! Residual block: output = F(x) + x, where F(x) is a small stack of conv layers with relu and x is carried by an identity shortcut Lecture 9-75 May 1, 2018
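A minimal sketch of a residual block in PyTorch (simplified: real ResNet blocks also use batch normalization and, in some stages, a projection shortcut):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # output = relu(F(x) + x), where F is a small stack of conv layers
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)       # add the identity shortcut, then relu

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])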

76 Transfer Learning "You need a lot of data if you want to train/use CNNs" Lecture 7-76 April 24, 2018

77 Transfer Learning "You need a lot of data if you want to train/use CNNs" - BUSTED Lecture 7-77 April 24, 2018

78 Transfer Learning with CNNs (Donahue et al, DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014; Razavian et al, CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, CVPR Workshops 2014) 1. Train on ImageNet. 2. Small dataset (C classes): reinitialize the final FC-1000 layer as FC-C and train only this layer; freeze all the earlier layers. 3. Bigger dataset: train more of the later layers and freeze the earlier ones; lower the learning rate when finetuning - 1/10 of the original LR is a good starting point Lecture 7-78 April 24, 2018

79 Transfer learning: when to finetune what (later layers of the pretrained network are more specific, earlier layers more generic): very little data + very similar dataset: use a linear classifier on the top layer; very little data + very different dataset: you're in trouble... try a linear classifier from different stages; quite a lot of data + very similar dataset: finetune a few layers; quite a lot of data + very different dataset: finetune a larger number of layers Lecture 7-79 April 24, 2018
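A sketch of this recipe in PyTorch using a torchvision model; the slides use a VGG-style network, so resnet18 and the 10-class head below are substitutions chosen for brevity.

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)   # 1. start from an ImageNet-pretrained model

for param in model.parameters():
    param.requires_grad = False            # 2. freeze the pretrained layers

num_classes = 10                           # C classes in the small target dataset (assumed)
model.fc = nn.Linear(model.fc.in_features, num_classes)   # reinitialized last layer stays trainable

# 3. With a bigger dataset, unfreeze some of the later layers as well and finetune
#    with a lower learning rate (around 1/10 of the original).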

80 Deep Learning Software Lecture 8-2 April 26, 2018

81 A zoo of frameworks! Caffe (UC Berkeley), Caffe2 (Facebook), Torch (NYU / Facebook), PyTorch (Facebook), Theano (U Montreal), TensorFlow (Google), PaddlePaddle (Baidu), Chainer, MXNet (Amazon; developed by U Washington, CMU, MIT, Hong Kong U, etc, but main framework of choice at AWS), CNTK (Microsoft), Deeplearning4j, and others... Lecture 8-2 April 26, 2018

82 Recall: Computational Graphs x and W feed a * node producing s (scores); s goes through the hinge loss, which is summed with the regularization term R(W) to give the total loss L Lecture 8-2 April 26, 2018

83 Recall: Computational Graphs input image and weights flow through the network graph to produce the loss (Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Reproduced with permission.) Lecture 8-2 April 26, 2018

84 Computational Graphs - Numpy Example graph: a = x * y, b = a + z, c = sum(b) Lecture 8-2 April 26, 2018

85 Computational Graphs - Numpy Good: clean API, easy to write numeric code. Bad: have to compute our own gradients; can't run on GPU Lecture 8-3 April 26, 2018

86 Computational Graphs - Numpy vs PyTorch The PyTorch forward pass looks exactly like numpy! Lecture 8-3 April 26, 2018

87 Computational Graphs - Numpy vs PyTorch PyTorch handles gradients for us! Lecture 8-3 April 26, 2018
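The actual code on these slides is not reproduced in the transcript; the sketch below reconstructs the idea - the same small graph written with numpy (gradients by hand) and with PyTorch (gradients via backward()).

import numpy as np
import torch

x, y, z = np.random.randn(3), np.random.randn(3), np.random.randn(3)

# Numpy: forward pass, then hand-written backward pass
a = x * y
b = a + z
c = np.sum(b)
grad_b = np.ones(3)        # dc/db = 1 elementwise
grad_z = grad_b.copy()     # db/dz = 1
grad_a = grad_b.copy()     # db/da = 1
grad_x = grad_a * y        # da/dx = y
grad_y = grad_a * x        # da/dy = x

# PyTorch: the forward pass looks the same, backward() fills in the gradients
xt = torch.tensor(x, requires_grad=True)
yt = torch.tensor(y, requires_grad=True)
zt = torch.tensor(z, requires_grad=True)
ct = torch.sum(xt * yt + zt)
ct.backward()
print(np.allclose(grad_x, xt.grad.numpy()))   # True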

88 PyTorch - Tutorial Tensors Tensors are similar to NumPy's ndarrays, with the addition being that Tensors can also be used on a GPU to accelerate computing.

import torch
x = torch.rand(5, 3)
print(x)

Out: a 5x3 tensor of random values

Operations
y = torch.rand(5, 3)
print(x + y)

Examples taken from - PyTorch: 60 Minute Blitz tutorial Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018
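As a brief illustration of the GPU remark above (assuming a CUDA device is available at runtime):

import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)

if torch.cuda.is_available():   # move the tensors to the GPU if there is one
    x = x.cuda()
    y = y.cuda()

print(x + y)    # the addition runs on whichever device holds x and y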

89 PyTorch - Tensor If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation call .backward() and have all the gradients computed automatically.

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
# more operations
z = y * y * 3
out = z.mean()

Gradients
# Doing back propagation on the entire computation graph
out.backward()
# print d(out)/dx
print(x.grad)

Out: tensor([[4.5000, 4.5000], [4.5000, 4.5000]])

Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

90 Review: LeNet-5 [LeCun et al., 1998] Conv filters were 5x5, applied at stride 1 Subsampling (Pooling) layers were 2x2 applied at stride 2 i.e. architecture is [CONV-POOL-CONV-POOL-FC-FC] Lecture 9-9 May 1, 2018

91 Machine learning - Summary Model/Score function - F(X, W) Loss function L (F, data samples) Maps input data X to class scores More score for a class - more likely it belongs to that class Measures how good F is Good parameters are found by - minimizing loss function L(W) wrt the variables W The function is minimized by iteratively updating the weights in the negative gradient direction. Gradients are computed using backpropagation. Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

92 PyTorch - Defining the network/model

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

An nn.Module contains layers, and a method forward(input) that returns the output. The learnable parameters of a model are returned by net.parameters() Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

93 PyTorch - Defining the network (using the Net class defined above)

net = Net()
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
# Out: a 1x10 tensor of scores

net.zero_grad()
out.backward(torch.randn(1, 10))

Recap: torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor. nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc. nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module. Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

94 PyTorch - Loss Function

output = net(input)
target = torch.arange(1, 11).float()   # a dummy target, for example
target = target.view(1, -1)            # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
# Out: tensor(...) - the scalar MSE loss

Backprop
net.zero_grad()     # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward: tensor([ 0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward: a tensor of six non-zero gradient values

Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

95 PyTorch - Optimize weight = weight - learning_rate * gradient

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()        # Does the update

Train the network: step multiple times using all the data Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018

96 PyTorch - Complete training

Setup data

import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as transforms

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])

# if not exist, download mnist dataset
train_set = dset.MNIST(root=root, train=True, transform=trans, download=True)
test_set = dset.MNIST(root=root, train=False, transform=trans, download=True)

batch_size = 100
train_loader = torch.utils.data.DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False)

model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

Train!

for epoch in range(10):
    for batch_idx, (x, target) in enumerate(train_loader):
        optimizer.zero_grad()
        out = model(x)
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()

Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018
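Not shown on the slide, but a natural companion to the training loop above: a sketch of an evaluation pass over test_loader, reusing the model and loaders defined there (all names are those from the slide).

import torch

correct, total = 0, 0
model.eval()                   # switch to evaluation mode
with torch.no_grad():          # no gradients needed for evaluation
    for x, target in test_loader:
        out = model(x)
        pred = out.argmax(dim=1)                  # predicted class = class with max score
        correct += (pred == target).sum().item()
        total += target.size(0)
print('test accuracy:', correct / total)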

97 Further reading CS231n: Introduction to CNN and PyTorch - Kripasindhu Sarkar - May 2018
