Convolutional Networks 2: Training, deep convolutional networks

1 Convolutional Networks 2: Training, deep convolutional networks. Hakan Bilen, Machine Learning Practical (MLP), Lecture 8, 30 October / 6 November 2018

2 Question. Q1. How can we increase the receptive field area of a conv layer?

3 Question. Q1. How can we increase the receptive field area of a conv layer? Q2. Can we do it without increasing the kernel size?

4 Input arguments for the convolution function: in channels, out channels, kernel size, stride, padding, bias

5 Input arguments for the convolution function: in channels, out channels, kernel size, stride, padding, bias... dilation? groups?
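
These argument names match, for example, PyTorch's torch.nn.Conv2d, which exposes dilation and groups alongside the others. A minimal sketch (assuming PyTorch; the slides do not prescribe a particular framework):

import torch
import torch.nn as nn

# 2D convolution: 16 input channels, 32 output channels, 3x3 kernel,
# stride 1, padding 1, dilation 2, 4 filter groups, with a bias term.
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3,
                 stride=1, padding=1, dilation=2, groups=4, bias=True)

x = torch.randn(8, 16, 64, 64)   # (minibatch, channels, height, width)
y = conv(x)
print(y.shape)                   # (8, 32, 62, 62): dilation 2 with padding 1 shrinks 64 -> 62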

6 Dilated convolutions

7 Dilated convolutions: increase the receptive field by inflating the kernel, inserting D-1 spaces between the kernel elements

9 Dilated convolutions: increase the receptive field by inflating the kernel, inserting D-1 spaces between the kernel elements. Why increase the receptive field size?

10 Dilated convolutions: increase the receptive field by inflating the kernel, inserting D-1 spaces between the kernel elements. Yu & Koltun, Multi-scale context aggregation by dilated convolutions, ICLR 2016.
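
Inflating a k x k kernel with dilation D gives an effective extent of D(k-1)+1 per side, so the receptive field grows with no extra parameters. A tiny illustrative calculation (the helper function is ours, not from the lecture):

def effective_kernel_size(k, d):
    """Spatial extent of a k x k kernel inflated with dilation d
    (d-1 zeros inserted between kernel elements)."""
    return d * (k - 1) + 1

for d in (1, 2, 4):
    print(d, effective_kernel_size(3, d))   # 3, 5, 9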

11 (Convolutional) filter groups

12 (Convolutional) filter groups: G is the number of groups. Reduces the number of convolutional filters (or parameters). Regularisation effect.
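
A quick way to see the parameter reduction is to compare weight counts with and without groups; a sketch using PyTorch layers purely for counting (an assumption for illustration, not course code):

import torch.nn as nn

def n_weights(conv):
    return conv.weight.numel()

full    = nn.Conv2d(64, 64, kernel_size=3, groups=1, bias=False)
grouped = nn.Conv2d(64, 64, kernel_size=3, groups=4, bias=False)

print(n_weights(full))     # 64*64*3*3 = 36864
print(n_weights(grouped))  # 4*(16*16*3*3) = 9216, i.e. G times fewer weights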

13 Convolution and cross-correlation. We can write the feature map hidden unit equation (index-0):
h_{i,j} = Σ_{m} Σ_{n} I(m+i, n+j) w(m, n)
h = W ⋆ I is a cross-correlation.

14 Convolution and cross-correlation. We can write the feature map hidden unit equation (index-0):
h_{i,j} = Σ_{m} Σ_{n} I(m+i, n+j) w(m, n)
h = W ⋆ I is a cross-correlation.
In signal processing a 2D convolution is written as
h_{i,j} = (V ∗ I)_{i,j} = Σ_{m} Σ_{n} I(m, n) V(i-m, j-n) = Σ_{m} Σ_{n} I(i-m, j-n) V(m, n)
If we flip (reflect horizontally and vertically) W (cross-correlation) then we obtain V (convolution).
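
A small numpy/scipy check of that flip relationship (illustrative only; scipy's correlate2d and convolve2d stand in for the layer's cross-correlation and the signal-processing convolution):

import numpy as np
from scipy.signal import correlate2d, convolve2d

rng = np.random.default_rng(0)
I = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))

cross_corr = correlate2d(I, W, mode='valid')             # what a conv layer computes
conv       = convolve2d(I, W[::-1, ::-1], mode='valid')  # W flipped horizontally and vertically

print(np.allclose(cross_corr, conv))   # True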

15 Training Convolutional Networks: Forward pass

16 Training Convolutional Networks: Forward pass, Backward pass

17 Example (2x2 kernel W, 3x3 previous layer H^{l-1}, 2x2 feature map H^l):
h_{11} = w_{11} h^{l-1}_{11} + w_{12} h^{l-1}_{12} + w_{21} h^{l-1}_{21} + w_{22} h^{l-1}_{22} + b
h_{12} = w_{11} h^{l-1}_{12} + w_{12} h^{l-1}_{13} + w_{21} h^{l-1}_{22} + w_{22} h^{l-1}_{23} + b
h_{21} = w_{11} h^{l-1}_{21} + w_{12} h^{l-1}_{22} + w_{21} h^{l-1}_{31} + w_{22} h^{l-1}_{32} + b
h_{22} = w_{11} h^{l-1}_{22} + w_{12} h^{l-1}_{23} + w_{21} h^{l-1}_{32} + w_{22} h^{l-1}_{33} + b

18 Gradients of E w.r.t. W

19 Example: let's calculate the parameter updates (∂E/∂W), using the forward-pass equations from slide 17:
∂E/∂w_{11} = (∂E/∂h_{11})(∂h_{11}/∂w_{11}) + (∂E/∂h_{12})(∂h_{12}/∂w_{11}) + (∂E/∂h_{21})(∂h_{21}/∂w_{11}) + (∂E/∂h_{22})(∂h_{22}/∂w_{11})

20 Example (continued): substituting the partial derivatives from the forward-pass equations:
∂E/∂w_{11} = (∂E/∂h_{11}) h^{l-1}_{11} + (∂E/∂h_{12}) h^{l-1}_{12} + (∂E/∂h_{21}) h^{l-1}_{21} + (∂E/∂h_{22}) h^{l-1}_{22}

21 Example (continued): similarly for w_{12}:
∂E/∂w_{12} = (∂E/∂h_{11}) h^{l-1}_{12} + (∂E/∂h_{12}) h^{l-1}_{13} + (∂E/∂h_{21}) h^{l-1}_{22} + (∂E/∂h_{22}) h^{l-1}_{23}

22 Example (continued): for w_{21}:
∂E/∂w_{21} = (∂E/∂h_{11}) h^{l-1}_{21} + (∂E/∂h_{12}) h^{l-1}_{22} + (∂E/∂h_{21}) h^{l-1}_{31} + (∂E/∂h_{22}) h^{l-1}_{32}

23 Example (continued): for w_{22}:
∂E/∂w_{22} = (∂E/∂h_{11}) h^{l-1}_{22} + (∂E/∂h_{12}) h^{l-1}_{23} + (∂E/∂h_{21}) h^{l-1}_{32} + (∂E/∂h_{22}) h^{l-1}_{33}

24 Gradients of E w.r.t. W:
∂E/∂w_{11} = (∂E/∂h_{11}) h^{l-1}_{11} + (∂E/∂h_{12}) h^{l-1}_{12} + (∂E/∂h_{21}) h^{l-1}_{21} + (∂E/∂h_{22}) h^{l-1}_{22}
∂E/∂w_{12} = (∂E/∂h_{11}) h^{l-1}_{12} + (∂E/∂h_{12}) h^{l-1}_{13} + (∂E/∂h_{21}) h^{l-1}_{22} + (∂E/∂h_{22}) h^{l-1}_{23}
∂E/∂w_{21} = (∂E/∂h_{11}) h^{l-1}_{21} + (∂E/∂h_{12}) h^{l-1}_{22} + (∂E/∂h_{21}) h^{l-1}_{31} + (∂E/∂h_{22}) h^{l-1}_{32}
∂E/∂w_{22} = (∂E/∂h_{11}) h^{l-1}_{22} + (∂E/∂h_{12}) h^{l-1}_{23} + (∂E/∂h_{21}) h^{l-1}_{32} + (∂E/∂h_{22}) h^{l-1}_{33}
Given H^l ∈ R^{M×N}:
∂E/∂w_{r,s} = Σ_{m=1}^{M} Σ_{n=1}^{N} (∂E/∂h_{m,n}) h^{l-1}_{r+m-1, s+n-1}

25 Gradients of E w.r.t. W:
∂E/∂w_{r,s} = Σ_{m=1}^{M} Σ_{n=1}^{N} (∂E/∂h_{m,n}) h^{l-1}_{r+m-1, s+n-1}
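
A direct, loop-based transcription of this formula for a single-channel, stride-1, unpadded layer (a sketch, not the course framework's implementation; names are ours):

import numpy as np

def conv_grad_w(h_prev, grad_h, kernel_size):
    """Gradient of E w.r.t. the kernel W for a single-channel, stride-1,
    no-padding convolutional layer, following (index-0)
    dE/dw[r,s] = sum_{m,n} dE/dh[m,n] * h_prev[r+m, s+n]."""
    K = kernel_size
    M, N = grad_h.shape
    grad_w = np.zeros((K, K))
    for r in range(K):
        for s in range(K):
            grad_w[r, s] = np.sum(grad_h * h_prev[r:r+M, s:s+N])
    return grad_w

# Example matching the slides: 3x3 previous layer, 2x2 kernel, 2x2 feature map.
h_prev = np.arange(9, dtype=float).reshape(3, 3)
grad_h = np.ones((2, 2))               # stand-in for dE/dH from the layer above
print(conv_grad_w(h_prev, grad_h, 2))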

26 Gradients of E w.r.t. H^{l-1}: Forward pass, Backward pass

27 Gradients of E w.r.t. H^{l-1}

28 Gradients of E w.r.t. H^{l-1}: imagine inverting the receptive field!

29 Gradients of E w.r.t. h^{l-1}: let's calculate the gradients of the loss function (E) with respect to the previous layer (H^{l-1}), using the forward-pass equations from slide 17:
∂E/∂h^{l-1}_{11} = (∂E/∂h_{11})(∂h_{11}/∂h^{l-1}_{11}) + (∂E/∂h_{12})(∂h_{12}/∂h^{l-1}_{11}) + (∂E/∂h_{21})(∂h_{21}/∂h^{l-1}_{11}) + (∂E/∂h_{22})(∂h_{22}/∂h^{l-1}_{11})

30 Gradients of E w.r.t. h^{l-1} (continued): only h_{11} depends on h^{l-1}_{11} (through w_{11}), so
∂E/∂h^{l-1}_{11} = (∂E/∂h_{11}) w_{11} + (∂E/∂h_{12}) · 0 + (∂E/∂h_{21}) · 0 + (∂E/∂h_{22}) · 0

31 Gradients of E w.r.t. h^{l-1} (continued): for the centre unit h^{l-1}_{22}:
∂E/∂h^{l-1}_{22} = (∂E/∂h_{11})(∂h_{11}/∂h^{l-1}_{22}) + (∂E/∂h_{12})(∂h_{12}/∂h^{l-1}_{22}) + (∂E/∂h_{21})(∂h_{21}/∂h^{l-1}_{22}) + (∂E/∂h_{22})(∂h_{22}/∂h^{l-1}_{22})

32 Gradients of E w.r.t. h^{l-1} (continued): every output unit uses h^{l-1}_{22}, each through a different weight:
∂E/∂h^{l-1}_{22} = (∂E/∂h_{11}) w_{22} + (∂E/∂h_{12}) w_{21} + (∂E/∂h_{21}) w_{12} + (∂E/∂h_{22}) w_{11}
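
The same pattern for the whole previous layer amounts to scattering each ∂E/∂h_{m,n} back through the kernel onto the inputs it touched, which is the "inverted receptive field" picture (equivalently, a full correlation with the rotated kernel). A hedged single-channel sketch:

import numpy as np

def conv_grad_input(weights, grad_h):
    """Gradient of E w.r.t. the previous layer H^{l-1} for a single-channel,
    stride-1, no-padding layer: scatter each dE/dh[m,n] back through the
    kernel onto the input positions it was computed from."""
    K = weights.shape[0]
    M, N = grad_h.shape
    grad_in = np.zeros((M + K - 1, N + K - 1))
    for m in range(M):
        for n in range(N):
            grad_in[m:m+K, n:n+K] += grad_h[m, n] * weights
    return grad_in

W = np.array([[1., 2.], [3., 4.]])   # w11 w12 / w21 w22
grad_h = np.ones((2, 2))
print(conv_grad_input(W, grad_h))
# Centre element (dE/dh^{l-1}_22) is w22 + w21 + w12 + w11 = 10, as in the example above.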

34 Backpropagation for pooling. Max function: m = max(a, b); ∂m/∂a = 1 if a > b, 0 otherwise; ∂m/∂b = 1 if b > a, 0 otherwise.
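
For max pooling, backprop therefore routes the incoming gradient to the position that won the max and sends zero everywhere else. A minimal single-window sketch (the function name is ours, for illustration):

import numpy as np

def maxpool_backward(x, grad_out):
    """Backprop through a single max-pool window: the incoming gradient is
    routed entirely to the position of the maximum; all other positions
    receive zero (the sub-gradient of max)."""
    grad_x = np.zeros_like(x)
    idx = np.unravel_index(np.argmax(x), x.shape)
    grad_x[idx] = grad_out
    return grad_x

x = np.array([[0.2, 1.5], [0.7, 0.1]])
print(maxpool_backward(x, grad_out=1.0))
# [[0. 1.]
#  [0. 0.]]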

37 Implementing fully-connected networks. Example at a time: a d-dimensional input vector, a k-dimensional output vector, and a d x k weight matrix.

38 Implementing fully-connected networks. Minibatch: input matrix (n x d), weight matrix (d x k), output matrix (n x k).

39 Implementing fully-connected networks. Minibatch: input matrix (n x d), weight matrix (d x k), output matrix (n x k). Represent each layer as a 2-dimensional matrix (minibatch size x input dimension), where each row corresponds to a training example, so the number of rows is the number of minibatch examples.
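
As a sketch (illustrative shapes only), the whole minibatch forward pass is then a single matrix multiply:

import numpy as np

n, d, k = 8, 100, 10               # minibatch size, input dim, output dim
X = np.random.randn(n, d)          # one training example per row
W = np.random.randn(d, k) * 0.01   # weight matrix
b = np.zeros(k)

H = X @ W + b                      # forward pass for the whole minibatch
print(H.shape)                     # (8, 10)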

40 Implementing Convolutional Networks. Example at a time, single input image, single feature map: input image x, weight matrix (kernel) m, feature map y.

41 Implementing Convolutional Networks. Example at a time, single input image, multiple feature maps: input image x, weight matrices (kernels) m, feature maps y.

42 Implementing Convolutional Networks. Example at a time, multiple input images, multiple feature maps: multiple input images x, weight matrices (kernels) m, feature maps y.

43 Implementing Convolutional Networks. Minibatch, multiple input images, multiple feature maps: minibatch of multiple input images (n), weight matrices (kernels), minibatch of feature maps (n).

44 Implementing Convolutional Networks.
Inputs / layer values: each input image (and each convolutional and pooling layer) is 2-dimensional (x, y); multiple feature maps add a third dimension; the minibatch adds a fourth. Thus we represent each input (layer values) using a 4-dimensional tensor (array): (minibatch-size, num-fmaps, x, y).
Weight matrices (kernels): each weight matrix used to scan across an image has 2 spatial dimensions (x, y); multiple output feature maps add a third dimension; multiple input feature maps add a fourth. Thus the weight matrices are also represented using a 4-dimensional tensor: (F_in, F_out, x, y).

45 4D tensors in numpy. Both forward and back prop thus involve multiplying 4D tensors. There are various ways to do this:
- Explicitly loop over the dimensions: this results in simpler code, but can be inefficient, although using Cython to compile the loops as C can speed things up.
- Serialisation: by replicating input patches and weight matrices, it is possible to convert the required 4D tensor multiplications into one large dot product. Requires careful manipulation of indices!
- Convolutions: use explicit convolution functions for forward and back prop, rotating the kernel for the backprop.
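
A sketch of the first option, explicit loops over the 4D tensors, using the (F_in, F_out, x, y) weight layout from slide 44 (illustrative, not the course framework code):

import numpy as np

def conv_forward_loops(inputs, kernels, biases):
    """Forward pass with explicit loops (simple but slow), assuming stride 1
    and no padding. inputs: (batch, f_in, H, W); kernels: (f_in, f_out, kH, kW);
    biases: (f_out,). Returns (batch, f_out, H-kH+1, W-kW+1)."""
    B, F_in, H, W = inputs.shape
    _, F_out, kH, kW = kernels.shape
    out = np.zeros((B, F_out, H - kH + 1, W - kW + 1))
    for b in range(B):
        for fo in range(F_out):
            for i in range(H - kH + 1):
                for j in range(W - kW + 1):
                    patch = inputs[b, :, i:i+kH, j:j+kW]      # (f_in, kH, kW)
                    out[b, fo, i, j] = np.sum(patch * kernels[:, fo]) + biases[fo]
    return out

x = np.random.randn(2, 3, 8, 8)
k = np.random.randn(3, 4, 3, 3)
print(conv_forward_loops(x, k, np.zeros(4)).shape)   # (2, 4, 6, 6)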

46 Deep convolutional networks

47 LeNet5 (LeCun et al, 1997): 2 convolutional layers {C1, C3} + non-linearity; 2 average pooling layers {S2, S4}; 2 fully connected hidden layers (no weight sharing) {C5, F6}; softmax classifier layer.

48 ImageNet Classification ("AlexNet"). Krizhevsky, Sutskever and Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. 5 convolutional layers + non-linearity (ReLU); 3 max pooling layers; 2 fully connected hidden layers; softmax classifier layer.

49 ImageNet Classification ("VGGNet"). Simonyan and Zisserman, Very Deep Convolutional Networks for Large-Scale Visual Recognition, ILSVRC 2014.
Network design, key choices: 3x3 conv. kernels (very small); conv. stride 1 (no loss of information).
Other details: rectification (ReLU) non-linearity; 5 max-pool layers (x2 reduction); no normalisation; 3 fully-connected (FC) layers.
Architecture column (spatial resolution 224x224 -> 112x112 -> 56x56 -> 28x28 -> 14x14 -> 7x7): image, conv-64, conv-64, maxpool, conv-128, conv-128, maxpool, conv-256, conv-256, maxpool, conv-512, conv-512, maxpool, conv-512, conv-512, maxpool, FC-4096, FC-4096, FC-1000, softmax.
Weights (as listed on the slide): 0.1M, 0.2M, 0.3M, 0.6M, 1.2M, 2.4M, 2.4M, 2.4M for the convolutional stages; 49x512x4096 = 102.8M, 16.8M and 4.1M for the three FC layers; total 134M weights.
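
To see why the first FC layer dominates this count: the final pooling output is 7 x 7 x 512 = 25,088 values, and connecting it to 4096 hidden units needs 49 x 512 x 4096 ≈ 102.8M weights, which is well over half of the 134M total.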

50 Simply stacking more layers? He et al, Deep Residual Learning for Image Recognition, CVPR 2016. A 56-layer net has higher training error and test error than a 20-layer net!

51 Deep Residual Learning ("ResNets"). He et al, Deep Residual Learning for Image Recognition, CVPR 2016.

54 Deep Residual Learning ("ResNets"). He et al, Deep Residual Learning for Image Recognition, CVPR 2016.
Residual learning building block (Figure 2 of the paper): input x -> weight layer -> ReLU -> weight layer gives F(x); an identity shortcut carries x around the block; the two are summed to give F(x) + x, followed by ReLU.
A solution by construction: original layers copied from a learned shallower model, extra layers set as identity; at least the same training error.
From the paper: "... are comparably good or better than the constructed solution (or unable to do so in feasible time). In this paper, we address the degradation problem by introducing a deep residual learning framework. Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping. Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping of F(x) := H(x) - x. The original mapping is recast into F(x) + x. We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping."
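
A minimal sketch of this building block in PyTorch-style code, assuming equal input and output channels so the identity shortcut needs no projection, and omitting the batch normalisation used in the paper:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """F(x) + x with two 3x3 conv 'weight layers', as in Figure 2."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        f = self.conv2(self.relu(self.conv1(x)))   # residual mapping F(x)
        return self.relu(f + x)                    # identity shortcut, then ReLU

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)     # shape unchanged: (1, 64, 56, 56)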

56 Hierarchical Representations: pixel -> edge -> texton -> motif -> part -> object. Zeiler & Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014. Slide credits: LeCun & Ranzato.

57 Summary. Convolutional networks include local receptive fields, weight sharing, and pooling. Backprop training can also be implemented as a reverse convolutional layer (with the weight matrix rotated). Implement using 4D tensors: inputs / layer values as (minibatch-size, number-fmaps, x, y), weights as (F_in, F_out, x, y). Arguments: stride, kernel size, dilation, filter groups. Reading: Goodfellow et al, Deep Learning (ch 9).
